Predict who is drunk using tweets
To my knowledge, I have never blogged about anything that wasn't controversial. This post is nothing like what I have written before.
For the last couple of weeks I have been playing with the idea of identifying whether someone is drunk based on their tweets.
I started off with the thought process below.
Someone jumps on a site and says, "I want to monitor Twitter ID A."
We would gather that person's tweets every hour (or other time slice) and analyze them as follows:
1. Remove acronyms and hashtags
2. Count the spelling mistakes (using standard string-distance algorithms)
3. Count the swear words (this needs a dictionary of swear words, which is not that hard to build :-))
4. Record the number of spelling mistakes and swear words per time slice
5. Repeat steps 1 to 4 every hour/time slice
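Here is a rough Python sketch of steps 1 to 4 for a single time slice. It is only an illustration under a few assumptions of mine: the word lists KNOWN_WORDS and SWEAR_WORDS are tiny placeholders, the function names are made up, Levenshtein edit distance stands in for the "standard distance algorithm", a word counts as a spelling mistake when it is unknown but within 2 edits of a known word, and @mentions and URLs are dropped along with hashtags.

```python
import re

# Placeholder word lists (assumption): a real system would load a full
# dictionary and a curated swear-word list.
KNOWN_WORDS = {"i", "am", "so", "happy", "tonight", "this", "is", "the",
               "best", "party", "ever"}
SWEAR_WORDS = {"damn", "hell"}


def clean_tweet(text):
    """Step 1: drop URLs, hashtags, @mentions and all-caps acronyms."""
    text = re.sub(r"https?://\S+|[#@]\w+", " ", text)
    text = re.sub(r"\b[A-Z]{2,}\b", " ", text)
    return re.findall(r"[a-z']+", text.lower())


def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def is_misspelled(word, max_distance=2):
    """Step 2: unknown word that is close to some known word."""
    if word in KNOWN_WORDS or word in SWEAR_WORDS:
        return False
    return any(edit_distance(word, known) <= max_distance
               for known in KNOWN_WORDS)


def analyze_slice(tweets):
    """Steps 2 to 4: counts to remember for one time slice."""
    words = [w for tweet in tweets for w in clean_tweet(tweet)]
    return {
        "words": len(words),
        "misspelled": sum(is_misspelled(w) for w in words),
        "swears": sum(w in SWEAR_WORDS for w in words),
    }


tweets = ["I am so hapy tonigt #party LOL",
          "this is the bst damn party ever"]
print(analyze_slice(tweets))
# {'words': 12, 'misspelled': 3, 'swears': 1}
```

Step 5 would just call analyze_slice once per time slice and store the result. Unknown words that are far from every dictionary word (names, slang) are deliberately not counted as mistakes here, which is a judgment call.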
Say that at about 10 PM the person tracking Twitter ID A asks, "Is A drunk?" Based on the trends in spelling mistakes and swear words, we predict whether A is drunk or not.
It sounds funny, but it could be useful. I even thought about a revenue stream: connecting this to cab services, which could push notifications to those Twitter IDs about staying safe and not driving.
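I have not settled on how the prediction itself should work, so the following Python sketch is just one simple possibility: compare the latest time slice against the person's own baseline. The names slip_rate and looks_drunk, the per-slice dictionary layout, and the thresholds (factor, floor, min_words) are all assumptions for illustration, not tuned values.

```python
from statistics import mean


def slip_rate(s):
    """Misspellings plus swear words per word typed in one time slice."""
    return (s["misspelled"] + s["swears"]) / s["words"] if s["words"] else 0.0


def looks_drunk(history, latest, factor=2.0, floor=0.05, min_words=10):
    """Return True/False, or None when there is too little data.

    history: earlier time slices for this person (their baseline)
    latest:  the time slice being asked about
    """
    usable = [s for s in history if s["words"] >= min_words]
    if not usable or latest["words"] < min_words:
        return None
    baseline = mean(slip_rate(s) for s in usable)
    # Flag only when the latest rate is well above this person's usual rate;
    # the floor avoids flagging tiny rates for people who rarely slip.
    return slip_rate(latest) > max(factor * baseline, floor)


history = [
    {"words": 100, "misspelled": 2, "swears": 1},
    {"words": 80, "misspelled": 2, "swears": 0},
]
print(looks_drunk(history, {"words": 60, "misspelled": 9, "swears": 4}))  # True
print(looks_drunk(history, {"words": 50, "misspelled": 1, "swears": 0}))  # False
```

Using each person's own history as the baseline matters because some people swear or mistype a lot when perfectly sober.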
All for a social cause :-)
Addendum: Pitfalls of mobile keyboard dynamics that are hurdles for this project
- auto-correction
- typing while angry or tired
- typing while talking to someone
- last but not least, copy-paste