Friday, November 19, 2010

Predict who is drunk using tweets

I have never blogged (to my knowledge) without something controversial before. This one is nothing like what I have written before.

For the last couple of weeks I have been playing with the idea of identifying whether someone is drunk based on their tweets.

I started off with the thought process below.

Someone jumps on a site and says, "I want to monitor Twitter ID A."
We gather tweets from that person every hour (or other time slice) and analyze them as follows:

1. Remove acronyms/#tags
2. Find the number of spelling mistakes (using standard edit-distance algorithms)
3. Find the number of swear words (need to build a dictionary of swear words, not that hard :-))
4. Record the number of spelling mistakes and swear words per time slice
5. Repeat steps 1 to 4 every hour/time slice
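
A minimal Python sketch of steps 1 to 4 for a single time slice might look like the following. The word lists, the `max_distance` cutoff, and the function names are all hypothetical; a real version would need a proper dictionary and a real feed of tweets. The sketch assumes an unknown word counts as a misspelling only when it is within a small edit distance of a known word, so names and slang far from any dictionary word are ignored.

```python
import re

# Hypothetical, tiny word lists for illustration only.
KNOWN_WORDS = {"i", "am", "going", "home", "now", "the", "party", "was", "great"}
SWEAR_WORDS = {"damn", "hell"}

def clean(tweet):
    """Step 1: drop #tags and all-caps acronyms, return lowercase words."""
    tweet = re.sub(r"#\w+", " ", tweet)
    tweet = re.sub(r"\b[A-Z]{2,}\b", " ", tweet)
    return re.findall(r"[a-z]+", tweet.lower())

def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def score_slice(tweets, max_distance=2):
    """Steps 2 to 4: count misspellings and swear words in one time slice."""
    mistakes = swears = 0
    for tweet in tweets:
        for word in clean(tweet):
            if word in SWEAR_WORDS:
                swears += 1
            elif word not in KNOWN_WORDS:
                nearest = min(edit_distance(word, k) for k in KNOWN_WORDS)
                if nearest <= max_distance:
                    mistakes += 1
    return {"mistakes": mistakes, "swears": swears}

tweets = ["I am goign home nwo #party LOL", "damn the partyy was graet"]
print(score_slice(tweets))  # {'mistakes': 4, 'swears': 1}
```

With a dictionary this small, short unknown words will often land within two edits of something, so the cutoff would need tuning against a real word list.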

Say at about 10 PM the person tracking Twitter ID A asks, "Is A drunk?" Based on the trends in mistakes and swear words, we predict whether A is drunk or not.
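
One rough way to turn the trend into a yes/no answer is to compare the latest time slice against the average of the earlier ones. The Python sketch below is only an illustration: the `looks_drunk` name, the `factor` and `min_errors` thresholds, and the shape of the per-slice records are assumptions, not something worked out in this post.

```python
from statistics import mean

def looks_drunk(history, factor=2.0, min_errors=3):
    """history: per-slice counts, oldest first,
    e.g. [{"mistakes": 1, "swears": 0}, ...].

    Flags the latest slice when its combined count is at least
    min_errors and more than factor times the earlier average.
    Both thresholds are made-up values for illustration.
    """
    if len(history) < 2:
        return False  # nothing to compare against yet
    totals = [h["mistakes"] + h["swears"] for h in history]
    baseline = mean(totals[:-1])
    latest = totals[-1]
    return latest >= min_errors and latest > factor * baseline

evening = [
    {"mistakes": 1, "swears": 0},  # 7 PM
    {"mistakes": 0, "swears": 0},  # 8 PM
    {"mistakes": 1, "swears": 1},  # 9 PM
    {"mistakes": 5, "swears": 2},  # 10 PM
]
print(looks_drunk(evening))  # True
```

Raw counts favor heavy tweeters, so a more careful version would probably divide by the number of words in each slice before comparing.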

Sounds funny, but it could be useful. I even thought about a revenue stream: connecting this to cab services, who could push notifications to those Twitter IDs about staying safe and not driving.

All for a social cause :-)

Addendum: Pitfalls of mobile keyboard dynamics that are hurdles for this project
  • auto-correction
  • typing while angry or tired
  • typing while talking to someone
  • last but not least, copy-paste

K