Feb
22

You In Haiku: A Twitter Bot That Writes Haikus
Paul

Not too long ago I had an idea. I was scrolling through my Twitter feed and was struck by how poetic some of the tweets could be. Constrained by 140 characters, people were finding more concise and expressive ways of saying things. Packing a powerful message into a minimalist medium, hmm, that sounds exactly like a haiku! What if you could follow a bot that would tweet back to you any time you happened to write a tweet in the syllabic form of a haiku? Could tweets be tweaked into the haiku format with a little coaxing? A few weekends later YouInHaiku, the haiku-writing Twitter bot, was born!

This post is a high-level write-up of how YouInHaiku works. In my next post, I’ll discuss the inner workings of the bot, including my choice to build it in Scala, a fantastic functional-style programming language.

The first step was to be able to identify a haiku. Haikus are a form of Japanese poetry consisting of 17 syllables. They are three lines long, following the format of 5-7-5 syllables. I needed some way of counting syllables with reasonable accuracy. I started looking for a set of rules that would determine syllable count, but my search turned up algorithms that worked 75% of the time at best. This meant an unacceptable rate of false positives and false negatives. Luckily, I stumbled onto a near-perfect solution. The Carnegie Mellon Pronouncing Dictionary (the CMU Dictionary) is a dictionary used primarily in text-to-speech applications. It contains words broken out into their individual phonemes and, from those, a syllable count for each word. With the CMU Dictionary I managed to reliably identify the syllable count of most words, and where I couldn’t, I fell back to the heuristic rules I was using before. To improve the algorithm I added support for acronyms (e.g. WTF is 5 syllables, dou/ble/u/tee/eff), numbers, and common proper nouns that aren’t found in the syllable dictionary. I also stripped URLs from the input, because they don’t make much sense in haiku form.
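To make the lookup-then-fallback idea concrete, here is a minimal Python sketch (the bot itself was written in Scala, and this is not its code). It assumes CMUdict's plain-text line format, in which every vowel phoneme ends in a stress digit, so counting those digits gives the syllable count. The four dictionary entries are a hand-copied excerpt and may differ slightly from the current release. The function names, the vowel-group fallback rule, and the URL pattern are all illustrative assumptions. Numbers and proper nouns are left out.

```python
import re
import string

# Hand-copied excerpt in CMUdict's line format (WORD  PH1 PH2 ...).
# The real dictionary is far larger; these entries are for illustration only.
CMU_EXCERPT = """\
HAIKU  HH AY1 K UW0
BURNING  B ER1 N IH0 NG
BRAKES  B R EY1 K S
STRONGLY  S T R AO1 NG L IY0
"""

# Syllables when a letter is spelled out: "W" is dou-ble-u, the rest are one.
LETTER_SYLLABLES = {letter: 1 for letter in string.ascii_uppercase}
LETTER_SYLLABLES["W"] = 3

# Assumed URL shape: anything starting with http:// or https://.
URL_PATTERN = re.compile(r"https?://\S+")


def load_pronunciations(text):
    """Parse CMUdict-style lines into {WORD: [phonemes]}."""
    table = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):  # blank or comment line
            continue
        word, *phonemes = line.split()
        table[word] = phonemes
    return table


PRONUNCIATIONS = load_pronunciations(CMU_EXCERPT)


def heuristic_syllables(word):
    """Rough fallback: count vowel groups, discounting a trailing silent 'e'."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def count_syllables(token):
    """Dictionary lookup first, then acronym spelling, then the heuristic."""
    word = re.sub(r"[^A-Za-z']", "", token)  # drop punctuation and digits
    if not word:
        return 0
    phonemes = PRONUNCIATIONS.get(word.upper())
    if phonemes is not None:
        # Vowel phonemes carry a stress digit (0, 1 or 2); one per syllable.
        return sum(1 for phoneme in phonemes if phoneme[-1].isdigit())
    if word.isupper() and len(word) > 1:
        # Unknown all-caps token: assume it is an acronym read letter by letter.
        return sum(LETTER_SYLLABLES.get(char, 0) for char in word)
    return heuristic_syllables(word)


def tweet_syllables(tweet):
    """Strip URLs, then total the syllables of the remaining tokens."""
    text = URL_PATTERN.sub("", tweet)
    return sum(count_syllables(token) for token in text.split())
```

One weakness of this sketch is that it treats any unknown all-caps word as an acronym, so a SHOUTED tweet would be overcounted. A real version would need a smarter test.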

Finally, I needed to determine where to put the line breaks. A tweet might have 17 syllables but still not fit the format, because a word straddles the point where a line should end and the tweet can’t be broken up into 5-7-5. Only tweets whose word boundaries fall exactly after the 5th and 12th syllables pass the test. Cool, now I can identify haikus! I hooked this test up to Twitter and started following some regular tweeters, and it worked! Here is one of the first:

If you like the smell / of burning brakes, I strongly / suggest Ford Focus.
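The line-break test can be sketched in a few lines of Python. This is an illustration rather than the bot's actual code: `split_haiku` is a made-up name, and it assumes the syllable count of each word has already been worked out and is passed in alongside the word.

```python
def split_haiku(words, pattern=(5, 7, 5)):
    """Split (word, syllable_count) pairs into lines matching the pattern.

    Returns the list of lines, or None when a word straddles a line boundary
    or the total syllable count is wrong.
    """
    lines = []
    position = 0
    for target in pattern:
        line, total = [], 0
        # Fill the current line word by word until it reaches its target.
        while position < len(words) and total < target:
            word, syllables = words[position]
            line.append(word)
            total += syllables
            position += 1
        if total != target:  # overshot (a word straddles the break) or ran out
            return None
        lines.append(line)
    # Silent tokens left at the very end (e.g. punctuation) join the last line.
    while position < len(words) and words[position][1] == 0:
        lines[-1].append(words[position][0])
        position += 1
    if position != len(words):  # leftover syllables: more than 17 in total
        return None
    return [" ".join(line) for line in lines]


tweet = [("If", 1), ("you", 1), ("like", 1), ("the", 1), ("smell", 1),
         ("of", 1), ("burning", 2), ("brakes,", 1), ("I", 1), ("strongly", 2),
         ("suggest", 2), ("Ford", 1), ("Focus.", 2)]
print(" / ".join(split_haiku(tweet)))
```

Because each line is filled greedily, a tweet fails as soon as one word pushes a line past its target, which is exactly the "17 syllables but not a haiku" case.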

I kept the bot up for a week, having it check followers’ tweets for existing haikus, but I kept thinking: some of these tweets are so close! If only they could be rewritten a little, they would work quite well. It was around this time that a co-worker showed me TranslationParty, which takes your English text and uses Google Translate to go back and forth between English and Japanese, changing the input slightly each time. For example, if I input part of the previous sentence:

It takes your English text and uses Google Translate to go back and forth between English and Japanese.

The result after 11 translations back and forth becomes:

In other words, Google is being used in the text between English and Japanese text in English.

I realized I could use each of these intermediary translations as a slight variation on the original input, changing the syllable count just enough to transform a near-haiku into a proper one. Plus, it adds a Confucius kind of quality to the tweets that often leads to some pretty funny results. I coded up a simple version of TranslationParty that used Google Translate to flip back and forth between English and each of five languages that I found gave interesting results: Vietnamese, Latvian, Arabic, Chinese and Japanese. Here is the result of a Vietnamese translation chain performed on an actual tweet (with the syllable count on the left):

  • (20): Abe Lincoln Won A Wrestling Match and other interesting presidential facts
  • (20): Abe Lincoln Won A Wrestling match and the other presidential interesting facts
  • (18): Abe Lincoln won a game and Wrestling events presidential interesting
  • (15): Abe Lincoln won a game and fun events ~ Wrestling president
  • (15): Abe Lincoln won a game and fun events ~ Wrestling president
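A chain like the one above can be sketched as a loop that keeps round-tripping until the text stops changing. The Python below is an illustration, not the bot's code: `translation_chain` is a hypothetical name, and the `translate(text, source, target)` callable is an assumed interface standing in for whatever translation service is used. The `toy_translate` function is a fake translator so the example runs without a network call.

```python
def translation_chain(text, translate, pivot, max_round_trips=10):
    """Collect the English variants produced by bouncing text through pivot.

    translate is any callable taking (text, source_language, target_language).
    The chain stops once a round trip produces a variant already seen.
    """
    variants = [text]
    current = text
    for _ in range(max_round_trips):
        foreign = translate(current, "en", pivot)
        back = translate(foreign, pivot, "en")
        if back in variants:  # equilibrium or a cycle: nothing new to gain
            break
        variants.append(back)
        current = back
    return variants


def toy_translate(text, source, target):
    """Fake translator: going out of English loses one 'very', coming back is exact."""
    if target != "en":
        return text.replace("very ", "", 1)
    return text


for variant in translation_chain("a very very good day", toy_translate, "vi"):
    print(variant)
```

Each variant in the returned list can then be run through the haiku test, stopping at the first one that fits.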

Pretty good! However, it didn’t yield a haiku. We shrank right past our target of 17 syllables and ended up at 15. Using Google Translate to give variations on tweets did up the percentage of tweets that could be turned into haikus, but I thought I could do one better. If no haiku was found after the initial check and the translation chain, I would try parsing out nouns and verbs and using a thesaurus to come up with words of different syllable counts. Using a natural language parser from Stanford to parse out the nouns and verbs, and BigHugeThesaurus to retrieve a list of synonyms for any word, I was able to get word substitution working. Here is the output of a run on the previous tweet:

Nouns: won, game
Synonyms for (won): South Korean won, North Korean won, North Korean monetary unit, South Korean monetary unit
Synonyms for (game): plot, secret plan, biz, activity, animal, animate being, beast, brute, business, caper, competition, contest, creature, diversion, fauna, frolic, gambol, game equipment, job, line, line of work, meat, occupation, play, recreation, romp, scheme, score, strategy

Substitution found: Abe Lincoln won a line of work and fun events ~ Wrestling president

That did it! Here is our final Haiku:

Abe Lincoln won a / line of work and fun events / ~ Wrestling president
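The substitution step boils down to swapping one word at a time for a synonym and keeping any candidate that lands on 17 syllables. Here is a small Python sketch of that search, not the bot's actual code. The `SYLLABLES` and `SYNONYMS` tables are hand-entered stand-ins for the syllable dictionary and the thesaurus lookup, with only a few of the synonyms listed above, and `substitutions` is a hypothetical name.

```python
# Hand-entered syllable counts, standing in for the syllable dictionary.
SYLLABLES = {
    "abe": 1, "lincoln": 2, "won": 1, "a": 1, "game": 1, "and": 1,
    "fun": 1, "events": 2, "wrestling": 2, "president": 3,
    "plot": 1, "contest": 2, "competition": 4,
    "line": 1, "of": 1, "work": 1,
}

# A few of the thesaurus results for "game", standing in for a thesaurus call.
SYNONYMS = {"game": ["plot", "contest", "competition", "line of work"]}


def syllables(phrase):
    """Total syllables in a phrase; unknown tokens such as '~' count as zero."""
    return sum(SYLLABLES.get(word.lower(), 0) for word in phrase.split())


def substitutions(tweet, target=17):
    """Return every single-word synonym swap that hits the target count."""
    words = tweet.split()
    found = []
    for index, word in enumerate(words):
        for synonym in SYNONYMS.get(word.lower(), []):
            candidate = " ".join(words[:index] + [synonym] + words[index + 1:])
            if syllables(candidate) == target:
                found.append(candidate)
    return found


tweet = "Abe Lincoln won a game and fun events ~ Wrestling president"
print(substitutions(tweet))
```

Hitting 17 syllables is only half the test, so each candidate would still need to pass the 5-7-5 line-break check before being tweeted.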

Granted, it is kind of far from the original input. It feels a bit like an English-as-a-second-language student writing poetry. But sometimes a little Engrish can be pretty funny, and the bot will only manipulate a tweet this much when there are no better options.

Here are some of my favourites:

Sometimes no matter / how hard you try, the bridges / burn themselves for you. #haiku by: @nshunter
Wait … or tram, or you / can do to derail what they / are? I think you are. #haiku by: @PatriciaAllison
In an ideal world, / Colin Firth has become a / woman with a cat. #haiku by: @sween

There you have it, a haiku-writing Twitter bot. Follow YouInHaiku and it will automatically follow you back and reply to you with any unintentional haikus you write. Follow it and start writing some poetry!

P.S. If anyone has a server with a virtual memory limit of more than 100MB, YouInHaiku is looking for a permanent home! The shared hosting server this blog is running on doesn’t have the horsepower (read: memory) to handle loading the JVM, a word dictionary, and all the required libraries for this project, so for now YouInHaiku only runs locally whenever I hit “run”.

This entry was posted on Tuesday, February 22nd, 2011 at 11:56 am and is filed under Other.