Update 8/20/2014 8:30 AM: I have deactivated the script for this Twitter bot. It was fun, and the process is below if you want to read more. But, the Twitter feed is now inactive.
I've been fascinated by the proliferation of non-spammy Twitter bots in the last year. Chatbots have been around for a long time (remember SmarterChild on AIM? Anyone?) and they've migrated to Twitter. One of the more famous (and in the end, decidedly disappointing) chatbots on Twitter was @horse_ebooks. It would tweet non sequiturs at various intervals and currently, even though the account is no longer active, it has 203,000 followers. Twitter isn't just for things with fingers anymore.
I think bots are fun because we can make them close to sounding normal, yet slightly...off. A turn of phrase is correct, but it doesn't sit right. It's a look into what we could come up with ourselves, but weren't quite clever enough to pull off. In fact, "chatterbots" have been around since the 1990s, and there is an annual competition for the Loebner Prize, which is based on the Turing Test for true artificial intelligence.
Twitter bots are subverting the way the larger population thinks about online communication. They show how computer scripts running at intervals can become not only really convincing, but incredibly entertaining parts of our daily experience.
I started by building a simple bot which would search for "Shakespeare" on Twitter using the Twython library from GitHub. Essentially, it lets you plug into Twitter's REST API v1.1 using a Python script. You can check out @ShakeTheBard to see some of the early tweets. That wasn't much fun, though, because it mostly pulled quotes from plays. So, I took it one step further.
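A minimal sketch of that kind of search bot (not the actual script) could look like the following. The helper names `make_client` and `search_texts` are made up for illustration, and the four credential strings are placeholders you would get from your own Twitter app settings.

```python
def make_client(app_key, app_secret, oauth_token, oauth_token_secret):
    """Create an authenticated Twython client (needs `pip install twython`)."""
    # Imported here so search_texts() below can be tried without Twython installed.
    from twython import Twython
    return Twython(app_key, app_secret, oauth_token, oauth_token_secret)


def search_texts(client, query, count=10):
    """Return the text of recent tweets matching `query`."""
    # Twython forwards keyword arguments to the search endpoint and
    # returns the decoded JSON response as a dictionary.
    results = client.search(q=query, count=count)
    return [status["text"] for status in results.get("statuses", [])]


# Example use (placeholder credentials, so left commented out):
# client = make_client("APP_KEY", "APP_SECRET", "OAUTH_TOKEN", "OAUTH_TOKEN_SECRET")
# for text in search_texts(client, "Shakespeare"):
#     print(text)
```

Passing the client in as an argument keeps the search logic separate from the login details, which makes it easy to try out with a stand-in object before pointing it at a real account.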
A Markov chain is a model which can be used to generate random sequences (in this case, sentences) based on probability. In essence, it looks at a group of words - two or three at a time - and then picks a likely follow-up based on how often those words appear in the sample and which text follows them. From Stack Overflow:
- Split a body of text into tokens (words, punctuation).
- Build a frequency table. This is a data structure where for every word in your body of text, you have an entry (key). This key is mapped to another data structure that is basically a list of all the words that follow this word (the key) along with its frequency.
- Generate the Markov Chain. To do this, you select a starting point (a key from your frequency table) and then you randomly select another state to go to (the next word). The next word you choose, is dependent on its frequency (so some words are more probable than others). After that, you use this new word as the key and start over.
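The three steps above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the library the bot used: the function names, the simple regular-expression tokenizer, and the `order` parameter (how many words make up a key) are all invented for the example.

```python
import random
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Step 1: split text into word and punctuation tokens."""
    return re.findall(r"[A-Za-z']+|[.,;:!?]", text)


def build_table(tokens, order=2):
    """Step 2: map each run of `order` tokens to a count of what follows it."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - order):
        key = tuple(tokens[i:i + order])
        table[key][tokens[i + order]] += 1
    return table


def generate(table, length=10, rng=random):
    """Step 3: start at a random key, then pick followers weighted by frequency."""
    key = rng.choice(list(table))
    output = list(key)
    while len(output) < length:
        followers = table.get(key)
        if not followers:  # dead end: nothing ever followed this key
            break
        words = list(followers)
        weights = list(followers.values())
        output.append(rng.choices(words, weights=weights, k=1)[0])
        # Slide the window forward: the newest words become the next key.
        key = tuple(output[-len(key):])
    # Joining on spaces leaves a space before punctuation; fine for a sketch.
    return " ".join(output)


sample = "Shall I compare thee to a summer's day? Thou art more lovely and more temperate."
table = build_table(tokenize(sample), order=2)
print(generate(table, length=8))
```

With a sample this small, most keys have only one possible follower, so the output mostly repeats the source. The mashups only get interesting once the text is large enough for the same word pairs to show up in several different places.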
Sounds confusing, because it is.
I have a text document with every sonnet Shakespeare wrote. All 154. So each time the program runs, it chooses a starting point at random and generates a unique line of poetry based on the frequency of that choice as it goes through the algorithm. Finally, it tweets that line.
A lot of people use their own Twitter archive to make bots of alternate-reality selves, but I haven't gotten that deep into using the Twython library and pairing it with the Markov Chain library I found. So, for now, Bill is tweeting sonnet mashups. Some are pretty good, others not so much. But that's the fun.
One of my favorites from testing (unfortunately, not tweeted) was:
Upon thyself thy thought, that thou shouldst depart.
In other words, "I thought to myself: 'I'd better scram.'" Shakespeare is rolling over in his grave right now.
I see this as a 21st century version of giving 100 monkeys typewriters and infinite time to reproduce Shakespeare's work. But, I don't have 100 monkeys, and typewriters are inefficient. I'll stick with the Pi.
I'm not expecting a ton of followers, and I'm not even sure I'll leave the account active for any significant period of time. There is a lot of optimization I could do in the code, but I'm just exploring at this point. I'm not planning on posting the script, but if you want to see it, leave a note in the comments and I'll get a link up.