Fun with Couchbase and Markov Chains

I’ve been hearing about Markov chains long enough that it was time to learn more about them and build a simple, fun Markov chain application. I’m sure you don’t want to get bogged down in the mathematical details of Markov chains; learning by building an application is where all the fun is!

In this post, we’ll show how to build “Marky”, an application that uses Markov chains to generate nonsensical tweets based on your Twitter history. It uses Couchbase Server to store and process the data behind these tweets.

Marky uses Couchbase Server views to process data
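Marky’s map function is:
function (doc, meta) {
  if (doc.body) {
    // Split the text on runs of whitespace
    var words = doc.body.split(/\s+/);
    if (words.length >= 1) {
      // Pair the first word with null to mark it as a starting word
      emit([null, words[0]], 1);
    }
    // Slide a window over each pair of consecutive words
    for (var i = 0; i < (words.length - 1); i++) {
      var pair = [words[i], words[i + 1]];
      emit(pair, 1);
    }
  }
}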

At a high level, Marky splits text into overlapping chunks using a sliding window over two consecutive words, then chains those chunks back together into sentences, choosing each next word with a probability weighted by how often that word pair appeared. The result is nonsensical text that is fun to read.

For example, given the input text “In this blog, we will show you how to build an application”, the map function emits these key/value pairs:

Key                   Value
[null,"In"]           1
["In","this"]         1
["this","blog,"]      1
["blog,","we"]        1
["we","will"]         1
["will","show"]       1
["show","you"]        1
["you","how"]         1
["how","to"]          1
["to","build"]        1
["build","an"]        1
["an","application"]  1
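Because Couchbase supplies emit only inside its view engine, the map function can’t run as-is in plain JavaScript. Here is a minimal local harness, assuming Node.js or any engine with globalThis; markyMap and runMap are hypothetical names, not part of Couchbase or Marky. It installs a temporary global emit that records rows, so running it on the example sentence should print the same twelve rows as the table above.

```javascript
// Local harness for trying the Marky map function outside Couchbase (a sketch).
// markyMap and runMap are hypothetical names; inside Couchbase the view engine supplies `emit`.
function markyMap(doc, meta) {
  if (doc.body) {
    var words = doc.body.split(/\s+/);
    if (words.length >= 1) {
      emit([null, words[0]], 1); // first word of the text pairs with null
    }
    for (var i = 0; i < words.length - 1; i++) {
      emit([words[i], words[i + 1]], 1); // sliding window over two consecutive words
    }
  }
}

function runMap(mapFn, docs) {
  const rows = [];
  const previousEmit = globalThis.emit;
  // Temporarily install a global `emit` that records { key, value } rows
  globalThis.emit = (key, value) => rows.push({ key, value });
  try {
    docs.forEach((doc, i) => mapFn(doc, { id: String(i) }));
  } finally {
    globalThis.emit = previousEmit; // restore whatever was there before
  }
  return rows;
}

const rows = runMap(markyMap, [
  { body: "In this blog, we will show you how to build an application" },
]);
rows.forEach((row) => console.log(JSON.stringify(row.key), row.value));
```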

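To generate the next word, we query the view with the last word we output. For example, to get candidates for the word that follows “the”, we use the query parameters startkey=["the"]&endkey=["the",{}]&group_level=2&reduce=true. Because an object ({}) sorts after every string in view key collation, this range covers every pair whose first word is “the”, and group_level=2 groups rows on the full two-word key.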
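Here is a sketch of how that request could be assembled. It assumes the view REST endpoint on port 8092 (the default port for view queries in Couchbase Server 2.x) and the "default" bucket from the config later in this post; the design document and view names (marky, pairs) are hypothetical placeholders. Key parameters are JSON-encoded before URL-encoding, and passing null instead of a word produces the [null] range.

```javascript
// Sketch: build the grouped view query for words that followed prevWord.
// The host, design document (marky) and view (pairs) names are hypothetical.
function followerQueryUrl(
  prevWord,
  base = "http://localhost:8092/default/_design/marky/_view/pairs"
) {
  const params = new URLSearchParams({
    // View keys are JSON; {} sorts after every string, so it closes the range
    startkey: JSON.stringify([prevWord]),
    endkey: JSON.stringify([prevWord, {}]),
    group_level: "2", // group on the whole [first, second] pair
    reduce: "true", // run the _sum reduce on each group
  });
  return `${base}?${params.toString()}`;
}

console.log(followerQueryUrl("the"));
console.log(followerQueryUrl(null)); // starting-word query: startkey=[null]
```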

The query returns every word pair we emitted that starts with “the”, groups identical pairs together, and runs the view’s reduce function on each group. Marky uses the built-in _sum reduce, which adds together the values in each group, giving the number of times each pair occurred. Running this query against the database backing dkatz_ebooks yields:

Key                         Value
["the","#1"]                1
["the","100"]               1
["the","2"]                 1
["the","ability"]           3
["the","absolute"]          1
["the","answer"]            1
["the","app"]               1
["the","application"]       1
["the","area,"]             1
["the","background."]       1
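To see what group_level=2 and _sum compute, here is a small in-memory sketch. groupAndSum is a hypothetical helper that keeps only pairs whose first word matches (the effect of the startkey/endkey range) and adds up the values of identical pairs. It keeps rows in insertion order rather than reproducing Couchbase’s Unicode key collation.

```javascript
// Sketch of what group_level=2 plus the built-in _sum reduce return for one word.
// groupAndSum is hypothetical; `rows` are { key: [first, second], value } map outputs.
function groupAndSum(rows, firstWord) {
  const totals = new Map(); // JSON-encoded pair -> summed value
  for (const { key, value } of rows) {
    // Same effect as startkey=[firstWord]&endkey=[firstWord,{}]: first element must match
    if (key[0] !== firstWord) continue;
    const id = JSON.stringify(key);
    totals.set(id, (totals.get(id) || 0) + value); // _sum adds the values in each group
  }
  return [...totals].map(([id, value]) => ({ key: JSON.parse(id), value }));
}

const emitted = [
  { key: ["the", "ability"], value: 1 },
  { key: ["the", "app"], value: 1 },
  { key: ["a", "ability"], value: 1 },
  { key: ["the", "ability"], value: 1 },
  { key: ["the", "ability"], value: 1 },
];
console.log(groupAndSum(emitted, "the"));
```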

To pick the word to output after “the”, we choose one of its candidates at random, weighting the choice by how often each word pair appears in the input. The counts above sum to 12, so “ability” has a 3/12 (25%) chance of being chosen here, while each of the other nine words has a 1/12 (about 8.3%) chance.
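A minimal sketch of that weighted choice, assuming rows shaped like the view output above ({ key: [prev, next], value: count }). pickNextWord is a hypothetical helper; its rand parameter defaults to Math.random but lets tests swap in a deterministic source. It draws a uniform point in [0, total) and walks the cumulative counts.

```javascript
// Weighted random choice over reduced view rows ({ key: [prev, next], value: count }).
// pickNextWord is hypothetical; `rand` defaults to Math.random.
function pickNextWord(rows, rand = Math.random) {
  const total = rows.reduce((sum, row) => sum + row.value, 0);
  if (total === 0) return undefined; // no candidates at all
  let r = rand() * total; // uniform point in [0, total)
  for (const row of rows) {
    if (r < row.value) return row.key[1]; // r landed inside this row's share
    r -= row.value;
  }
  return rows[rows.length - 1].key[1]; // guard against floating-point round-off
}

// The dkatz_ebooks rows for "the" from the table above
const theRows = [
  ["#1", 1], ["100", 1], ["2", 1], ["ability", 3], ["absolute", 1],
  ["answer", 1], ["app", 1], ["application", 1], ["area,", 1], ["background.", 1],
].map(([next, value]) => ({ key: ["the", next], value }));

console.log(pickNextWord(theRows));
```

With these rows, “ability” owns 3 of the 12 units of the range, which is where its 25% chance comes from.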

Because the map function pairs the first word of each input text with null (for example, [null, “In”] in the earlier example), we can run the same query with null (startkey=[null]&endkey=[null,{}]) to begin a new output and get words likely to start a thought, a tweet, or whatever our input was. We also fall back to this null query if we get unlucky and the query for the last word returns no candidates. That happens when the word had only ever appeared at the end of the input texts we processed.
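Putting the pieces together, here is a sketch of the generation loop with the null fallback. It assumes a followers(word) callback standing in for the grouped view query; generate, followers and the tiny sample data are hypothetical, and the fixed maxWords cutoff is only illustrative.

```javascript
// Sketch of a Marky-style generator; generate, followers and sample are hypothetical names.
// followers(word) stands in for the grouped view query and returns rows shaped like
// { key: [word, next], value: count }.
function generate(followers, maxWords, rand = Math.random) {
  // Weighted random choice over candidate rows; undefined when there are none
  const pick = (rows) => {
    const total = rows.reduce((sum, row) => sum + row.value, 0);
    if (total === 0) return undefined;
    let r = rand() * total;
    for (const row of rows) {
      if (r < row.value) return row.key[1];
      r -= row.value;
    }
    return rows[rows.length - 1].key[1];
  };

  const words = [];
  let prev = null; // null asks for words that started an input text
  while (words.length < maxWords) {
    let next = pick(followers(prev));
    if (next === undefined && prev !== null) {
      // Dead end: prev only ever appeared last, so fall back to starting words
      next = pick(followers(null));
    }
    if (next === undefined) break; // no data at all
    words.push(next);
    prev = next;
  }
  return words.join(" ");
}

const sample = [
  { key: [null, "a"], value: 1 },
  { key: ["a", "b"], value: 1 },
  { key: ["b", "c"], value: 1 },
];
const followers = (word) => sample.filter((row) => row.key[0] === word);
console.log(generate(followers, 5)); // prints "a b c a b": each word has one candidate
```

Here “c” only ever ends the input, so after it the loop falls back to the null query and starts again with “a”.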

Marky Application

Marky is written in Clojure and uses a simple Clojure wrapper built by the community. To set up Marky, create a marky-config.clj file that points to your Couchbase Server cluster and Twitter account. Add some seed sources, such as Twitter user accounts or Atom feeds, and you’re ready to launch the app.

{:bucket "default"
 :pass ""
 :cburl "http://localhost:8091/"
 :twitter {:app-key "XXXXXXXXX"
           :app-secret "XXXXXXXXXX"
           :user-token "XXXXXXXX"
           :user-secret "XXXXXXXX"}
 [; :period, :after are in seconds, :ttl is in days.
  {:type :twitter :user "user-handle1" :period 3600 :ttl 60}
  {:type :twitter :user "user-handle2" :period 3600 :ttl 60}
  {:type :send-tweet :period 3600 :after 600}
  {:type :atom :url "http://some-domain/rssfeed.php" :period 86400 :ttl 60}]}
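The config mixes units: :period and :after are in seconds, :ttl in days. If :ttl ends up as a Couchbase document expiration (an assumption; how Marky applies it isn’t shown here), keep in mind that Couchbase reads expiration values up to 30 days (2,592,000 seconds) as relative offsets and larger values as absolute Unix timestamps, so a 60-day TTL has to be sent as an absolute time. A minimal sketch, with ttlDaysToExpiry as a hypothetical helper:

```javascript
// Hypothetical helper: turn a :ttl in days into a Couchbase expiration value.
// Couchbase treats expirations up to 30 days (in seconds) as relative offsets
// and anything larger as an absolute Unix timestamp in seconds.
const THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60; // 2,592,000

function ttlDaysToExpiry(ttlDays, nowSeconds = Math.floor(Date.now() / 1000)) {
  const ttlSeconds = ttlDays * 24 * 60 * 60;
  return ttlSeconds <= THIRTY_DAYS_SECONDS ? ttlSeconds : nowSeconds + ttlSeconds;
}

console.log(ttlDaysToExpiry(10)); // 864000, a relative offset
console.log(ttlDaysToExpiry(60)); // an absolute timestamp 60 days from now
```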

Here are some fun Marky tweets:

Want To Get Marky?

You can download the Marky source code here
You can also contribute to the Clojure wrapper project here

Have Fun!


Thanks to Aaron for putting together the code in Clojure.

Published at DZone with permission of Baxter Denney. See the original article here.
