Over a million developers have joined DZone.

Algorithm of the Week: Generate Music Algorithmically

DZone's Guide to

Algorithm of the Week: Generate Music Algorithmically

· Big Data Zone ·
Free Resource

The open source HPCC Systems platform is a proven, easy to use solution for managing data at scale. Visit our Easy Guide to learn more about this completely free platform, test drive some code in the online Playground, and get started today.

What is a Markov Chain?

The algorithm of the week is a Markov Chain. Using this technique you leverage a little bit of probability to do some light machine learning. In this case, input a song and have the computer create an original work based off the patterns you’ve taught it.

I’ve attached a sample song. Give it a listen. I think you’ll be somewhat surprised that that’s a completely computer generated song. Only somewhat… it’s not very good, but it definitely isn’t predictable and yet isn’t noise either.

Click here to open the example as a .wav file

How’d You Do That?!

There’s a fairly simple computer science technique known as a “Markov Chain”. Don’t let the whole reference to computer science fool you, it’s really not tough to grasp. Basically I created a table of numbers that answers the following question: When note X was played what percentage of the time were the other notes played next? So just imagine a table with all of the notes from a to g# laid out on top (these are the notes that we last played) and vertical axis is all of the same notes but this axis represents the probability that that particular note was played next.

Here’s a sample of what my program generates:   

 a a#  b  c  c# d  d# e  f  f# g  g#  
a [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   
a# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
b  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   
c  [0, 0, 0, 6, 0, 1, 0, 0, 0, 0, 2, 0],   
c# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   
d  [0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0],   
d# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
e  [0, 0, 0, 1, 0, 2, 0, 3, 1, 0, 0, 0],   
e# [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],   
f  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   
f# [0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 2, 0],   
g  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
g# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

This is called an “adjancency list”. I’ve populated it with the music to Twinkle Twinkle. Let’s answer a question using it. How often does playing a C note lead to us playing another C note?

First we look up C on the left column and follow the row to the column labeled C. We get the answer 6. In order to figure out how often we play C given that we just played C though we need to know how many times we played a C note first. Adding up all of the values in the row gets us 9. So 6/9’s of the time (or 2/3) playing a C note will lead to us playing another C note.

Here’s the timings table (1, 1/2, 1/4, 1/8, 1/16 notes): 

    1  2  4   8 16  
1  [0, 0, 0,  0, 0], 
2  [0, 0, 0,  1, 0], 
4  [0, 0, 3,  5, 0], 
8  [0, 2, 4, 11, 0], 
16 [0, 0, 0,  0, 0]	

But that’s it, that’s the magic that drives the whole process. As you can see, I’m actually not storing percentages but instead just the count of the number of times a note led to a different note. It works out to be the same thing in the end.


Of course we need to see some code! As usual, I’ll start with the highest level and drill into it in more detail.

This portion of the code records the notes for “Twinkle, Twinkle” into the music learner I create. Then I tell it to give me 100 notes based on what it’s learned:

class MusicMatrix:
    def __init__(self):
        self._previous_note = None
        self._markov = MarkovBuilder(["a", "a#", "b", "c", "c#", "d", "d#", "e", "f", "f#", "g", "g#"])
        self._timings = MarkovBuilder([1, 2, 4, 8, 16])

    def add(self, to_note):
        """Add a path from a note to another note. Re-adding a path between notes will increase the associated weight."""
        if(self._previous_note is None):
            self._previous_note = to_note
        from_note = self._previous_note
        self._markov.add(from_note[0], to_note[0])
        self._timings.add(from_note[1], to_note[1])
        self._previous_note = to_note
    def next_note(self, from_note):
        return [self._markov.next_value(from_note[0]), self._timings.next_value(from_note[1])]

# Playing it comes next :)
#test = [['c',4], ['e',4], ['g',4], ['c5',1]]
#pysynth.make_wav(test, fn = "test.wav")

musicLearner = MusicMatrix()

# Input the melody of Row, Row, Row Your Boat
# The MusicMatrix will automatically use this to 
# model our own song after it.
musicLearner.add(["c", 4])
musicLearner.add(["c", 4])
musicLearner.add(["c", 4])
musicLearner.add(["d", 8])
musicLearner.add(["e", 4])
musicLearner.add(["e", 4])
musicLearner.add(["d", 8])
musicLearner.add(["e", 4])
musicLearner.add(["f", 8])
musicLearner.add(["g", 2])

musicLearner.add(["c", 8])
musicLearner.add(["c", 8])
musicLearner.add(["c", 8])

musicLearner.add(["g", 8])
musicLearner.add(["g", 8])
musicLearner.add(["g", 8])

musicLearner.add(["e", 8])
musicLearner.add(["e", 8])
musicLearner.add(["e", 8])

musicLearner.add(["c", 8])
musicLearner.add(["c", 8])
musicLearner.add(["c", 8])

musicLearner.add(["g", 4])
musicLearner.add(["f", 8])
musicLearner.add(["e", 4])
musicLearner.add(["d", 8])
musicLearner.add(["c", 2])

random_score = []
current_note = ["c", 4]
for i in range(0,100):
    print current_note[0] + ", " + str(current_note[1])
    current_note = musicLearner.next_note(current_note)

pysynth.make_wav(random_score, fn = "first_score.wav")

The MarkovBuilder is where the algorithm is hiding. Notice how the MusicMatrix is keeping track of the previous note for us? The MarkovBuilder needs to be told explicitly which notes connect to which.

import random

class MarkovBuilder:
    def __init__(self, value_list):
        self._values_added = 0
        self._reverse_value_lookup = value_list
        self._value_lookup = {}
        for i in range(0, len(value_list)):
            self._value_lookup[value_list[i]] = i
        # Initialize our adjacency matrix with the initial
        # probabilities for note transitions.
        self._matrix=[[0 for x in range(0,len(value_list))] for i in range(0,len(value_list))]

    def add(self, from_value, to_value):
        """Add a path from a note to another note. Re-adding a path between notes will increase the associated weight."""
        value = self._value_lookup
        self._matrix[value[from_value]][value[to_value]] += 1
        self._values_added = self._values_added + 1
    def next_value(self, from_value):
        value = self._value_lookup[from_value]
        value_counts = self._matrix[value]
        value_index = self.randomly_choose(value_counts)
        if(value_index < 0):
            raise RuntimeError, "Non-existent value selected."
            return self._reverse_value_lookup[value_index]
    def randomly_choose(self, choice_counts):
        """Given an array of counts, returns the index that was randomly chosen"""
        counted_sum = 0
        count_sum = sum(choice_counts)
        selected_count = random.randrange(1, count_sum + 1)
        for index in range(0, len(choice_counts)):
            counted_sum += choice_counts[index]
            if(counted_sum >= selected_count):
                return index
        raise RuntimeError, "Impossible value selection made. BAD!"

Again this class’s sole reason to exist is to track how one value we give it leads to another, and it facilitates picking values to give us that are informed by what has happened in the past. It’s important to note that we don’t just pick the value that occurred the most. If we did that it would make our program much less interesting. Instead we use the “randomly_choose” function to choose a next value in a weighted manner (very similar to Bayes Theorem for those of you wondering).


So to summarize, my program was able to generate the music because I fed it a sample musical score and it figured out what percentage of the time a c note led to another note. Here’s a step by step run through:

  • Give program a note and a timing. - When I give it a second note/timing it notes in it’s table that the first note led to the 2nd note one time. It also notes that the first timing led to this 2nd timing one time. (note I don’t attempt to relate notes/timings, it’s not important surprisingly).
  • Enter a third note. - The program then notes in its table that the 2nd note led to the 3rd note one time. - Continue ad inifinitum So that’s how we set the system up. Next this is how we get the program to come up with its own song:
  • Choose some random note/timing to start. - Ask the computer to suggest what a good note/timing would be to follow those. - print out the note/timing (in case it’s a work of genius ;)
  • play each note using the python library I’m using, pysynth.

Here’s a link to my git repo with all of my Python code: http://github.com/jcbozonier/MarkovMusic/tree/master

That’s all for now!

Disclaimer: The opinions expressed here represent my own and not those of my employer.

Managing data at scale doesn’t have to be hard. Find out how the completely free, open source HPCC Systems platform makes it easier to update, easier to program, easier to integrate data, and easier to manage clusters. Download and get started today.


Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}