{{announcement.body}}
{{announcement.title}}

# Sequence Alignment Algorithm

DZone 's Guide to

# Sequence Alignment Algorithm

### We look at how to align two sequences using the sequence alignment algorithms of Needleman-Wunsch and Hirschberg.

· Big Data Zone ·
Free Resource

Comment (1)

Save
{{ articles[0].views | formatCount}} Views

In my previous post, I illustrated the Levenshtein edit distance by comparing the opening paragraphs of Finnegans Wake by James Joyce and a parody by Adam Roberts.

In this post, I'll show how to align two sequences using the sequence alignment algorithms of Needleman-Wunsch and Hirschberg. These algorithms can be used to compare any sequences, though they are more often used to compare DNA sequences than impenetrable novels and parodies.

I'll be using Gang Li's implementation of these algorithms, available on GitHub. I believe the two algorithms are supposed to produce the same results, that Hirschberg's algorithm is a more space-efficient implementation of the Needleman-Wunsch algorithm, though the two algorithms below produce slightly different results. I'll give the output of Hirschberg's algorithm.

Li's alignment code uses lists of characters for input and output. I wrote a simple wrapper to take in strings and output strings.

``````    from alignment import Needleman, Hirschberg

def compare(str1, str2):
seq1 = list(str1)
seq2 = list(str2)
for algorithm in [Needleman(), Hirschberg()]:
a, b = algorithm.align(seq1, seq2)
print("".join(a))
print("".join(b))
print()``````

The code inserts vertical bars to indicate spaces added for alignment. Here's the result of using the Needleman-Wunsch algorithm on the opening paragraphs of Finnegans Wake and the parody Finnegans Ewok.

``````    |||riverrun, past Ev|e| and Adam'||||s,
mov|i|er|un, past ||new and |||||hopes,

from swe|rv||e of shore|||| to bend of
from s||tr|ike of |||||back to bend of

b|||ay, brings us by a commodius
|jeday, brings us by a commodius

vic|u||s of recirculation back to
|||lucas of recirculation back to

H|owth Ca||stle|||| and E|nvi||r|ons.
|fo||||||rest||moon and |en||dor.||||``````

I mentioned in my previous post that I could compare the first four paragraphs easily, but I had some trouble aligning the fifth paragraphs. The fifth paragraphs of each version start out quite similar:

``````    Bygme||ster Fi|nnega||n, of the
Bygm|onster ||Ann||akin, of the

Stutte||r|||||||ing Hand, f|re|emen'|s
||||||Throatchokin| Hand, for|cemen|'s

mau-rer, lived in the broadest way
mau-rer, lived in the broadest way

immarginable in his rushlit
immarginable in his rushlit

toofar-|||back for messuages before
toofar| - back for messuages before``````

But then Roberts takes the liberty of skipping over a large section of the original. This is what I suspected by looking at the two texts, but Hirschberg's algorithm makes the edits obvious by showing two long sequences of vertical bars, one about 600 characters long and another about 90 characters long.

Topics:
big data ,sequence alignment algorithm ,python ,tutorial

Comment (1)

Save
{{ articles[0].views | formatCount}} Views

Published at DZone with permission of John Cook , DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.