Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

DZone's Guide to

# Haskell: An Impressively Non-Performant Union Find

· Performance Zone ·
Free Resource

Comment (0)

Save
{{ articles[0].views | formatCount}} Views

Learn how error monitoring with Sentry closes the gap between the product team and your customers. With Sentry, you can focus on what you do best: building and scaling software that makes your users’ lives better.

I’ve spent the best part of the last day debugging a clustering algorithm I wrote as part of the Algorithms 2 course, eventually coming to the conclusion that the union find data structure I was using wasn’t working as expected.

In our algorithm we’re trying to group together points which are ‘close’ to each other and the data structure is particular useful for doing that.

To paraphrase from my previous post about how we use the union find data structure:

We start out with n connected components i.e. every point is in its own connected component.

We then merge these components together as calculate the neighbours of each point until we’ve iterated through all the points and have grouped all the points into the appropriate components.

I came across 3 libraries which implement this data structure – union-find, equivalence and persistent-equivalence.

union-find seemed like it had the easiest API to understand so I plugged it into my program only to eventually realise that it wasn’t putting the points into components as I expected.

I eventually narrowed the problem down to the following example:

```> let uf = emptyEquivalence (0,9)
[(0,0),(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9)]

> components \$ equate 0 1 uf
[(0,0),(1,0),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9)]

> components \$ equate 8 9 \$ equate 0 1 \$ uf
[(0,0),(1,0),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,8)]

> components \$ equate 0 8 \$ equate 8 9 \$ equate 0 1 \$ uf
[(0,0),(1,0),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,0),(9,8)]```

We start out with a union-find where every point is in its own component. The next line puts points ’0′ and ’1′ into the same component which it does by making indexes ’0′ and ’1′ of the array both have the same value, in this case 0, which is known as the component’s leader.

We continue doing that for points ’8′ and ’9′ which works fine and our union find now consists of 8 components – the ones with leaders 8 & 0 which have two elements and then ones with leaders 2,3,4,5,6 & 7 which only contain themselves.

Things go wrong on our next step where we try to join nodes ’0′ and ’8′. As I understand it what should happen here is that all the points connected to either ’0′ or ’8′ should end up in the same component so we should have a component containing points ’0′, ’1′, ’8′ and ’9′ but ’9′ has been missed off in this case.

The implementation is deliberately written to work like this so I thought I’d try writing my own version based on the following Ruby version:

```class UnionFind
def initialize(n)
end

def connected?(id1,id2)
end

def union(id1,id2)
end
end```

This is my Haskell equivalent which I adapted from the union-find one that I mentioned above:

```module Leaders (UnionSet, create, components, numberOfComponents, indexes, inSameComponent, union) where

import Control.Concurrent.MVar
import Data.Array.Diff as ArrayDiff
import Data.IORef
import qualified Data.List
import Data.Maybe
import System.IO.Unsafe
import qualified Data.Set as Set

arrayFrom :: (IArray a e, Ix i) => (i,i) -> (i -> e) -> a i e
arrayFrom rng f = array rng [ (x, f x) | x <- range rng ]

ref :: a -> IORef a
ref x = unsafePerformIO (newIORef x)

data UnionSet i = UnionSet { leaders :: IORef (DiffArray i i) }

create :: Ix i => (i, i) -> UnionSet i
create is = UnionSet (ref (arrayFrom is id))

extractComponents :: Ix i => DiffArray i i -> [(i, i)]
extractComponents  = Set.toList . Set.fromList . ArrayDiff.assocs

components :: Ix i => UnionSet i -> [(i,i)]
components (UnionSet leaders) = unsafePerformIO \$ do
return (extractComponents l)

numberOfComponents :: Ix i => UnionSet i -> Int
numberOfComponents (UnionSet leaders) = unsafePerformIO \$ do
return (length \$ extractComponents l)

indexes :: Ix i => UnionSet i -> [(i,i)]
indexes (UnionSet leaders) = unsafePerformIO \$ do
return (ArrayDiff.assocs l)

inSameComponent :: Ix i => UnionSet i -> i -> i -> Bool
inSameComponent (UnionSet leaders) x y = unsafePerformIO \$ do
return (l ! x == l ! y)

union x y (UnionSet leaders)  = unsafePerformIO \$ do
let leader1 = ls ! x
newLeaders = map (\(index, value) -> (index, leader2)) . filter (\(index, value) -> value == leader1) \$ assocs ls

We can recreate the above example like so:

```> indexes \$ Leaders.union 0 8 \$ Leaders.union 8 9 \$ Leaders.union 0 1 \$ create (0,9)
[(0,9),(1,9),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,9),(9,9)]```

Unfortunately it takes 44 seconds to execute which is mainly down to the call to assocs on line 46. assocs gives us a list of all the indexes and their corresponding values which we use to work out which indexes need to be updated with a new leader.

The rest of the code is mostly boiler plate around getting the array out of the IORef. The IORef allows us to have a mutable array in this instance. There is a page on the c2 wiki which explains how to use IORef in more detail.

Although using the DiffArray allows us to provide a pure external interface around its use, it is known to be 10-100x slower than an MArray.

I’ve been playing around with a version of the union-find data structure which makes use of an MArray instead and has decreased the execution time to 34 seconds.

Unless anyone has any ideas for how I can get this to perform more quickly I’m thinking that perhaps an array isn’t a good choice of underlying data structure at least when using Haskell.

What’s the best way to boost the efficiency of your product team and ship with confidence? Check out this ebook to learn how Sentry's real-time error monitoring helps developers stay in their workflow to fix bugs before the user even knows there’s a problem.

Topics:

Comment (0)

Save
{{ articles[0].views | formatCount}} Views

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.