Over a million developers have joined DZone.

The Apriori Algorithm

DZone's Guide to

The Apriori Algorithm

· ·
Free Resource

Here are just notes from my data mining class which I began to consolidate here in my blog as a way to assimilate the lessons.

The Apriori algorithm is a basic method for finding frequent itemsets. The latter is used to generate association rules with high confidence and high interest.

Here is my summary of it along with a running example. The following set of baskets will be used:

D = [ ('milk', 'cheese', 'tuna'),
      ('cheese', 'mushroom'),
      ('cheese', 'eggs'),
      ('milk', 'cheese', 'mushroom'),
      ('milk', 'eggs'),
      ('cheese', 'eggs'),
      ('milk', 'cheese', 'mushroom', 'tuna'), 
      ('milk', 'cheese', 'eggs') ]

Some definitions:

I – is the universal set of items. In the example above, the universal set would be {milk, cheese, tuna, mushroom, eggs}.

C_{k} – is like a k-combination of I. Like because items in this set should have frequent (k-1, k-2,…1)-itemsets.

Now, the apriori algorithm.

1. Generate C_{k} by cross joining the itemsets of L_{k-1} among themselves.

Cross joining two sets

A cross join between two k-item sets is a union of those two sets which results in a k+1 itemset. However, the join only happens if and only if the first k-1 items of both sets are equal.

For example:

[1,2] |x| [1,3] = [1,2,3]
[1,3] |x| [1,4] = [1,3,4]
[2,3] |x| [1,5] = no join

If k=1, simply list all 1 itemsets.

one_itemset = ['milk', 'cheese', 'eggs', 'mushroom', 'tuna']

2. Generate L_{k}, the frequent itsemsets by counting the number of times each element in C_{k} occurs in D. If an element in C_{k} \geq S or the support threshold, it is qualified to be a member of L_{k}. For k=1 and our example above for L_{1} is

c1 = {'cheese': 7, 
      'tuna': 2, 
      'eggs': 6, 
      'mushroom': 2, 
      'milk': 6}

3. Repeat the process for k \leq |I| until no frequent itemsets are found.


Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}