{{ !articles[0].partner.isSponsoringArticle ? "Platinum" : "Portal" }} Partner
architects,solr,tutorial,patterns,algorithms,data mining

The Apriori Algorithm

Here are just notes from my data mining class which I began to consolidate here in my blog as a way to assimilate the lessons.

The Apriori algorithm is a basic method for finding frequent itemsets. The latter is used to generate association rules with high confidence and high interest.

Here is my summary of it along with a running example. The following set of baskets will be used:

D = [ ('milk', 'cheese', 'tuna'),
      ('cheese', 'mushroom'),
      ('cheese', 'eggs'),
      ('milk', 'cheese', 'mushroom'),
      ('milk', 'eggs'),
      ('cheese', 'eggs'),
      ('milk', 'cheese', 'mushroom', 'tuna'), 
      ('milk', 'cheese', 'eggs') ]

Some definitions:

I – is the universal set of items. In the example above, the universal set would be {milk, cheese, tuna, mushroom, eggs}.

C_{k} – is like a k-combination of I. Like because items in this set should have frequent (k-1, k-2,…1)-itemsets.

Now, the apriori algorithm.

1. Generate C_{k} by cross joining the itemsets of L_{k-1} among themselves.

Cross joining two sets

A cross join between two k-item sets is a union of those two sets which results in a k+1 itemset. However, the join only happens if and only if the first k-1 items of both sets are equal.

For example:

[1,2] |x| [1,3] = [1,2,3]
[1,3] |x| [1,4] = [1,3,4]
[2,3] |x| [1,5] = no join

If k=1, simply list all 1 itemsets.

one_itemset = ['milk', 'cheese', 'eggs', 'mushroom', 'tuna']

2. Generate L_{k}, the frequent itsemsets by counting the number of times each element in C_{k} occurs in D. If an element in C_{k} \geq S or the support threshold, it is qualified to be a member of L_{k}. For k=1 and our example above for L_{1} is

c1 = {'cheese': 7, 
      'tuna': 2, 
      'eggs': 6, 
      'mushroom': 2, 
      'milk': 6}

3. Repeat the process for k \leq |I| until no frequent itemsets are found.

Published at DZone with permission of {{ articles[0].authors[0].realName }}, DZone MVB. (source)

Opinions expressed by DZone contributors are their own.

{{ tag }}, {{tag}},

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}
{{ parent.authors[0].realName || parent.author}}

{{ parent.authors[0].tagline || parent.tagline }}

{{ parent.views }} ViewsClicks