Over a million developers have joined DZone.

The Apriori Algorithm

·

Here are just notes from my data mining class which I began to consolidate here in my blog as a way to assimilate the lessons.

The Apriori algorithm is a basic method for finding frequent itemsets. The latter is used to generate association rules with high confidence and high interest.

Here is my summary of it along with a running example. The following set of baskets will be used:

D = [ ('milk', 'cheese', 'tuna'),
      ('cheese', 'mushroom'),
      ('cheese', 'eggs'),
      ('milk', 'cheese', 'mushroom'),
      ('milk', 'eggs'),
      ('cheese', 'eggs'),
      ('milk', 'cheese', 'mushroom', 'tuna'), 
      ('milk', 'cheese', 'eggs') ]

Some definitions:

I – is the universal set of items. In the example above, the universal set would be {milk, cheese, tuna, mushroom, eggs}.

C_{k} – is like a k-combination of I. Like because items in this set should have frequent (k-1, k-2,…1)-itemsets.

Now, the apriori algorithm.

1. Generate C_{k} by cross joining the itemsets of L_{k-1} among themselves.

Cross joining two sets

A cross join between two k-item sets is a union of those two sets which results in a k+1 itemset. However, the join only happens if and only if the first k-1 items of both sets are equal.

For example:

[1,2] |x| [1,3] = [1,2,3]
[1,3] |x| [1,4] = [1,3,4]
[2,3] |x| [1,5] = no join


If k=1, simply list all 1 itemsets.

one_itemset = ['milk', 'cheese', 'eggs', 'mushroom', 'tuna']

2. Generate L_{k}, the frequent itsemsets by counting the number of times each element in C_{k} occurs in D. If an element in C_{k} \geq S or the support threshold, it is qualified to be a member of L_{k}. For k=1 and our example above for L_{1} is

c1 = {'cheese': 7, 
      'tuna': 2, 
      'eggs': 6, 
      'mushroom': 2, 
      'milk': 6}

3. Repeat the process for k \leq |I| until no frequent itemsets are found.

Topics:

Published at DZone with permission of Jose Asuncion, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}