Over a million developers have joined DZone. {{announcement.body}}
{{announcement.title}}

DZone's Guide to

# The Apriori Algorithm

· ·
Free Resource

Comment (0)

Save
{{ articles.views | formatCount}} Views

Here are just notes from my data mining class which I began to consolidate here in my blog as a way to assimilate the lessons.

The Apriori algorithm is a basic method for finding frequent itemsets. The latter is used to generate association rules with high confidence and high interest.

Here is my summary of it along with a running example. The following set of baskets will be used:

D = [ ('milk', 'cheese', 'tuna'),
('cheese', 'mushroom'),
('cheese', 'eggs'),
('milk', 'cheese', 'mushroom'),
('milk', 'eggs'),
('cheese', 'eggs'),
('milk', 'cheese', 'mushroom', 'tuna'),
('milk', 'cheese', 'eggs') ]

Some definitions: $I$ – is the universal set of items. In the example above, the universal set would be {milk, cheese, tuna, mushroom, eggs}. $C_{k}$ – is like a k-combination of $I$. Like because items in this set should have frequent (k-1, k-2,…1)-itemsets.

Now, the apriori algorithm.

1. Generate $C_{k}$ by cross joining the itemsets of $L_{k-1}$ among themselves.

Cross joining two sets

A cross join between two k-item sets is a union of those two sets which results in a k+1 itemset. However, the join only happens if and only if the first k-1 items of both sets are equal.

For example: $[1,2] |x| [1,3] = [1,2,3]$ $[1,3] |x| [1,4] = [1,3,4]$ $[2,3] |x| [1,5] =$ no join

If $k=1$, simply list all 1 itemsets.

one_itemset = ['milk', 'cheese', 'eggs', 'mushroom', 'tuna']

2. Generate $L_{k}$, the frequent itsemsets by counting the number of times each element in $C_{k}$ occurs in $D$. If an element in $C_{k} \geq S$ or the support threshold, it is qualified to be a member of $L_{k}$. For $k=1$ and our example above for $L_{1}$ is

c1 = {'cheese': 7,
'tuna': 2,
'eggs': 6,
'mushroom': 2,
'milk': 6}

3. Repeat the process for $k \leq |I|$ until no frequent itemsets are found.

Topics:

Comment (0)

Save
{{ articles.views | formatCount}} Views

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

# {{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}