These are notes from my data mining class, which I am consolidating here on my blog as a way to absorb the lessons.

The Apriori algorithm is a basic method for finding frequent itemsets, which are in turn used to generate association rules with high confidence and high interest.

Here is my summary of it along with a running example. The following set of baskets will be used:

D = [ ('milk', 'cheese', 'tuna'), ('cheese', 'mushroom'), ('cheese', 'eggs'), ('milk', 'cheese', 'mushroom'), ('milk', 'eggs'), ('cheese', 'eggs'), ('milk', 'cheese', 'mushroom', 'tuna'), ('milk', 'cheese', 'eggs') ]

**Some definitions:**

**I** – the universal set of items. In the example above, the universal set is {milk, cheese, tuna, mushroom, eggs}.

**k-itemset** – a k-combination of I. A k-itemset can only be frequent if all of its (k-1)-, (k-2)-, …, 1-item subsets are also frequent (the Apriori property).
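As a quick check of the first definition, the universal set can be computed directly from the baskets (a minimal sketch using the example data):

```python
# The baskets from the running example.
D = [
    ('milk', 'cheese', 'tuna'),
    ('cheese', 'mushroom'),
    ('cheese', 'eggs'),
    ('milk', 'cheese', 'mushroom'),
    ('milk', 'eggs'),
    ('cheese', 'eggs'),
    ('milk', 'cheese', 'mushroom', 'tuna'),
    ('milk', 'cheese', 'eggs'),
]

# The universal set I is the union of every basket.
universal = set().union(*D)
print(universal)  # {'milk', 'cheese', 'tuna', 'mushroom', 'eggs'}, in some order
```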

Now, the Apriori algorithm.

1. Generate C_k, the candidate k-itemsets, by cross joining the itemsets of L_(k-1) among themselves.

A cross join between two k-itemsets is the union of those two sets, resulting in a (k+1)-itemset. However, the join happens if and only if the first k-1 items of both sets are equal.

For example, with items kept in sorted order:

{cheese, milk} joined with {cheese, mushroom} gives {cheese, milk, mushroom}

{cheese, milk} and {eggs, tuna} – no join (their first items differ)
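One way to sketch the join rule in code (itemsets are kept as sorted tuples so that "the first k-1 items" is well defined):

```python
def cross_join(a, b):
    """Join two sorted k-itemsets into a (k+1)-itemset.

    Returns None when the first k-1 items differ (no join).
    """
    if a[:-1] != b[:-1] or a[-1] == b[-1]:
        return None
    return tuple(sorted(set(a) | set(b)))

print(cross_join(('cheese', 'milk'), ('cheese', 'mushroom')))
# ('cheese', 'milk', 'mushroom')
print(cross_join(('cheese', 'milk'), ('eggs', 'tuna')))
# None
```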

If k = 1, simply list all 1-itemsets.

one_itemset = ['milk', 'cheese', 'eggs', 'mushroom', 'tuna']

2. Generate L_k, the frequent k-itemsets, by counting the number of times each element of C_k occurs in D. If an element's count meets or exceeds the support threshold, it qualifies as a member of L_k. For our example above, the counts for C_1 are:

c1 = {'cheese': 7, 'tuna': 2, 'eggs': 4, 'mushroom': 3, 'milk': 5}
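The counting and filtering step might look like this sketch (the threshold `min_support = 3` is an assumed value for illustration):

```python
from collections import Counter

# The baskets from the running example.
D = [
    ('milk', 'cheese', 'tuna'), ('cheese', 'mushroom'),
    ('cheese', 'eggs'), ('milk', 'cheese', 'mushroom'),
    ('milk', 'eggs'), ('cheese', 'eggs'),
    ('milk', 'cheese', 'mushroom', 'tuna'), ('milk', 'cheese', 'eggs'),
]

min_support = 3  # assumed threshold, for illustration only

# C1 with counts: how many baskets each 1-itemset appears in.
c1 = Counter(item for basket in D for item in basket)

# L1 keeps only the items that meet the threshold.
l1 = {item for item, count in c1.items() if count >= min_support}
print(l1)  # {'milk', 'cheese', 'eggs', 'mushroom'}, in some order
```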

3. Repeat the process for k = 2, 3, … until no more frequent itemsets are found.
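Putting the three steps together, a plausible sketch of the whole loop (support is counted as the number of baskets containing the candidate; `min_support` is a parameter, not something fixed by the algorithm):

```python
def apriori(baskets, min_support):
    """Return every frequent itemset (as a sorted tuple) with its support count."""
    # L1: count single items and keep the frequent ones.
    counts = {}
    for basket in baskets:
        for item in basket:
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = {k: v for k, v in counts.items() if v >= min_support}
    result = dict(frequent)

    while frequent:
        # Step 1: cross join L_(k-1) with itself to build the candidates C_k.
        prev = sorted(frequent)
        candidates = set()
        for i, a in enumerate(prev):
            for b in prev[i + 1:]:
                if a[:-1] == b[:-1]:  # first k-1 items must match
                    candidates.add(tuple(sorted(set(a) | set(b))))

        # Step 2: count how many baskets contain each candidate.
        counts = {}
        for basket in baskets:
            items = set(basket)
            for cand in candidates:
                if set(cand) <= items:
                    counts[cand] = counts.get(cand, 0) + 1

        # Step 3: keep the candidates meeting the threshold, then repeat.
        frequent = {k: v for k, v in counts.items() if v >= min_support}
        result.update(frequent)
    return result

# Running it on the example baskets with an assumed threshold of 3:
D = [
    ('milk', 'cheese', 'tuna'), ('cheese', 'mushroom'),
    ('cheese', 'eggs'), ('milk', 'cheese', 'mushroom'),
    ('milk', 'eggs'), ('cheese', 'eggs'),
    ('milk', 'cheese', 'mushroom', 'tuna'), ('milk', 'cheese', 'eggs'),
]
freq = apriori(D, min_support=3)
print(freq[('cheese', 'milk')])  # 4
```

Note that real implementations prune candidates whose (k-1)-subsets are not all frequent before counting; this sketch skips that optimization for brevity.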
