MachineX: Frequent Itemset Generation With the FP-Growth Algorithm

In this article, we discuss the FP-Growth algorithm, which uses an FP-tree to extract frequent itemsets from a given dataset.

In our previous blog, MachineX: Understanding FP-Tree Construction, we discussed the FP-tree and how to construct it. In this blog, we discuss the FP-Growth algorithm, which uses the FP-tree to extract frequent itemsets from a given dataset.

FP-growth is an algorithm that generates frequent itemsets from an FP-tree by exploring the tree in a bottom-up fashion. We will reuse the example from the previous blog, where we constructed the FP-tree.

The figure below shows the final FP-tree constructed in our previous blog.
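
To make the example concrete in code, the following Python sketch rebuilds the tree and prints each node as item:count, with children indented under their parent. It assumes the five transactions of the classic textbook example ({f, a, c, d, g, i, m, p}, {a, b, c, f, l, m, o}, {b, f, h, j, o}, {b, c, k, s, p}, {a, f, c, e, l, p, m, n}), which reproduce the counts quoted in this article, a minimum support count of 3, and ties in support broken by the header-table order f, c, a, b, m, p. The names `Node`, `build_fp_tree`, and `render` are illustrative, not taken from the previous blog.

```python
from collections import Counter

# Assumption: the classic textbook transactions behind this article's example.
TRANSACTIONS = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]
MIN_SUPPORT = 3
TIE_BREAK = "fcabmp"  # article's header-table order, used to break support ties


class Node:
    """An FP-tree node: item, count, parent pointer and children keyed by item."""

    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_fp_tree(transactions, min_support):
    """Insert each transaction's frequent items, most frequent first; return the root."""
    support = Counter(item for t in transactions for item in set(t))
    order = sorted((i for i in support if support[i] >= min_support),
                   key=lambda i: (-support[i], TIE_BREAK.index(i)))
    root = Node(None, None)
    for t in transactions:
        node = root
        for item in (i for i in order if i in t):  # frequent items, in header order
            node = node.children.setdefault(item, Node(item, node))
            node.count += 1  # one more transaction passes through this node
    return root


def render(node, depth=0):
    """Return the subtree below node as indented 'item:count' lines."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render(build_fp_tree(TRANSACTIONS, MIN_SUPPORT))))
```

On these transactions, the printed tree has a main branch f:4, c:3, a:3, m:2, p:2, matching the path used in Step 2.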

The algorithm iterates over the header table in reverse order. So it first finds all the frequent itemsets ending in p, then those ending in m, b, a, c, and finally f. Since every transaction is mapped onto a path in the FP-tree, we can derive the frequent itemsets ending with a particular item, say p, by examining only the paths that contain a node for p. These paths can be accessed rapidly by following the node-links: the header-table entry for p points to the first node for p, and each such node points to the next one.
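
A minimal sketch of the header table with node-links, under assumed inputs (the classic textbook transactions, a minimum support count of 3, ties broken by f, c, a, b, m, p). Here a Python list per item stands in for the linked chain of node pointers. Iterating `reversed(header)` visits p, m, b, a, c, f, and summing the counts along an item's node-links gives its support without scanning the whole tree. The helper names are hypothetical.

```python
from collections import Counter

# Assumption: the classic textbook transactions behind this article's example.
TRANSACTIONS = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]
MIN_SUPPORT = 3
TIE_BREAK = "fcabmp"  # article's header-table order, used to break support ties


class Node:
    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_fp_tree(transactions, min_support):
    """Return the header table: item -> list of that item's nodes (node-links)."""
    support = Counter(item for t in transactions for item in set(t))
    order = sorted((i for i in support if support[i] >= min_support),
                   key=lambda i: (-support[i], TIE_BREAK.index(i)))
    header = {item: [] for item in order}  # most frequent item first
    root = Node(None, None)
    for t in transactions:
        node = root
        for item in (i for i in order if i in t):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])  # record the new node-link
            node = node.children[item]
            node.count += 1
    return header


header = build_fp_tree(TRANSACTIONS, MIN_SUPPORT)
for item in reversed(header):  # least frequent item first: p, m, b, a, c, f
    counts = [node.count for node in header[item]]
    print(item, counts, "support =", sum(counts))
```

For p, the node-links lead to two nodes with counts 2 and 1, giving a support of 3.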

FP-growth finds all the frequent itemsets ending with a particular suffix by employing a divide-and-conquer strategy to split the problem into smaller subproblems. For example, suppose we are interested in finding all frequent itemsets ending in p. To do this, we must first check whether the itemset {p} itself is frequent. If it is frequent, we consider the subproblem of finding frequent itemsets ending in mp, followed by bp, ap, cp, and fp. In turn, each of these subproblems is further decomposed into smaller subproblems. By merging the solutions obtained from the subproblems, all the frequent itemsets ending in p can be found. This divide-and-conquer approach is the key strategy employed by the FP-growth algorithm.
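
The decomposition itself can be sketched without any counts. The hypothetical helper `subproblems` extends a suffix with each item that comes earlier in the header order (f, c, a, b, m, p), taken in reverse, which reproduces the sequence mp, bp, ap, cp, fp described above; itemsets are written as strings for brevity. The sketch deliberately skips the support check, so `all_suffixes` enumerates the full search space of itemsets ending in p. FP-growth only recurses into a subproblem whose itemset is frequent, which is what keeps the real search much smaller.

```python
HEADER_ORDER = "fcabmp"  # header table from the article, most frequent first


def subproblems(suffix):
    """Prefix the suffix with each item that precedes its first item, in reverse order."""
    position = HEADER_ORDER.index(suffix[0])
    return [item + suffix for item in reversed(HEADER_ORDER[:position])]


def all_suffixes(suffix):
    """Walk the whole divide-and-conquer tree below a suffix, with no pruning."""
    found = [suffix]
    for sub in subproblems(suffix):
        found.extend(all_suffixes(sub))
    return found


print(subproblems("p"))        # ['mp', 'bp', 'ap', 'cp', 'fp']
print(len(all_suffixes("p")))  # 32, i.e. every subset of {f, c, a, b, m} plus p
```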

Let's now walk through the steps for finding the frequent itemsets in the FP-tree.

Step 1: The header table is iterated in reverse order, so we first search for the frequent itemsets that end with item p. To do that, we gather all the paths in the tree that end in a node for p; these are known as the prefix paths. Remember that the header table contains only frequent items, so {p} itself is frequent. Longer itemsets ending in p, however, are not automatically frequent; the next step determines which of them are. The figure below shows the prefix paths for node p.
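
Collecting prefix paths amounts to following an item's node-links and then walking parent pointers up to the root. The sketch below assumes the same inputs as before (classic textbook transactions, minimum support count 3, ties broken by f, c, a, b, m, p); `prefix_paths` is a hypothetical helper that returns each path from the root down to the p node, as (item, count) pairs with the original, not yet updated, counts.

```python
from collections import Counter

# Assumption: the classic textbook transactions behind this article's example.
TRANSACTIONS = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]
MIN_SUPPORT = 3
TIE_BREAK = "fcabmp"  # article's header-table order, used to break support ties


class Node:
    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_fp_tree(transactions, min_support):
    """Return the header table: item -> list of that item's nodes (node-links)."""
    support = Counter(item for t in transactions for item in set(t))
    order = sorted((i for i in support if support[i] >= min_support),
                   key=lambda i: (-support[i], TIE_BREAK.index(i)))
    header = {item: [] for item in order}
    root = Node(None, None)
    for t in transactions:
        node = root
        for item in (i for i in order if i in t):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return header


def prefix_paths(header, item):
    """Follow item's node-links, then parent pointers up to the root."""
    paths = []
    for node in header[item]:
        path = []
        while node.item is not None:  # only the root has item None
            path.append((node.item, node.count))
            node = node.parent
        paths.append(path[::-1])  # reorder from the root down to the item's node
    return paths


for path in prefix_paths(build_fp_tree(TRANSACTIONS, MIN_SUPPORT), "p"):
    print(path)
```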

Step 2: Next, we update the support counts of the nodes on the prefix paths so that they reflect only the transactions that contain p. For example, in the path {f: 4, c: 3, a: 3, m: 2, p: 2}, the counts of f, c, and a also include transactions that do not contain p, so they must be updated. We do this by propagating the count of each p node up to the root: every node on a prefix path takes the count of the p node that ends that path (summed if a node lies on several prefix paths), so f, c, and a each become 2 here. Once the paths carry the updated counts, the support count of each item is the sum of the counts of all its nodes in the prefix paths, and we eliminate every item whose support count is below the minimum support count, which is 3 in this example. The result is the conditional FP-tree for p, shown below.
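
The count update and filtering can be sketched directly on the prefix paths. The first path below is the one quoted above; the second, c:1 -> b:1 -> p:1, is an assumption taken from the classic textbook example, since it appears only in the figure. The helper names `conditional_pattern_base` and `conditional_items` are hypothetical: the first drops the p node and gives every ancestor the p node's count, and the second sums those counts per item and keeps the items that meet the minimum support count.

```python
from collections import Counter

MIN_SUPPORT = 3

# Prefix paths for p as (item, count) pairs from the root down to the p node.
# The second path is assumed from the textbook example (only shown in the figure).
PREFIX_PATHS_P = [
    [("f", 4), ("c", 3), ("a", 3), ("m", 2), ("p", 2)],
    [("c", 1), ("b", 1), ("p", 1)],
]


def conditional_pattern_base(prefix_paths):
    """Drop the suffix node and give every ancestor the suffix node's count."""
    base = []
    for path in prefix_paths:
        *ancestors, (_, suffix_count) = path
        base.append(([item for item, _ in ancestors], suffix_count))
    return base


def conditional_items(base, min_support):
    """Sum each item's updated counts and keep only the frequent items."""
    support = Counter()
    for items, count in base:
        for item in items:
            support[item] += count
    return {item: c for item, c in support.items() if c >= min_support}


base = conditional_pattern_base(PREFIX_PATHS_P)
print(base)                                    # [(['f', 'c', 'a', 'm'], 2), (['c', 'b'], 1)]
print(conditional_items(base, MIN_SUPPORT))    # {'c': 3}
```

Only c survives the filter, which is why the conditional FP-tree for p consists of a single node.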

The support count of c is 3 (2 from the first prefix path plus 1 from the second), which equals the minimum support count, so {c, p} is a frequent itemset; no other item in the prefix paths reaches 3. Repeating this procedure for every item in the header table, and recursively building prefix paths and conditional FP-trees for each frequent suffix, yields all of the following frequent itemsets:

{p - 3}, {c, p - 3}, {m - 3}, {f, m - 3}, {c, f, m - 3}, {c, m - 3}, {a, m - 3}, {f, a, m - 3}, {c, f, a, m - 3}, {c, a, m - 3}, {b - 3}, {a - 3}, {f, a - 3}, {c, f, a - 3}, {c, a - 3}, {f - 4}, {c, f - 3}, {c - 4}

In this list, each pair of curly braces holds an itemset followed by its support count, separated by a hyphen.
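
Putting the steps together, here is a compact recursive sketch of FP-growth under the same assumptions (classic textbook transactions, minimum support count 3, ties broken by f, c, a, b, m, p). Each call builds an FP-tree from a conditional pattern base, a list of (prefix items, count) pairs in which the count is that of the suffix item's node, which is exactly the count update from Step 2. The function names are illustrative. On these transactions it produces the same 18 itemsets as the list above.

```python
from collections import Counter

# Assumption: the classic textbook transactions behind this article's example.
TRANSACTIONS = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]
MIN_SUPPORT = 3
TIE_BREAK = "fcabmp"  # article's header-table order, used to break support ties


class Node:
    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_tree(weighted, min_support):
    """Build an FP-tree from (items, count) pairs; return its header table."""
    support = Counter()
    for items, count in weighted:
        for item in set(items):
            support[item] += count
    order = sorted((i for i in support if support[i] >= min_support),
                   key=lambda i: (-support[i], TIE_BREAK.index(i)))
    header = {item: [] for item in order}
    root = Node(None, None)
    for items, count in weighted:
        node = root
        for item in (i for i in order if i in items):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header


def fp_growth(weighted, min_support, suffix=frozenset()):
    """Yield (itemset, support) for every frequent itemset ending in suffix."""
    header = build_tree(weighted, min_support)
    for item in reversed(header):  # least frequent item first
        itemset = suffix | {item}
        yield itemset, sum(node.count for node in header[item])
        # Conditional pattern base: each prefix path, weighted by item's node count.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_support, itemset)


results = dict(fp_growth([(t, 1) for t in TRANSACTIONS], MIN_SUPPORT))
for itemset, support in results.items():
    print(sorted(itemset, key=TIE_BREAK.index), "-", support)
```

Because only frequent items ever enter a conditional tree, the recursion never visits suffixes such as {b, p}, which is the pruning that the unpruned enumeration of all 32 suffixes ending in p lacks.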

FP-growth is an interesting algorithm because it illustrates how a compact representation of the transaction dataset helps generate frequent itemsets efficiently. In addition, for certain transaction datasets, FP-growth outperforms the standard Apriori algorithm by several orders of magnitude. The run-time performance of FP-growth depends on the compaction factor of the dataset, roughly how much smaller the FP-tree is than the data it encodes, which in turn depends on how many transactions share common prefixes. If the resulting conditional FP-trees are very bushy (in the worst case, a full prefix tree), the performance of the algorithm degrades significantly because it has to generate a large number of subproblems and merge the results returned by each subproblem.
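
One simple way to gauge compaction, assumed here rather than taken from the original, is to compare the number of frequent-item occurrences across all transactions with the number of FP-tree nodes needed to store them. The hypothetical `compaction` helper counts nodes as distinct root-to-node prefixes, which correspond one-to-one with FP-tree nodes, so no tree objects are needed. It uses the same assumed inputs as the earlier sketches.

```python
from collections import Counter

# Assumption: the classic textbook transactions behind this article's example.
TRANSACTIONS = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]
MIN_SUPPORT = 3
TIE_BREAK = "fcabmp"  # article's header-table order, used to break support ties


def compaction(transactions, min_support):
    """Return (frequent-item occurrences, FP-tree node count) for the dataset."""
    support = Counter(item for t in transactions for item in set(t))
    order = sorted((i for i in support if support[i] >= min_support),
                   key=lambda i: (-support[i], TIE_BREAK.index(i)))
    occurrences, nodes = 0, set()
    for t in transactions:
        path = tuple(i for i in order if i in t)  # the transaction's path in the tree
        occurrences += len(path)
        # Every prefix of the path is one tree node; shared prefixes are counted once.
        nodes.update(path[:k] for k in range(1, len(path) + 1))
    return occurrences, len(nodes)


print(compaction(TRANSACTIONS, MIN_SUPPORT))  # (20, 11)
```

On the example data, 20 item occurrences fit into 11 nodes; if no two transactions shared a prefix, the two numbers would be equal and the tree would offer no compression.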

In the next blog, we will dive into extracting association rules from these frequent itemsets. So, stay tuned!

Published at DZone with permission of
