MachineX: Layman's Guide to Association Rule Learning


Association rule learning is a data mining technique behind many recommendation systems: it uncovers relationships between items in a large dataset.


Association rule learning is one of the most common techniques in data mining as well as machine learning. Its most familiar use, which I'm sure you're aware of, is in the recommendation systems of e-commerce sites like Amazon and Flipkart.

Association rule learning is a technique for uncovering relationships between various items, elements, or variables in a very large database. Continuing the e-shop analogy, it captures the relationships between different items on the website. Association rule learning tells us how likely it is that a user who buys, say, one book will buy another particular book, where the two books are related because other users have bought them both. Let me make this clearer with an example. Suppose you want to learn Scala, so you go to Amazon to buy Scala Cookbook. When you open the product page and scroll down a little, you see recommendation sections such as "Frequently bought together" and "Customers who bought this item also bought."

All of the books in those sections are recommendations for the user currently viewing Scala Cookbook. The "Frequently bought together" section consists of a package, or itemset, that many users have bought together, while the "Customers who bought this item also bought" section consists of items that users have bought individually before or after buying Scala Cookbook. This is made possible by a very large database and association rule learning. So, as you have probably already figured out, association rule learning is essentially finding rules that associate different variables in a database.

From Wikipedia:

Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.

Now the definition of association rule learning is clear, but what are these "measures of interestingness"? Let's explore them with an example.

Suppose we have a small dataset of five transactions like this one:

Transaction 1: Learning Scala, Learning Spark
Transaction 2: Learning Scala, Learning Spark, Hadoop: The Definitive Guide
Transaction 3: Learning Spark, Programming in Scala
Transaction 4: Learning Spark, Scala Cookbook
Transaction 5: Programming in Scala, Hadoop: The Definitive Guide

Transactions here are the sets of books bought by users. Don't read too much into the word: each transaction is simply a set of books that were bought together, and different transactions may come from different people at different points in time, or at the same time.

Now, looking at the data, we can easily identify itemsets like:

{Learning Scala, Learning Spark}
{Programming in Scala, Hadoop: The Definitive Guide}

...and many more. But these itemsets by themselves don't tell us much. For example, we can say that Learning Scala and Learning Spark are generally bought together, and similarly Programming in Scala and Hadoop: The Definitive Guide are bought together. But in a very large database, we aren't interested in every itemset that can be mined from the data, only in those that are interesting from a business perspective or some other perspective. This is where the measures of interestingness come in; they are discussed below.


Support

Support tells us how frequent an item or an itemset is in the whole dataset; in other words, how popular it is in the given data. For example, in the dataset above, we can calculate the support of Learning Spark by dividing the number of transactions in which it occurs by the total number of transactions:

Support{Learning Spark} = 4/5
Support{Programming in Scala} = 2/5
Support{Learning Spark, Programming in Scala} = 1/5

Support tells us how important or interesting an itemset is based on how often it occurs. This is an important measure because real data has millions or billions of records, and examining every possible itemset is pointless: if one user among millions happens to buy Programming in Scala together with a cookbook, that combination is of no interest to us.
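To make the calculation concrete, here is a minimal Python sketch. The transactions list is a hypothetical dataset constructed to match the support values quoted above, and the support helper is illustrative, not from any particular library:

```python
from fractions import Fraction

# Hypothetical five-transaction dataset, consistent with the
# support values used in the article.
transactions = [
    {"Learning Scala", "Learning Spark"},
    {"Learning Scala", "Learning Spark", "Hadoop: The Definitive Guide"},
    {"Learning Spark", "Programming in Scala"},
    {"Learning Spark", "Scala Cookbook"},
    {"Programming in Scala", "Hadoop: The Definitive Guide"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    count = sum(1 for t in transactions if itemset <= t)
    return Fraction(count, len(transactions))

print(support({"Learning Spark"}, transactions))                          # 4/5
print(support({"Programming in Scala"}, transactions))                    # 2/5
print(support({"Learning Spark", "Programming in Scala"}, transactions))  # 1/5
```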

But support alone isn't enough. It tells us which itemsets matter, but not the rules we need to actually take advantage of this large data. Until now, we've only looked at itemsets mined from the given dataset, but a rule is something more: it's not just a collection of books bought together by a user, it also tells us how those books are related. For example, the above dataset contains the itemset:

{Learning Spark, Programming in Scala} 

Now, looking at it, we cannot tell whether people who buy Learning Spark also buy Programming in Scala or whether it's the other way around. For this purpose, we will look at another measure: confidence.


Confidence

A rule consists of two parts: an antecedent and a consequent. For example, in the rule Learning Spark -> Programming in Scala, Learning Spark is the antecedent and Programming in Scala is the consequent. Confidence tells us how likely the consequent is once the antecedent has occurred; for this rule, how likely someone is to buy Programming in Scala when he has already bought Learning Spark.

Confidence is calculated from support values. For the rule Learning Spark -> Programming in Scala, the confidence is calculated as follows:

Confidence(Learning Spark -> Programming in Scala)
    = Support{Learning Spark, Programming in Scala} / Support{Learning Spark}
    = (1/5) / (4/5)
    = 1/4

...which is 25%, whereas if we swap the antecedent and the consequent, we get (1/5) / (2/5) = 1/2, or 50%. This means that there is a 25% chance that a user who has bought Learning Spark will also buy Programming in Scala, but a 50% chance that a user who has bought Programming in Scala will also buy Learning Spark.

But confidence still has one problem. We got 25% for Learning Spark -> Programming in Scala and 50% the other way around. This happened because Programming in Scala isn't very popular in the dataset, with a support of only 2/5, while Learning Spark has a support of 4/5; the two items aren't really related to each other. If an item is frequent in a dataset, then there is a high probability that a less frequent item's transactions will also contain the frequent item, inflating the confidence. We can overcome this, and avoid such fluke rules, by dividing the support of the itemset by the product of the supports of all the items in it. This measure is known as lift.
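As a sketch, the same confidence calculation in Python, over the same hypothetical dataset used for support (the function names and data are illustrative):

```python
from fractions import Fraction

# Hypothetical dataset matching the article's support values.
transactions = [
    {"Learning Scala", "Learning Spark"},
    {"Learning Scala", "Learning Spark", "Hadoop: The Definitive Guide"},
    {"Learning Spark", "Programming in Scala"},
    {"Learning Spark", "Scala Cookbook"},
    {"Programming in Scala", "Hadoop: The Definitive Guide"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))

def confidence(antecedent, consequent):
    """Support of the whole rule divided by support of the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

print(confidence({"Learning Spark"}, {"Programming in Scala"}))  # 1/4
print(confidence({"Programming in Scala"}, {"Learning Spark"}))  # 1/2
```

Note how swapping antecedent and consequent changes the result, exactly as in the worked numbers above.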


Lift

Lift tells us how likely the consequent is when the antecedent has occurred, taking the supports of both the antecedent and the consequent into account. For the above example, lift is calculated as follows:

Lift(Learning Spark -> Programming in Scala)
    = Support{Learning Spark, Programming in Scala} / (Support{Learning Spark} × Support{Programming in Scala})
    = (1/5) / ((4/5) × (2/5))
    = 5/8

A lift of less than one means that if the antecedent has occurred, the consequent is unlikely to occur as well. A lift of exactly one means that the antecedent and the consequent are independent of each other. And a lift of more than one means that if the antecedent occurs, the consequent is likely to occur too. So, a value of 5/8 indicates that this rule is a fluke. If we instead consider the rule Learning Scala -> Learning Spark, we get a lift of 5/4, which is greater than one, suggesting that this rule is genuinely interesting.
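And a matching sketch for lift, again over the hypothetical dataset (names and data are illustrative):

```python
from fractions import Fraction

# Hypothetical dataset matching the article's support values.
transactions = [
    {"Learning Scala", "Learning Spark"},
    {"Learning Scala", "Learning Spark", "Hadoop: The Definitive Guide"},
    {"Learning Spark", "Programming in Scala"},
    {"Learning Spark", "Scala Cookbook"},
    {"Programming in Scala", "Hadoop: The Definitive Guide"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))

def lift(antecedent, consequent):
    """Observed joint support over the support expected if independent."""
    return support(antecedent | consequent) / (support(antecedent) * support(consequent))

print(lift({"Learning Spark"}, {"Programming in Scala"}))  # 5/8, below 1: a fluke
print(lift({"Learning Scala"}, {"Learning Spark"}))        # 5/4, above 1: interesting
```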

Using these measures, various algorithms, such as Apriori and FP-growth, have been implemented to mine association rules from a database. In my next article, I will be discussing the implementation of association rule learning, so stay tuned.
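To give a flavor of what such algorithms produce, here is a brute-force sketch that enumerates every candidate rule over the tiny hypothetical dataset and keeps those passing minimum support and confidence thresholds. This only illustrates the definitions; real algorithms like Apriori prune the search space instead of enumerating everything:

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical dataset matching the article's support values.
transactions = [
    {"Learning Scala", "Learning Spark"},
    {"Learning Scala", "Learning Spark", "Hadoop: The Definitive Guide"},
    {"Learning Spark", "Programming in Scala"},
    {"Learning Spark", "Scala Cookbook"},
    {"Programming in Scala", "Hadoop: The Definitive Guide"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))

def mine_rules(min_support, min_confidence):
    """Brute-force rule mining; exponential, fine only for tiny datasets."""
    items = sorted(set().union(*transactions))
    rules = []
    # Every subset of 2+ items is a candidate frequent itemset.
    for k in range(2, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            if support(itemset) < min_support:
                continue
            # Split the itemset into antecedent -> consequent in every way.
            for r in range(1, k):
                for ante in combinations(combo, r):
                    antecedent = frozenset(ante)
                    consequent = itemset - antecedent
                    conf = support(itemset) / support(antecedent)
                    if conf >= min_confidence:
                        rules.append((set(antecedent), set(consequent), conf))
    return rules

for antecedent, consequent, conf in mine_rules(Fraction(2, 5), Fraction(1, 1)):
    print(antecedent, "->", consequent, "confidence:", conf)
```

On this toy data, with a minimum support of 2/5 and a minimum confidence of 1, the only surviving rule is Learning Scala -> Learning Spark.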

Thanks for reading!



Published at DZone with permission of Akshansh Jain , DZone MVB. See the original article here.

