Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Market Basket Analysis Using Oracle Data Mining

DZone's Guide to

Market Basket Analysis Using Oracle Data Mining

Learn how to use Oracle data mining to do market basket analysis ā€” a theory that if you buy a certain group of items, you're likely to buy another group of items.

· Big Data Zone
Free Resource

Access NoSQL and Big Data through SQL using standard drivers (ODBC, JDBC, ADO.NET). Free Download 

In this article, I will do market basket analysis with Oracle data mining. 

Data science and machine learning are very popular today. But these subjects require extensive knowledge and application expertise. We can solve these problems with various products and software that have been developed by various companies. In Oracle, methods and algorithms for solving these problems are presented to users with the DBMS_DATA_MINING package.

With the DBMS_DATA_MINING package, we can create models such as clustering, classification, regression, anomaly detection, feature extraction, and association. We can interpret efficiency with the models we create. The results we obtain from these models can be put into our business scenario.

The DBMS_DATA_MINING package does not come up by default on the Oracle database. For this reason, it's necessary to install this package first. You can set up your database with Oracle data mining by following this link.

With the installation of the Oracle data mining package, three new dictionary tables are created:

SELECT * FROM  ALL_MINING_MODELS;
SELECT * FROM ALL_MINING_MODEL_SETTINGS;
SELECT * FROM ALL_MINING_MODEL_ATTRIBUTES;

The ALL_MINING_MODELS table contains information about all the models.

The ALL_MINING_MODELS_SETTINGS and ALL_MINING_MODELS_ATTRIBUTES tables contain parameters and specific details about these models.

Now, let's prepare an easily understood data set to do market basket analysis.

We need a dataset to do the example. We will do this example through the ONLINE_RETAIL dataset we downloaded from UCI.

Let's first examine the ONLINE_RETAIL dataset.

The ONLINE_RETAIL dataset consists of eight (three numeric, five string) columns and contains 541,909 records. The description of each column in the ONLINE_RETAIL  dataset is given below.

Column Name Description Data type
InvoiceNo Invoice number. Nominal, a 6-digit integral number uniquely assigned to each transaction. If this code starts with letter ā€˜cā€™, it indicates a cancellation. String
StockCode Product (item) code. Nominal, a 5-digit integral number uniquely assigned to each distinct product. String
Description Product (item) name. Nominal. String
Quantity The quantities of each product (item) per transaction. Numeric. Numeric
InvoiceDate Invice Date and time. Numeric, the day and time when each transaction was generated. Date
UnitPrice Unit price. Numeric, Product price per unit in sterling. Numeric
CustomerID Customer number. Nominal, a 5-digit integral number uniquely assigned to each customer. Numeric
Country Country name. Nominal, the name of the country where each customer resides.

String

Now that we have reviewed the details with our dataset, let's load the ONLINE_RETAIL that we downloaded to our Oracle database.

First, create the Oracle table in which we will load the data set (XLS) that we downloaded.

CREATE TABLE ONLINE_RETAIL
(
  INVOICENO    VARCHAR2(100 BYTE),
  STOCKCODE    VARCHAR2(100 BYTE),
  DESCRIPTION  VARCHAR2(200 BYTE),
  QUANTITY     NUMBER,
  INVOICEDATE  DATE,
  UNITPRICE    NUMBER,
  CUSTOMERID   NUMBER,
  COUNTRY      VARCHAR2(100 BYTE)
);

Now that we have created our table, we will load the dataset we downloaded as CSV into the table; we have multiple methods to do this:

  • Using Oracle External Table

  • Using Oracle SQL Loader

  • Using SQL-PL/SQL editors (Oracle SQL Developer, Toad, PL/SQL Developer...)

I will load the dataset with the editor that I am using. I use Toad as an editor. With Toad, you can perform the data loading process by following the steps below.

Database > Import > Import Table Data

You may not be using this editor because Toad is paid. This feature is available to other editors, so you can easily do this with other editors. For example, with Oracle SQL Developer for free, you can load data as follows using Oracle SQL Developer.

SELECT * FROM ONLINE_RETAIL;

We have completed the dataset installation process. 

When we observe the data, we see the details of the purchases made by the customers. Each row contains information on which products are taken, how much they are received, who bought this product, what date it bought, how much it costs, and from which country. When we examine the data in detail, we also observe that there are one or more records belonging to the same InvoiceNo. So we can think of InvoiceNo as a basket.

Now let's observe a sample basket.

SELECT * FROM ONLINE_RETAIL WHERE INVOICENO = '536368';

Yes, we saw an example basket.

Now, let's analyze these together to determine the best-selling products together. I will give you some information about the algorithm used for better understanding of the subject before we start the association analysis.

DBMS_DATA_MINING package performs association analysis with APRIORI algorithm. To use this algorithm, we need to define some parameters. The default values for these parameters and parameters are as follows:

Source

The article on the link can be reviewed to better understand the algorithm parameters and what they mean. Now, we will read the model settings, create a table, and insert the algorithm parameters into it.

CREATE TABLE SETTINGS_ASSOCIATION_RULES
AS
   SELECT *
     FROM TABLE (DBMS_DATA_MINING.GET_DEFAULT_SETTINGS)
    WHERE SETTING_NAME LIKE 'ASSO_%';


BEGIN
   UPDATE SETTINGS_ASSOCIATION_RULES
      SET SETTING_VALUE = 3
    WHERE SETTING_NAME = DBMS_DATA_MINING.ASSO_MAX_RULE_LENGTH;

   UPDATE SETTINGS_ASSOCIATION_RULES
      SET SETTING_VALUE = 0.03
    WHERE SETTING_NAME = DBMS_DATA_MINING.ASSO_MIN_SUPPORT;

       UPDATE SETTINGS_ASSOCIATION_RULES
      SET SETTING_VALUE = 0.03
    WHERE SETTING_NAME = dbms_data_mining.asso_min_confidence;

   INSERT INTO SETTINGS_ASSOCIATION_RULES
        VALUES (DBMS_DATA_MINING.ODMS_ITEM_ID_COLUMN_NAME, 'STOCKCODE');

   COMMIT;
END;

Yes, we have created the algorithm parameter table. Now we can move on to the step of creating our model.

CREATE VIEW VW_ONLINE_RETAIL AS SELECT INVOICENO,STOCKCODE FROM ONLINE_RETAIL;

BEGIN 
   DBMS_DATA_MINING.CREATE_MODEL(
      model_name            => 'MD_ASSOC_ANLYSIS',
      mining_function       =>  DBMS_DATA_MINING.ASSOCIATION,
      data_table_name       => 'VW_ONLINE_RETAIL',
      case_id_column_name   => 'INVOICENO',
      target_column_name    =>  NULL,
      settings_table_name   => 'SETTINGS_ASSOCIATION_RULES');
END;

Now that we have our model, let's look at the details about the model now from the dictionary.

SELECT MODEL_NAME,
       ALGORITHM,
       COMMENTS,
       CREATION_DATE,
       MINING_FUNCTION,
       MODEL_SIZE
  FROM ALL_MINING_MODELS
 WHERE MODEL_NAME = 'MD_ASSOC_ANLYSIS';

SELECT SETTING_NAME, SETTING_VALUE
FROM ALL_MINING_MODEL_SETTINGS
WHERE MODEL_NAME = 'MD_ASSOC_ANLYSIS';

Let's look at the output of the analysis.

SELECT RULE_ID,
       B.ATTRIBUTE_SUBNAME ANTECEDENT_STOCKCODE,
       C.ATTRIBUTE_SUBNAME CONSEQUENT_STOCKCODE,
       RULE_SUPPORT,
       RULE_CONFIDENCE
  FROM TABLE (DBMS_DATA_MINING.GET_ASSOCIATION_RULES ('MD_ASSOC_ANLYSIS')) A,
       TABLE (A.ANTECEDENT) B,
       TABLE (A.CONSEQUENT) C;

As a result of the parameters we have given to the algorithm, we have reached and displayed products with high sales rate together. It is possible to make different analyzations by changing the parameters of the algorithm.

The fastest databases need the fastest drivers - learn how you can leverage CData Drivers for high performance NoSQL & Big Data Access.

Topics:
machne learning ,oracle ,data mining ,tutorial ,algorithms ,data analytics ,big data

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}