Adoption of Association Rule Mining using Apriori

About the Use Case

Card transactions from the PTLF (POS Transaction Log File) can be used to identify relationships between merchant categories. I've considered a few MCCs (merchant category codes) across the transaction sets available in the PTLF for a single BIN issued by a card issuer to its customers.

Transactions logged against each MCC are considered as per the table below:

MCC Code            4899   4900   5816   5411
402411XXXXXXXXX1    1      1      1      0
402411XXXXXXXXX2    1      0      1      1
402411XXXXXXXXX3    1      1      1      0
402411XXXXXXXXX4    1      1      1      0
402411XXXXXXXXX5    0      0      1      1
402411XXXXXXXXX6    1      1      1      0
402411XXXXXXXXX7    1      1      1      0
402411XXXXXXXXX8    0      0      1      0
402411XXXXXXXXX9    1      1      1      0
402411XXXXXXXX10    0      1      0      1
402411XXXXXXXX11    1      0      1      0
402411XXXXXXXX12    1      1      0      0

1 – Transaction YES      0 – Transaction NO

MCC 4899 – Television Services, 4900 – Utility Payments, 5816 – Digital Goods, 5411 – Grocery
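As a minimal sketch, the table above can be encoded in Python so that the ratios used later in the post can be checked mechanically. The names `MCCS`, `NAMES`, `ROWS` and `baskets` are illustrative choices, not part of the original post, and the category labels assume the usual meanings of these MCCs (4899 television services, 4900 utilities, 5816 digital goods, 5411 grocery).

```python
from collections import Counter

# MCC columns in table order; labels are an assumption based on common MCC usage.
MCCS = ("4899", "4900", "5816", "5411")
NAMES = {"4899": "Television", "4900": "Utility",
         "5816": "Digital", "5411": "Grocery"}

# One string per card (table rows 1..12), one 0/1 flag per MCC column.
ROWS = ["1110", "1011", "1110", "1110", "0011", "1110",
        "1110", "0010", "1110", "0101", "1010", "1100"]

# Turn each row into the set of MCCs that card transacted in.
baskets = [{m for m, flag in zip(MCCS, row) if flag == "1"} for row in ROWS]

# Count how many cards show a transaction for each MCC.
counts = Counter(m for basket in baskets for m in basket)
total_txns = sum(counts.values())  # number of 1s in the table

for m in MCCS:
    print(NAMES[m], m, counts[m])
print("Total transactions:", total_txns)
```

The per-category counts (9, 8, 10, 3) and the total of 30 are the numerators and denominators used in the Support and Confidence calculations.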

We are going to check whether customers who made Utility Payments also used their cards for Digital Goods, and we will verify a few similar combinations in this post. The goal is to identify relationships between two or more merchant categories, i.e. whether transactions in those categories frequently occur together.

What is the use of this Use Case?

A card issuer can use these relationships to increase transaction volume in a targeted merchant category through cash-back and other promotional offers. For example, if the volume of Utility Payment transactions is high and the same customers frequently transact for Television Services, the card issuer can offer cash back on the Television Services category in the coming months.

What is the Apriori Algorithm?

The Apriori algorithm implements the Association Rule Mining technique, which identifies relationships between items. It is mostly used for market basket analysis, but in this post we apply it to a different use case, based on the details below.
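For context, here is a simplified sketch of the level-wise search that gives Apriori its name: an itemset of size k is only counted if all of its smaller subsets were already frequent. It is an illustration under stated assumptions, not the code referenced by this post; the `apriori` function, the `min_count` threshold of 6 cards and the row encoding of the sample table are all hypothetical choices made for the example.

```python
from itertools import combinations

# Sample table from this post: one 0/1 flag per MCC column, one string per card.
MCCS = ("4899", "4900", "5816", "5411")
ROWS = ["1110", "1011", "1110", "1110", "0011", "1110",
        "1110", "0010", "1110", "0101", "1010", "1100"]
baskets = [frozenset(m for m, flag in zip(MCCS, row) if flag == "1")
           for row in ROWS]


def apriori(baskets, min_count):
    """Return {itemset: card count} for itemsets seen on >= min_count cards."""
    items = sorted({item for basket in baskets for item in basket})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count how many cards contain each candidate itemset.
        counts = {c: sum(c <= b for b in baskets) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all (k-1)-subsets are frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))]
    return frequent


frequent = apriori(baskets, min_count=6)  # threshold is an assumption
for itemset, n in sorted(frequent.items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With this threshold, MCC 5411 is dropped at the first level, so no combination containing it is ever counted. That pruning is what keeps the search manageable on larger data.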

How Does Apriori Work?

Apriori comprises three main components:

  • Support
  • Confidence
  • Lift

We can apply each of these components to the sample data shown in the Use Case section above.


Support measures how popular a category is by default: the number of Utility Payment transactions as a share of the total number of transactions triggered by customers of the specific BIN.

Support (Utility Payments) = TXNS containing Utility Payment / Total No. of TXNS

(1) Support (Utility Payments) = 8 / 30 = 27%

Utility Payments account for 27% of the total number of transactions in the quarter.

(2) Support (Grocery) = 3 / 30 = 10%

Grocery Payments account for 10% of the total number of transactions in the quarter.
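The two Support figures can be reproduced with a short sketch. It follows the post's convention of dividing by the 30 logged transactions (the 1s in the table) rather than by the 12 cards; the `support` helper and the row encoding are illustrative names, not part of the original post.

```python
# Sample table from this post: one 0/1 flag per MCC column, one string per card.
MCCS = ("4899", "4900", "5816", "5411")
ROWS = ["1110", "1011", "1110", "1110", "0011", "1110",
        "1110", "0010", "1110", "0101", "1010", "1100"]


def support(mcc):
    """Share of all logged transactions (1s in the table) under this MCC."""
    col = MCCS.index(mcc)
    hits = sum(row[col] == "1" for row in ROWS)
    total = sum(row.count("1") for row in ROWS)
    return hits / total


print(f"Support (Utility Payments) = {support('4900'):.0%}")  # 8 / 30 -> 27%
print(f"Support (Grocery)          = {support('5411'):.0%}")  # 3 / 30 -> 10%
```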


Confidence measures how likely it is that a card used for one category is also used for another. For example, a card used for Digital Goods may also be used for Utility Payments in the same period, which indicates a high chance of the card being used at both MCCs:

(1) Confidence (Digital –> Utility) = TXNS containing both Utility & Digital / TXNS containing Digital

Confidence (Digital –> Utility) = 6 / 10 = 60%

(2) Confidence (Television –> Grocery) = TXNS containing both Television & Grocery / TXNS containing Television

Confidence (Television –> Grocery) = (1 / 9) = 11%
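A minimal sketch of the same Confidence calculation, assuming the table is encoded as one flag string per card. The `confidence` helper is a hypothetical name introduced for illustration.

```python
# Sample table from this post: one 0/1 flag per MCC column, one string per card.
MCCS = ("4899", "4900", "5816", "5411")
ROWS = ["1110", "1011", "1110", "1110", "0011", "1110",
        "1110", "0010", "1110", "0101", "1010", "1100"]


def confidence(antecedent, consequent):
    """Of the cards used at `antecedent`, the share also used at `consequent`."""
    a, c = MCCS.index(antecedent), MCCS.index(consequent)
    with_antecedent = [row for row in ROWS if row[a] == "1"]
    both = sum(row[c] == "1" for row in with_antecedent)
    return both / len(with_antecedent)


# Digital (5816) -> Utility (4900): 6 of the 10 Digital cards.
print(f"Confidence (Digital -> Utility)    = {confidence('5816', '4900'):.0%}")
# Television (4899) -> Grocery (5411): 1 of the 9 Television cards.
print(f"Confidence (Television -> Grocery) = {confidence('4899', '5411'):.0%}")
```

Confidence is directional: swapping the two arguments changes the denominator and, in general, the result.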


Lift measures the likelihood of transactions happening together in the same period, relative to how popular the second category is on its own. For example, when Digital Goods are purchased with a card in a given period, Utility Payments are made with the same card in that same period:

(1) Lift (Digital –> Utility) = Confidence (Digital –> Utility) / Support (Utility Payments)

Lift (Digital –> Utility) = 60 / 27 = 2.2

(2) Lift (Television –> Grocery) = Confidence (Television –> Grocery) / Support (Grocery Payments)

Lift (Television –> Grocery) = 11 / 10 = 1.1, i.e. roughly 1, which means there is no meaningful association between Television and Grocery purchases by these customers.
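The Lift figures can be checked with the sketch below, which uses the post's definitions (Confidence divided by the Support of the second category, with Support taken over the 30 logged transactions). The helpers `count` and `lift` are hypothetical names. Because the code works with unrounded ratios, it gives 2.25 and about 1.11 where the rounded percentages 60 / 27 and 11 / 10 give 2.2 and 1.1.

```python
# Sample table from this post: one 0/1 flag per MCC column, one string per card.
MCCS = ("4899", "4900", "5816", "5411")
ROWS = ["1110", "1011", "1110", "1110", "0011", "1110",
        "1110", "0010", "1110", "0101", "1010", "1100"]
TOTAL_TXNS = sum(row.count("1") for row in ROWS)  # 30 logged transactions


def count(*mccs):
    """Number of cards with a transaction in every one of the given MCCs."""
    cols = [MCCS.index(m) for m in mccs]
    return sum(all(row[c] == "1" for c in cols) for row in ROWS)


def lift(antecedent, consequent):
    """Confidence(antecedent -> consequent) / Support(consequent)."""
    conf = count(antecedent, consequent) / count(antecedent)
    supp = count(consequent) / TOTAL_TXNS
    return conf / supp


print(f"Lift (Digital -> Utility)    = {lift('5816', '4900'):.2f}")  # 2.25
print(f"Lift (Television -> Grocery) = {lift('4899', '5411'):.2f}")  # 1.11
```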

We could evaluate many more combinations in the same way, such as Television Services and Grocery, or Digital Goods and Utility Payments, and even three-category combinations such as Digital Goods + Utility Payments + Television Services or Grocery + Television Services + Utility Payments.
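As a sketch of how such larger combinations might be evaluated, the snippet below counts the cards that cover every three-category combination and computes one rule with a two-category antecedent. The helpers `count` and `confidence` are hypothetical names, generalised here to accept sets of MCCs.

```python
from itertools import combinations

# Sample table from this post: one 0/1 flag per MCC column, one string per card.
MCCS = ("4899", "4900", "5816", "5411")
ROWS = ["1110", "1011", "1110", "1110", "0011", "1110",
        "1110", "0010", "1110", "0101", "1010", "1100"]
baskets = [frozenset(m for m, flag in zip(MCCS, row) if flag == "1")
           for row in ROWS]


def count(items):
    """Number of cards whose transactions cover every MCC in `items`."""
    return sum(frozenset(items) <= basket for basket in baskets)


def confidence(antecedent, consequent):
    """Share of cards covering `antecedent` that also cover `consequent`."""
    return count(set(antecedent) | set(consequent)) / count(antecedent)


# How many cards were used in all three categories of each combination?
for combo in combinations(MCCS, 3):
    print(combo, count(combo))

# Digital Goods + Utility Payments -> Television Services
print(confidence({"5816", "4900"}, {"4899"}))
```

In this sample every card used for both Digital Goods and Utility Payments was also used for Television Services, so that rule has a confidence of 1.0.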

Using the above mathematical approach, we can calculate the lift, and hence the strength of the relationship, between categories. According to the Apriori rule of thumb, a lift value of 1 or less means there is no meaningful relationship between the categories, while a lift value greater than 1 means there is a good chance of a relationship between them.
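To apply this rule of thumb across the whole sample, a minimal sketch can rank every ordered pair of categories by lift, using the post's definitions (Support taken over the 30 logged transactions). The helper names `count` and `lift` and the list `rules` are assumptions made for illustration.

```python
from itertools import permutations

# Sample table from this post: one 0/1 flag per MCC column, one string per card.
MCCS = ("4899", "4900", "5816", "5411")
ROWS = ["1110", "1011", "1110", "1110", "0011", "1110",
        "1110", "0010", "1110", "0101", "1010", "1100"]
TOTAL_TXNS = sum(row.count("1") for row in ROWS)


def count(*mccs):
    """Number of cards with a transaction in every one of the given MCCs."""
    cols = [MCCS.index(m) for m in mccs]
    return sum(all(row[c] == "1" for c in cols) for row in ROWS)


def lift(antecedent, consequent):
    """Confidence(antecedent -> consequent) / Support(consequent)."""
    conf = count(antecedent, consequent) / count(antecedent)
    return conf / (count(consequent) / TOTAL_TXNS)


# Score every ordered pair of categories and list the strongest first.
rules = sorted(((lift(a, c), a, c) for a, c in permutations(MCCS, 2)),
               reverse=True)
for value, a, c in rules:
    print(f"{a} -> {c}: lift = {value:.2f}")
```

Pairs at the top of the list are the candidates for a targeted offer, while pairs whose lift sits close to 1, such as Television and Grocery, show little evidence of a relationship.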


Please refer to the code snippet available on GitHub for learning and evaluation.


