Human Involvement In Association Rule Mining
The Role of Human Heuristics in Association Rule Mining
by Luxmi Parmar*, Dr. Rajesh Pathak
Published in Journal of Advances and Scholarly Researches in Allied Education, E-ISSN: 2230-7540
Volume 2, Issue No. 1, Jul 2011
Published by: Ignited Minds Journals
ABSTRACT
A number of data mining algorithms that perform summarisation and classification of data with respect to a target attribute, deviation detection, and other forms of data characterisation and interpretation have been introduced to the community. One popular summarisation and pattern extraction algorithm is the association rule algorithm. An association rule describes an associational relationship among a group of objects in a transactional database [Zhang S. et.al., 2006]. The following discussion should help justify the intervention of human heuristics in the data mining process.
KEYWORDS
data mining algorithms, summarisation, classification, target attribute, deviation detection, data characterisation, interpretation, association rule algorithm, associational relationship, transactional database
INTRODUCTION
A number of data mining algorithms that perform summarisation and classification of data with respect to a target attribute, deviation detection, and other forms of data characterisation and interpretation have been introduced to the community. One popular summarisation and pattern extraction algorithm is the association rule algorithm. An association rule describes an associational relationship among a group of objects in a transactional database [Zhang S. et.al., 2006]. The following discussion should help justify the intervention of human heuristics in the data mining process.
OBJECTIVE
Study the features of the data mining tools TANAGRA and MATLAB (SOM Toolbox and Fuzzy Logic Toolbox), and find the points where interaction is needed. Develop and execute interactive data mining algorithms on the Java platform and report the results obtained. Apply and analyse the interactive data mining process on various medical databases gathered from different sources. Grasp the basic idea behind the association rule mining algorithm.
MATERIALS AND METHODS
To perform the experimentation for the present research work, various data mining techniques and tools were considered. Data mining techniques such as association rule mining and clustering were implemented in a programming language (the Java platform). The data mining algorithms with human interaction points were designed and tested on various databases. The data mining market consists of software vendors offering tools that extract predictive information from large data stores; this information can then be analysed to enhance corporate data resources and to generate predictions about business trends and behaviour. Specifically, these tools provide statistical data models (classification or clustering studies, linear regression, and current or predictive modelling) and use visualisation functions to support the analysis of the massive quantities of data stored by business organisations. Data mining tools may be implemented on existing customer platforms or integrated with other applications as part of a larger data quality initiative or business intelligence (BI) strategy. They provide both developers and business users with an interface for discovering, manipulating, and analysing corporate data. Although a number of data mining tools are available in the market, the following tools, which best suit the problem, were used to perform the experimentation of this research work and were tested at the points of human interactivity.
Let $D$ be a transaction database and $I = \{i_1, i_2, \ldots, i_m\}$ be a set of items. The database $D$ contains a sequence of transactions $T = \{t_1, t_2, \ldots, t_n\}$, where each transaction $t_j \subseteq I$ carries a unique identifier. An association rule $X \rightarrow Y$ may be discovered in the data, where $X$ and $Y$ are conjunctions of items and $X \cap Y = \emptyset$. The intuitive meaning of such a rule is that transactions in the database which contain the items in $X$ tend also to contain the items in $Y$. The user supplies minimum support and minimum confidence thresholds. The support of the rule $X \rightarrow Y$ is the percentage of transactions in the original database that contain both $X$ and $Y$. The confidence of the rule $X \rightarrow Y$ is the percentage of transactions containing the items in $X$ that also contain the items in $Y$. Association rules are based upon the concept of a strong rule: a rule that satisfies both minimum support and minimum confidence at the same time has been described as a strong rule in the literature [Agrawal R. et.al., 1993]. The process of discovering association rules is broken up into two steps [Agrawal R. et.al., 1994]:
(i) Find all itemsets (sets of items appearing together in a transaction) whose support is at least the specified minimum support threshold. Itemsets with minimum support are called frequent itemsets.
(ii) Generate association rules from the frequent itemsets. To do this, consider every partitioning of each frequent itemset into a left-hand side and a right-hand side. The confidence of a rule $X \rightarrow Y$ that satisfies minimum support is calculated as $\mathrm{support}(X \cup Y) / \mathrm{support}(X)$. All the rules that meet the confidence threshold are reported as discoveries of the algorithm.
Association rules were first introduced in [Agrawal R. et.al., 1993]. The subsequent paper [Agrawal R. et.al., 1994] presents the Apriori algorithm, which is considered one of the most important contributions to the field of data mining. Although other algorithms such as AIS [Agrawal R. et.al., 1993] and SETM [Houtsma M.A.W. et.al., 1993] are also available for mining association rules, Apriori remains the most widely used approach for generating frequent itemsets. The algorithm searches for frequent itemsets iteratively, handling one itemset size per pass over the data. It first scans the database $D$, counts the support of every single item across all transactions, and denotes this set of candidate 1-itemsets as $C_1$. From the itemsets in $C_1$, the algorithm computes the set $L_1$ of frequent 1-itemsets. In the $k$-th scan of the database, it generates all new candidate itemsets from the set $L_{k-1}$ of frequent $(k-1)$-itemsets discovered in the previous scan and denotes them as $C_k$. The candidates in $C_k$ whose support meets the minimum support threshold are kept in $L_k$.
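The two-step procedure and the level-wise search for $C_k$ and $L_k$ described above can be illustrated with a short Java sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the class AprioriSketch, its methods (tx, exampleDatabase, support, confidence, frequentItemsets, allSubsetsFrequent, rules) and the four-transaction database are hypothetical; support is expressed as a fraction rather than a percentage; and a threshold is assumed to be met when a value is greater than or equal to it. The pruning step, which discards any candidate that has an infrequent subset, comes from the published Apriori algorithm and is not described in this section.

```java
import java.util.*;

// Hypothetical class: a minimal level-wise Apriori sketch, not the authors' Java code.
class AprioriSketch {
    // Hypothetical helper that builds one transaction as a sorted item set.
    static Set<String> tx(String... items) {
        return new TreeSet<>(Arrays.asList(items));
    }

    // Hypothetical four-transaction database, used only for illustration.
    static List<Set<String>> exampleDatabase() {
        return List.of(tx("bread", "milk"), tx("bread", "butter", "milk"),
                       tx("bread", "butter"), tx("eggs", "milk"));
    }

    // support(X): fraction of transactions that contain every item of X.
    static double support(List<Set<String>> db, Set<String> itemset) {
        long hits = db.stream().filter(t -> t.containsAll(itemset)).count();
        return (double) hits / db.size();
    }

    // conf(X -> Y) = support(X union Y) / support(X).
    static double confidence(List<Set<String>> db, Set<String> x, Set<String> y) {
        Set<String> union = new TreeSet<>(x);
        union.addAll(y);
        return support(db, union) / support(db, x);
    }

    // Step (i): compute L1, L2, ... until no candidates remain; returns all frequent itemsets.
    static List<Set<String>> frequentItemsets(List<Set<String>> db, double minSupport) {
        List<Set<String>> result = new ArrayList<>();
        Set<Set<String>> candidates = new LinkedHashSet<>(); // C1: every single item in D
        for (Set<String> t : db) {
            for (String item : t) {
                candidates.add(tx(item));
            }
        }
        int k = 1;
        while (!candidates.isEmpty()) {
            List<Set<String>> frequent = new ArrayList<>(); // Lk: candidates meeting minSupport
            for (Set<String> c : candidates) {
                if (support(db, c) >= minSupport) {
                    frequent.add(c);
                }
            }
            result.addAll(frequent);
            Set<Set<String>> next = new LinkedHashSet<>(); // C(k+1): joined from pairs in Lk
            for (int i = 0; i < frequent.size(); i++) {
                for (int j = i + 1; j < frequent.size(); j++) {
                    Set<String> union = new TreeSet<>(frequent.get(i));
                    union.addAll(frequent.get(j));
                    if (union.size() == k + 1 && allSubsetsFrequent(union, frequent)) {
                        next.add(union);
                    }
                }
            }
            candidates = next;
            k++;
        }
        return result;
    }

    // Prune step: every k-item subset of a (k+1)-item candidate must itself be in Lk.
    static boolean allSubsetsFrequent(Set<String> candidate, List<Set<String>> frequentK) {
        for (String item : candidate) {
            Set<String> subset = new TreeSet<>(candidate);
            subset.remove(item);
            if (!frequentK.contains(subset)) {
                return false;
            }
        }
        return true;
    }

    // Step (ii): split each frequent itemset into X -> Y and keep rules meeting minConfidence.
    static List<String> rules(List<Set<String>> db, List<Set<String>> frequent, double minConfidence) {
        List<String> out = new ArrayList<>();
        for (Set<String> itemset : frequent) {
            List<String> items = new ArrayList<>(itemset);
            int n = items.size();
            // Each non-empty proper subset, encoded as a bitmask, becomes the left-hand side X.
            for (int mask = 1; mask < (1 << n) - 1; mask++) {
                Set<String> x = new TreeSet<>();
                Set<String> y = new TreeSet<>();
                for (int b = 0; b < n; b++) {
                    if (((mask >> b) & 1) == 1) {
                        x.add(items.get(b));
                    } else {
                        y.add(items.get(b));
                    }
                }
                if (confidence(db, x, y) >= minConfidence) {
                    out.add(x + " -> " + y);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Set<String>> db = exampleDatabase();
        List<Set<String>> frequent = frequentItemsets(db, 0.5);
        System.out.println("Frequent itemsets: " + frequent);
        System.out.println("Rules: " + rules(db, frequent, 0.7));
    }
}
```

On the hypothetical data with a minimum support of 0.5, the sketch keeps {bread}, {milk}, {butter}, {bread, milk} and {bread, butter}; {butter, milk} is counted and dropped at support 0.25, and {bread, butter, milk} is pruned before counting because its subset {butter, milk} is infrequent. With a minimum confidence of 0.7 only butter → bread (confidence 1.0) survives, since bread → butter, bread → milk and milk → bread each have confidence 0.5/0.75 ≈ 0.67. For simplicity the sketch rescans the database for every candidate; a practical implementation would count all candidates of $C_k$ in the single pass per level that the text describes.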