An Analysis on Various Measurements For Association Pattern Identification In a Multiple Database
Data mining is an area of data analysis that has arisen in response to new data analysis challenges, such as those posed by massive data sets or non-traditional types of data. Association analysis, which seeks to find patterns that describe the relationships of attributes (variables) in a binary data set, is an area of data mining that has created a unique set of data analysis tools and concepts that have been widely employed in business and science. The objective measures used to evaluate the interestingness of association patterns are a key aspect of association analysis. Indeed, different objective measures define different association patterns with different properties and applications. This paper first provides a general discussion of objective measures for assessing the interestingness of association patterns. It then focuses on one of these measures, h-confidence, which is appropriate for binary data sets with skewed distributions. The usefulness of h-confidence and the association pattern that it defines, the hyperclique, is illustrated by an application that involves finding functional modules from protein complex data.
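To make the h-confidence measure concrete, the following is a minimal sketch in Python. It assumes the usual definition from the hyperclique literature, which this abstract does not itself spell out: the h-confidence of an itemset is its support divided by the largest support of any single item it contains. The function names `support` and `h_confidence` and the toy transactions are illustrative assumptions, not taken from the paper.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)  # subset test
    return hits / len(transactions)


def h_confidence(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Assumed definition: supp(P) / max over items i in P of supp({i})."""
    items = frozenset(itemset)
    if not items:
        raise ValueError("itemset must contain at least one item")
    max_item_support = max(support({i}, transactions) for i in items)
    if max_item_support == 0:
        return 0.0  # no item of the itemset occurs in the data
    return support(items, transactions) / max_item_support


# Hypothetical toy data: each transaction is the set of items present in it.
transactions = [
    frozenset({"A", "B", "C"}),
    frozenset({"A", "B"}),
    frozenset({"A", "B", "D"}),
    frozenset({"D"}),
    frozenset({"C", "D"}),
]

hc_ab = h_confidence({"A", "B"}, transactions)  # A and B always occur together
hc_ad = h_confidence({"A", "D"}, transactions)  # A and D rarely co-occur
```

Under this definition, an itemset whose h-confidence meets a user-chosen threshold is a hyperclique pattern. In the toy data, `{"A", "B"}` scores higher than `{"A", "D"}` because A and B appear in exactly the same transactions, whereas A and D share only one.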