An Analysis Around the Study of Distributed Data Mining Method In the Grid Environment: Technique, Algorithms and Services
Distribution of data and computation allows for solving larger problems and executing applications that are distributed in nature. The Grid is a distributed computing infrastructure that enables coordinated resource sharing within dynamic organizations consisting of individuals, institutions, and resources. The Grid extends the distributed and parallel computing paradigms by supporting resource negotiation, dynamic allocation, heterogeneity, and open protocols and services. Grid environments can be used both for compute-intensive tasks and for data-intensive applications, as they offer resources, services, and data access mechanisms. Data mining algorithms and knowledge discovery processes are both compute-intensive and data-intensive, so the Grid can offer a computing and data management infrastructure for supporting decentralized and parallel data analysis. This paper discusses how Grid computing can be used to support distributed data mining. Grid-based data mining uses Grids as decentralized high-performance platforms on which to execute data mining tasks and knowledge discovery algorithms and applications. We outline some research activities in Grid-based data mining, describe some challenges in this area, and sketch some promising future directions for developing Grid-based distributed data mining.

Data mining algorithms are widely used today for the analysis of large corporate and scientific datasets stored in databases and data archives. Industry, science, and commerce often need to analyze very large datasets maintained over geographically distributed sites by using the computational power of distributed and parallel systems. The grid can play a significant role in providing effective computational support for distributed knowledge discovery applications. For the development of data mining applications on grids we designed a system called KNOWLEDGE GRID.
This paper describes the KNOWLEDGE GRID framework and presents the toolset provided by the KNOWLEDGE GRID for implementing distributed knowledge discovery.

In many industrial, scientific and commercial applications, it is often necessary to analyze large data sets, maintained over geographically distributed sites, by using the computational power of distributed and parallel systems. The grid can play a significant role in providing effective computational support for knowledge discovery applications. We describe a software architecture for geographically distributed high-performance knowledge discovery applications called Knowledge Grid, which is designed on top of computational grid mechanisms provided by grid environments such as Globus. The Knowledge Grid uses basic grid services such as communication, authentication, information, and resource management to build more specific parallel and distributed knowledge discovery tools and services.

Grid computing has emerged as an important new branch of distributed computing focused on large-scale resource sharing and high-performance orientation. In many applications, it is necessary to analyze very large data sets. The data are often large and geographically distributed, and their complexity is increasing. In these areas, grid technologies provide effective computational support for applications such as knowledge discovery. This paper is an introduction to Grid infrastructure and its potential for machine learning tasks.