A Comparison Study of Different Type of Data Analysis

Ravinder Singh, in Journal of Advances in Science and Technology | Science & Technology


There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model. In this paper, we describe and compare both paradigms. Furthermore, we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark consisting of a collection of tasks that we have run on an open source version of MR as well as on two parallel DBMSs. For each task, we measure each system’s performance for various degrees of parallelism on a cluster of 100 nodes. Our results reveal some interesting trade-offs. Although the process to load data into and tune the execution of parallel DBMSs took much longer than the MR system, the observed performance of these DBMSs was strikingly better. We speculate about the causes of the dramatic performance difference and consider implementation concepts that future systems should take from both kinds of architectures.