An Analysis on Attribute Selection and Token Formation used for Duplicate Record Detection

Krishna Kant Tiwari; Dr. Qaim Mehdi Rizbi

doi:10.29070/pv7aec32

Authors

Krishna Kant Tiwari, Research Scholar, Shri Krishna University, Chhatarpur, Madhya Pradesh, India
Dr. Qaim Mehdi Rizbi, Associate Professor, Department of Computer Science & Application, Shri Krishna University, Chhatarpur, Madhya Pradesh, India

Keywords:

Data, Duplicate Data, Attribute Selection, Token Formation, Algorithm, Quality

Abstract

Data mining relies heavily on data pre-processing, and data cleaning methods that work for some types of data may not work for others. This paper analyzes and assesses a newly constructed attribute selection method through extensive experiments. In data cleaning, reducing the number of attributes helps in dealing with noisy and duplicate data. The experimental findings show that the method is efficient and straightforward and that it significantly reduces the number of attributes. The primary goal of attribute selection here is to reduce the time required by the subsequent data cleaning steps: token formation, record similarity computation, and deletion of duplicates. The token formation algorithm builds smart tokens for data cleaning and is suited to data containing numeric, alphabetic, and other non-numeric elements. Token-based data cleaning can then remove duplicate data efficiently. Together, attribute selection and the token-based technique shorten the time required for cleaning.
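The pipeline described in the abstract (select attributes, form tokens, compare records) can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which the abstract does not specify. The function names (form_token, select_attributes, find_duplicates), the fill and distinctness thresholds, and the sample records are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def form_token(value):
    """Form a compact token from one field value (hypothetical rules).

    Numeric fields keep their digits in order; alphabetic and mixed
    fields are lowercased, stripped of punctuation, and sorted by word
    so that word order does not affect the token.
    """
    parts = re.findall(r"[a-z]+|\d+", str(value).lower())
    if all(p.isdigit() for p in parts):
        return "".join(parts)
    return "".join(sorted(parts))


def select_attributes(records, min_fill=0.9, min_distinct=0.5):
    """Keep attributes that are mostly filled and reasonably distinctive.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    selected = []
    for attr in records[0]:
        values = [r.get(attr) for r in records]
        filled = [v for v in values if v not in (None, "")]
        if not filled or len(filled) / len(records) < min_fill:
            continue  # too many missing values
        distinct = len({form_token(v) for v in filled})
        if distinct / len(filled) < min_distinct:
            continue  # nearly constant, so it cannot separate records
        selected.append(attr)
    return selected


def find_duplicates(records, attributes):
    """Group record indices whose tokens match on every selected attribute."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        key = tuple(form_token(record.get(a, "")) for a in attributes)
        groups[key].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


# Hypothetical sample data.
records = [
    {"name": "John Smith", "phone": "555-1234", "country": "IN", "note": ""},
    {"name": "Smith, John", "phone": "(555) 1234", "country": "IN", "note": ""},
    {"name": "Asha Rao", "phone": "555-9876", "country": "IN", "note": "vip"},
    {"name": "Ravi Kumar", "phone": "555-4321", "country": "IN", "note": ""},
]

attributes = select_attributes(records)
duplicates = find_duplicates(records, attributes)
print(attributes, duplicates)
```

In this sketch, comparing records only on the selected attributes means fewer tokens are formed and compared per record, which is the time saving the abstract attributes to attribute selection. Exact token matching is a simplification; a similarity threshold over tokens would be a natural extension.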
