Authors

Krishna Kant Tiwari

Dr. Qaim Mehdi Rizbi

Abstract

Data mining relies heavily on data pre-processing, and cleaning methods that work for one type of data may not work for another. This work analyzes and assesses a newly constructed attribute selection method through extensive experiments. In the data cleaning process, reducing the number of attributes helps in dealing with noisy and duplicate data. The experimental findings show that the method is simple and highly efficient, significantly reducing the number of attributes. The primary goal of attribute selection for data cleaning is to reduce the time required by the subsequent cleaning steps: token formation, record similarity computation, and duplicate deletion. Smart tokens for data cleaning are formed by a token generation algorithm suited to data containing numeric, alphabetic, and other non-numeric elements. Token-based data cleaning removes duplicate data efficiently. Together, attribute selection and the token-based technique shorten the time required for data cleaning.
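
The abstract does not give the algorithms themselves, so the following Python sketch only illustrates the general pipeline it describes: select a few useful attributes, build a compact token from each selected value, and group records whose tokens match. The function names, the completeness and distinctness thresholds, and the sorted-digits and sorted-initials token rule are all assumptions made for illustration and should not be read as the authors' method.

```python
import re
from collections import defaultdict

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"name": "John A. Smith", "phone": "555-0101", "city": "Pune", "country": "IN"},
    {"name": "Smith, John A", "phone": "5550101", "city": "Pune", "country": "IN"},
    {"name": "Mary Jones", "phone": "555-0199", "city": "", "country": "IN"},
    {"name": "Ravi Kumar", "phone": "555-0142", "city": "Delhi", "country": "IN"},
]


def select_attributes(rows, min_completeness=0.8, min_distinct_ratio=0.5):
    """Keep attributes that are mostly filled in and vary enough across rows.

    Both thresholds are assumed values, not taken from the paper.
    """
    n = len(rows)
    selected = []
    for attr in rows[0]:
        values = [row.get(attr) for row in rows]
        filled = [v for v in values if v not in (None, "")]
        completeness = len(filled) / n
        distinct_ratio = len({str(v).strip().lower() for v in filled}) / n
        if completeness >= min_completeness and distinct_ratio >= min_distinct_ratio:
            selected.append(attr)
    return selected


def make_token(value):
    """Build an order-insensitive token from a numeric, alphabetic, or mixed value.

    Assumed rule: sorted digits, followed by the sorted first letters of each word.
    """
    text = str(value).lower()
    digits = "".join(sorted(re.findall(r"\d", text)))
    initials = "".join(sorted(word[0] for word in re.findall(r"[a-z]+", text)))
    return digits + initials


def find_duplicates(rows, attrs):
    """Group row indices whose tokens match on every selected attribute."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        key = tuple(make_token(row.get(attr, "")) for attr in attrs)
        groups[key].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


selected = select_attributes(records)
duplicates = find_duplicates(records, selected)
```

On this sample, `city` is dropped because one value is missing and `country` is dropped because it never varies, so only `name` and `phone` are tokenized. The first two records then share the same tokens despite different formatting and are reported as one duplicate group. Comparing short tokens over fewer attributes is what would save time relative to comparing full records, though exact token matching as sketched here can miss duplicates that contain typing errors.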
