Analysis of Token Formation towards Blocking and Similarity Computation


Authors

  • Parvesh Kumari
  • Dr. Kalpana

Keywords:

blocking key, similarity computation, duplicate identification, threshold value, rule-based approach, low-quality duplicates, cleaned records, false positives, token concept, data cleaning process, complexity

Abstract

The best blocking key for blocking records is chosen by comparing how well each candidate key performs in duplicate identification. In the next stage, the threshold value is computed from the similarities between records and between fields. A rule-based approach is then used to detect duplicates and to eliminate low-quality duplicates by retaining only one copy, the best record among the duplicates. Finally, all cleaned records are grouped or merged and made available for the next process. This research aims to reduce the number of false positives without missing true duplicates. In contrast to previous approaches, the new framework incorporates the token concept to speed up the data cleaning process and reduce its complexity. Several blocking keys are analyzed through extensive experiments to select the one that best brings similar records together, which avoids comparing all pairs of records. The rule-based approach identifies both exact and approximate duplicates and eliminates them.
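
The pipeline described in the abstract (token formation, blocking, similarity against a threshold, and keeping the best record) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the field names, the blocking key (first three characters of the alphabetically last name token plus the first three characters of the city), the fixed threshold of 0.8 (the paper computes its threshold from the data), the use of Python's difflib.SequenceMatcher as the similarity measure, and "most non-empty fields" as the rule for the best record.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def tokens(value):
    """Lowercase a field, drop punctuation, and return its tokens sorted."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in str(value))
    return sorted(cleaned.split())

def blocking_key(record):
    """Hypothetical key: 3 chars of the last sorted name token + 3 chars of the city."""
    name_tokens = tokens(record["name"])
    city_tokens = tokens(record["city"])
    name_part = name_tokens[-1][:3] if name_tokens else ""
    city_part = city_tokens[0][:3] if city_tokens else ""
    return name_part + city_part

def field_similarity(a, b):
    """Similarity of two fields after token sorting (assumed measure, range 0..1)."""
    return SequenceMatcher(None, " ".join(tokens(a)), " ".join(tokens(b))).ratio()

def record_similarity(r1, r2, fields=("name", "city")):
    """Record similarity as the mean of the field similarities."""
    return sum(field_similarity(r1[f], r2[f]) for f in fields) / len(fields)

def completeness(record):
    """Assumed quality rule: a record with more non-empty fields is better."""
    return sum(1 for v in record.values() if str(v).strip())

def deduplicate(records, threshold=0.8):
    """Block records, cluster similar ones inside each block, keep the best of each."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)

    cleaned = []
    for block in blocks.values():
        clusters = []
        for record in block:
            # Only records sharing a blocking key are ever compared.
            for cluster in clusters:
                if any(record_similarity(record, other) >= threshold for other in cluster):
                    cluster.append(record)
                    break
            else:
                clusters.append([record])
        cleaned.extend(max(cluster, key=completeness) for cluster in clusters)
    return cleaned

# Made-up sample data for illustration only.
SAMPLE_RECORDS = [
    {"id": 1, "name": "John Smith", "city": "Boston", "phone": "555-0101"},
    {"id": 2, "name": "Smith, Jon", "city": "boston", "phone": ""},
    {"id": 3, "name": "Jane Doe", "city": "Denver", "phone": "555-0199"},
]

if __name__ == "__main__":
    for kept in deduplicate(SAMPLE_RECORDS):
        print(kept)
```

Sorting the tokens before comparison makes "John Smith" and "Smith, Jon" line up despite the different word order, and restricting comparisons to records that share a blocking key is what avoids comparing all pairs of records.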


Published

2017-04-01

How to Cite

[1]
“Analysis of Token Formation towards Blocking and Similarity Computation: -”, JASRAE, vol. 13, no. 1, pp. 511–514, Apr. 2017, Accessed: Jul. 23, 2025. [Online]. Available: https://ignited.in/index.php/jasrae/article/view/6589
