Near Duplicate Document Detection Survey: An in-depth analysis of near duplicate document detection in search engines

Radha Saini; Dr. Omparkash

Authors

Radha Saini, Research Scholar
Dr. Omparkash, Associate Professor

Keywords:

near duplicate document detection, search engines, retrieving information, duplicated documents, data filtering algorithm, performance improvement, similar pairs identification, literature review, web

Abstract

Search engines are a major breakthrough for retrieving information on the web. However, the list of retrieved documents often contains a high percentage of duplicate and near-duplicate results, so there is a need to improve the performance of search results. Some current search engines use data filtering algorithms that eliminate duplicate and near-duplicate documents to save users' time and effort. The identification of similar or near-duplicate pairs in a large collection is a significant problem with widespread applications. This survey presents an up-to-date review of the existing literature on duplicate and near-duplicate detection on the Web.
