A Novel Technique of Data Duplication Detection on Web and Improving Page Rank
Finding relevant documents on the WWW has become much more difficult because search engines return a large number of Web pages, generally as a ranked list. Users therefore spend more time locating the desired information within the search results, even though most users only want to browse a few result pages to find new or different results. This work reduces the search space and moves high-priority pages upward in the result list. Web mining tools are used to classify, cluster and order the documents so that users can easily navigate the search results and find the information they want. The method first performs query clustering on the query logs. It then captures the weight of the clicked Web pages in each cluster, along with the existing rank of those pages, and computes a new rank by adding the weight to the existing rank. Finally, it applies Insertion Sort and assigns levels according to priority. In this paper, an architecture is proposed that orders the results according to both the relevance and the importance of documents. The proposed work reduces the search space, because the pages the user intends to find tend to move upward in the result list.
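The abstract does not specify how queries are clustered, how a clicked page's weight is defined, or what scale the existing rank uses. The pipeline it describes (cluster the query log, weight clicked pages per cluster, new rank = weight + existing rank, Insertion Sort, assign levels) can be sketched as follows. This is a minimal Python sketch under hypothetical assumptions, not the authors' implementation. Queries are grouped by a greedy Jaccard-similarity pass over their keywords (threshold 0.5). A page's weight is its share of the clicks in its cluster. The existing rank is a score where higher means more important. A page's level is its position after sorting. All function names and the log format are illustrative.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two keyword sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_queries(log, threshold=0.5):
    """Hypothetical query clustering: a query joins the first cluster whose
    seed keywords are similar enough, otherwise it seeds a new cluster.
    `log` is an assumed list of (query, clicked_url) pairs."""
    clusters = []  # each cluster: {"seed": keyword set, "clicks": list of URLs}
    for query, url in log:
        words = set(query.lower().split())
        for c in clusters:
            if jaccard(words, c["seed"]) >= threshold:
                c["clicks"].append(url)
                break
        else:
            clusters.append({"seed": words, "clicks": [url]})
    return clusters


def click_weights(cluster):
    """Assumed weight: a page's share of all clicks recorded in the cluster."""
    counts = defaultdict(int)
    for url in cluster["clicks"]:
        counts[url] += 1
    total = len(cluster["clicks"])
    return {url: n / total for url, n in counts.items()}


def insertion_sort_desc(items, key):
    """Stable insertion sort, highest key first."""
    items = list(items)
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        while j >= 0 and key(items[j]) < key(current):
            items[j + 1] = items[j]  # shift lower-priority item right
            j -= 1
        items[j + 1] = current
    return items


def rerank(results, cluster):
    """results: URL -> existing rank score (higher = more important, assumed).
    Returns (level, url, new_rank) tuples, level 1 being highest priority."""
    weights = click_weights(cluster)
    # New rank = click weight + existing rank; unclicked pages get weight 0.
    scored = [(url, rank + weights.get(url, 0.0)) for url, rank in results.items()]
    ordered = insertion_sort_desc(scored, key=lambda pair: pair[1])
    return [(level, url, r) for level, (url, r) in enumerate(ordered, start=1)]


# Usage with a toy query log and toy existing ranks (illustrative data only).
log = [
    ("cheap flights delhi", "a.com"),
    ("cheap flights to delhi", "b.com"),
    ("cheap flights delhi", "b.com"),
    ("python tutorial", "c.com"),
]
clusters = cluster_queries(log)
results = {"a.com": 0.5, "b.com": 0.4, "d.com": 0.6}
ranked = rerank(results, clusters[0])
for level, url, new_rank in ranked:
    print(level, url, round(new_rank, 3))
```

In this toy run the first three queries fall into one cluster, where `b.com` receives two of the three clicks. Its new rank (0.4 + 2/3) therefore lifts it from last place to level 1, ahead of `a.com` and the unclicked `d.com`. Insertion Sort is used here only because the abstract names it. For long result lists, Python's built-in `sorted` with `reverse=True` would give the same ordering more efficiently.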