The Study of Web Mining As a Tool for Research Support System Concept and Issues
Advancing online research with a comprehensive web mining toolset
by Rifat Jahan*, Nazia Ahmad,
- Published in Journal of Advances and Scholarly Researches in Allied Education, E-ISSN: 2230-7540
Volume 16, Issue No. 4, Mar 2019, Pages 143 - 151 (9)
Published by: Ignited Minds Journals
ABSTRACT
The point of this paper is to give the past and current strategies in Web Mining. This paper additionally reports the synopsis of different systems of web mining drew closer from the accompanying edges like Feature Extraction, Transformation and Representation and Data Mining Techniques in different application domains. The study on data mining method is made regarding Clustering, Classification, Sequence Pattern Mining, Association Rule Mining and Visualization. From its absolute starting point, the capability of extricating important learning from the Web has been very obvious. Web mining, for example the use of data mining systems to remove information from Web substance, structure, and utilization, is the gathering of advances to satisfy this potential. Enthusiasm for Web mining has developed quickly in its short history, both in the research and professional networks. work offers a coordinated arrangement of web mining apparatuses that will help advance the cutting edge in supporting researchers doing on the web research. The proposed research work will give a universally useful device set which researchers can utilize to use web assets in their research.
KEYWORD
Web Mining, Research Support System, Feature Extraction, Transformation and Representation, Data Mining Techniques, Clustering, Classification, Sequence Pattern Mining, Association Rule Mining, Visualization
INTRODUCTION
The advancement of the World Wide Web has brought us tremendous and consistently developing measures of data and data. It inuences practically all parts of individuals' lives. In expansion, with the rich data given by the web, it has turned into a vital asset for research. Moreover, the minimal effort of web data makes it progressively alluring to researchers. Researchers can recover web data by perusing and watchword looking. Notwithstanding, there are a few restrictions to these systems. It is hard for researchers to recover data by perusing in light of the fact that there are many after connections contained in a web page. Watchword seeking will restore a lot of unessential data. On the other hand, customary data extraction and mining systems cannot be connected specifically to the web because of its semi-organized or even unstructured nature. Web pages are Hypertext records, which contain both content and hyperlinks to different archives. Moreover, other data sources likewise exist, for example, mailing records, newsgroups, gatherings, and so on. Along these lines, structure and usage of a web mining research emotionally supportive network has turn into a test for individuals with enthusiasm for using data from the web for their research. A web mining research emotionally supportive network ought to probably distinguish web sources as indicated by research needs, including recognizing accessibility, pertinence and significance of web locales; it ought to probably choose data to be removed, in light of the fact that a web website contains both pertinent and superfluous data; it ought to almost certainly dissect the data patterns of the gathered data and help to manufacture models and give legitimacy. Web is an accumulation of billion of records. The web is colossal, different, adaptable, and dynamic. The World Wide Web keeps on becoming both in the enormous volume of traffic and the size and intricacy of Web locales. It is hard to distinguish the important data present in the web. The greater part of the substance in the web are unstructured in nature, however next to no work manages unstructured and heterogeneous data on the Web. The developing field of web mining goes for finding and removing significant data that is covered up in Web-related data, specifically in content archives distributed on the Web. Data Mining includes the idea of extraction significant and important data from extensive volume of data. Web mining is a critical territory in data mining where we extricate the intriguing patterns from the substance. Web content mining manages the crude data that is accessible on the web. The web structure mining predominantly manages the structure of the web sites. Web Usage mining includes mining the use attributes of the clients of Web applications. It is in a semi-organized configuration with the goal that it needs heaps of pre-handling and parsing before Domain Understanding, Data choice, Data pre-handling and cleaning, Pattern revelation, Interpretation and Reporting.
Web mining techniques Data mining techniques
In the data mining networks, there are three sorts of mining: data mining, web mining, and content mining. There are many testing issues in data/web/content mining research. Data mining for the most part manages organized data sorted out in a database (DB) while content mining for the most part handles unstructured data/text. Web mining lies in the middle of and adapts to semi-organized data or potentially unstructured data. Web mining calls for innovative utilization of data mining or potentially message mining strategies furthermore, its particular methodologies. Mining the web data is a standout amongst the most difficult assignments for the data mining and data the board researchers on the grounds that there are immense heterogeneous, less organized data accessible on the web and we can undoubtedly get overpowered with data. In the literature, the terms of web mining, web data mining, and web data extraction mining are utilized reciprocally. In this paper, we utilize the term web mining. As per Wikipedia, web mining is the utilization of data mining procedures to find patterns from the web and can be ordered into three unique sorts of web utilization mining, web content mining, and web structure mining. The scientific categorization of web mining has developed from that of just web content mining and web utilization mining, for example, considered by Cooley et al. to incorporate that of web structure mining as explained by Liang. data in the web; web use mining is the application that utilizes data mining to dissect and find intriguing patterns of client's utilization of data on the web; and web structure mining is the way toward utilizing diagram hypothesis to break down the hub and association structure of a web site. A case of the last would find the experts and center points of any web document, for example recognizing the most suitable web joins for a web page. As indicated by Kosala and Blockeel, "practically speaking, the three web mining undertakings above could be utilized in disengagement or consolidated in an application, particularly in web content and structure mining since the web document may likewise contain joins." For instance, Zhong ponders the cerebrum informatics (for example blend of content and structure) from a web insight viewpoint.
LITERATURE REVIEW
Web content mining is performed by removing valuable information from the content of a web page/webpage. It incorporates extraction of organized data/information from web pages, distinguishing proof, match, and reconciliation of semantically comparative data, conclusion extraction from online sources, and idea progression, metaphysics, or knowledge reconciliation To diminish the hole between low-level image highlights used to file images and abnormal state semantic contents of images in content-based image retrieval (CBIR) frameworks or web indexes, Zhang et al. recommend applying importance input to refine the inquiry or comparability measures in image seek process. They present a structure of significance input and semantic realizing where low-level highlights and catchphrase comments are coordinated in image retrieval and in criticism procedures to improve the retrieval execution. They built up a model framework performing superior to customary methodologies. Liu very just recognizes the three kinds of web mining by noticing that web utilization mining finds client get to patterns from use logs; web structure mining finds knowledge from hyperlinks; and web content mining mines knowledge from page contents. Liu exhibited a webcast exclusively on web content mining in which he centers around organized data extraction, information incorporation, and IE from unstructured content, for example, "supposition mining" of client composed remarks. Darmont et al. propose a modeling procedure for warehousing heterogeneous web data and structure a java model that change diverse data into Xtensible Markup Language (XML) document.
Srivastava et al. characterize web utilization mining as the use of data mining systems to find utilization patterns from web data to more readily serve the requirements of web applications, and it incorporates three stages: preprocessing, pattern disclosure, what's more, pattern examination. The use data can be gathered at the distinctive sources such as web server logs, customer side data, web intermediary reserving. Data mining examinations such as affiliation rules, order, clustering, consecutive patterns, and reliance modeling can be utilized for personalization, framework improvement, site alteration, business insight, and use portrayal. A prototypical web use mining arrangement of Web Site Information Filter System (WebSIFT) is presented. Joshi claims that web mining can be said to have three data mining activities: clustering, affiliations, and successive investigation. Joshi expressed that clustering for web mining would discover natural gatherings of clients, pages or other; affiliations would be investigation of which URLs tend to asked for together; and consecutive examination would be the request in which URLs will in general be gotten to. Such investigation can be utilized for personalization. Web Mining can be extensively partitioned into three unmistakable classifications, as per the sorts of data to be mined. We give a short review of the three classifications. A figure portraying the scientific classification 1. Web Content Mining: Web Content Mining is the way toward separating helpful information from the contents of Web documents. Content data compares to the gathering of certainties a Web page was intended to pass on to the clients. It may comprise of content, images, sound, video, or organized records, for example, records and tables. Utilization of content mining to Web content has been the most widely researched. Issues tended to in content mining are, theme revelation, separating affiliation patterns, clustering of web documents and characterization ofWeb Pages. Research exercises on this theme have drawn intensely on procedures created in different trains, for example, Information Retrieval (IR) and Natural Language Processing (NLP). While there exists a huge assortment of work in separating knowledge from images in the fields of image processing and PC vision, the utilization of these methods to Web content mining has been restricted. associating related pages. Web Structure Mining is the way toward finding structure information from the Web. This can be additionally separated into two sorts based on the sort of structure information utilized. ■ Hyperlinks: A Hyperlink is a basic unit that associates an area in a Web page to various area, either inside a similar Web page or on a different Web page. A hyperlink that associates with an alternate piece of the equivalent page is called an Intra-Document Hyperlink, and a hyperlink that interfaces two distinct pages is called an Inter-Document Hyperlink. There has been acritical group of work on hyperlink examination, of which Desikan et. al. give a cutting-edge study. ■ Document Structure: what's more, the content inside a Web page can likewise be sorted out in a tree-organized configuration, based on the different HTML and XML labels inside the page. Mining endeavors here have concentrated on naturally extricating document object model (DOM) structures out of documents. 3. Web Usage Mining: Web Usage Mining is the use of data mining systems to find intriguing utilization patterns from Web data, so as to get it what's more, better serve the necessities of Web-based applications. Utilization data catches the character or beginning of Web clients alongside their perusing conduct at a Web website. Web utilization mining itself can be characterized further contingent upon the sort of utilization data considered: ■ Web Server Data: The client logs are gathered by Web server. Ordinary data incorporates IP address, page reference and access time. ■ Application Server Data: Commercial application servers such as Web logic Story Server have huge highlights to empower E-trade applications to be based over them with little exertion. A key component is the capacity to follow different sorts of business occasions and log them in application server logs. ■ Application Level Data: New sorts of occasions can be characterized in an application, what's more, logging can be turned on for them - producing narratives
Web mining taxonomy
ASSOCIATION RULE MINING LITERATURE SURVEY
Web Mining Problems and Approaches
Web mining is a procedure in data mining that consequently recovers extricates and breaks down the information from web. Yang and Wu et al, (2006) talk about the different issues to be tended to in data mining. The serious issues incorporate Automated Data Cleaning, Over Fitting, Under Fitting and Oversampling of data, Scaling up for high dimensional data, Mining sequence and time arrangement data. A survey was directed and given by k d chunks and a significant number of the researchers recommended the essential work for research as Scaling up Data Mining calculations for gigantic data, mining content and computerized data purifying as the serious issues talked about with most astounding priorities. Different issues incorporate managing uneven data, mining data streams, connection and systems. Security in mining and conveyed data mining additionally got the hugeness yet not to as more prominent degree. A fervently discussed specialized issue is whether it is smarter to set up a social database structure or a
1. Data Pre-processing Techniques
Web log pre-processing is the initial step that is vital to improve the effectiveness and nature of the web data in light of the fact that practically 70% of the time is taken in pre-processing and these pre-prepared data are given as a contribution to the following stages pattern disclosure and pattern examination. There are numerous methods accessible for pre-processing since quite a while. Web log document assumes a noteworthy job in pre-processing as the contents the client peruse are recorded in these log file. The data can be put away either at separate side, customer side, on intermediary servers and on operational database. Web Server Logs keeps up a past filled with page demands. Information about the demand, customer IP address, ask for date/time, page asked for, HTTP code, bytes served, client operator, are put away. Intermediary Server Logs a reserving system which lies between customer programs and Web servers. It diminishes the heap time of Web pages just as the system traffic load at the server and customer side and Browser Logs that can be altered or different JavaScript and Java applets can be utilized to gather customer side data. Customer side accumulation scores over server-side gathering since it lessens both the client and session recognizable proof issues. The favorable circumstances and drawbacks of log documents and their conduct are appeared in the table given beneath. To improve effectiveness and nature of patterns mined and to maintain a strategic distance from these uproarious and messy data different pre-processing strategies are accessible like Data cleaning, Data coordination, Data changes, and data reduction. Data cleaning-It is expected to expel commotion and right irregularities in the data. The mains issues of data cleaning are missing qualities, commotion, irregularities and copy elimination. The systems utilized in missing qualities are characterization, relapse, impedance based apparatuses utilizing Bayesian detailing, Decision Tree Induction. Binning, Smoothing, Regression, Clustering is helpful to expel the loud data from the database. The Duplicate disposal utilizes arranged neighborhood technique created to lessen the quantity of required correlations. Various business devices, e.g., IDCENTRIC (First Logic), PUREINTEGRATE (Oracle), QUICKADDRESS (QAS Systems), REUNION (Pitney Bowes), and (Trillium Software), center around cleaning this sort of data. Copy disposal Sample instruments for copy distinguishing proof and end incorporate DATACLEANSER (EDD), MERGE/PURGELIBRARY(Sagent/QM Software), MATCHIT (Help IT Systems), and MASTERMERGE (Pitney Bowes).
utilize this method. There are various issues to consider amid data coordination. Composition reconciliation can be dubious. This is alluded to as the element distinguishing proof issue. Repetition is another critical issue. A third essential issue in data mix is the location and goals of data esteem clashes. Data change Data change includes the methods like Smoothing, Aggregation and Normalization. 2. Data Pre-processing Challenges: • Data cleaning is by all accounts troublesome for semi organized data and unstructured data yet the vast majority of the data is by all accounts organized. So more work must be done in cleaning semi-organized data. • Data change is a vital stage that is done in pre-processing of data. Be that as it may, no careful devices are accessible. • Research work ought to be done on executing the best device for data transformation[1]. • Limited interoperability. • Though Duplicate disposal utilizes numerous strategies and apparatus despite everything it remains a repetitive assignment to be performed. • Query processing is troublesome on heterogeneous data.
3. Study on Pattern Extraction Techniques
Data mining procedures has two methodologies that incorporate spellbinding mining and prescient mining. Spellbinding mining focuses on the general properties of data in the database and prescient mining focuses on data to make predictions. The data mining Techniques are Undertakings for performing preprocessing of Web Usage Mining include data cleaning, client distinguishing proof, session ID, way culmination, session recreation, exchange ID and formatting. Be that as it may, when all is said in done, these apparatuses give almost no examination of data connections among the got to records and registries inside the Web space. Presently increasingly refined strategies for revelation and investigation of patterns are developing. These devices fall into two fundamental classes: Pattern Discovery Tools and Pattern Analysis Tools. Pattern disclosure draws upon strategies and calculations created from a few fields, for example, insights, data mining, machine learning and pattern acknowledgment. They are measurable investigation, affiliation, rule mining, clustering, Sequential pattern mining. Affiliation rule mining: Association Rules discover all arrangements of things that have bolster more noteworthy than the base help and afterward utilizing the expansive thing sets to create the ideal rules that have certainty more prominent than the base certainty. A calculation for discovering ass rule named as AIS was proposed by R.S.Agarwal et al. in 1993. The hindrance of the AIS calculation is that it results in pointless age and numerous competitor thing sets . The Apriori calculation exploits the way that any subset of a successive thing set is additionally a continuous thing set. The inconveniences are that various sweeps must be done on the database and it has complex time and memory expending. The upside of AprioriTid calculation is that the quantity of sections might be littler than the quantity of exchanges in the database, particularly in the later passes yet the expense of exchanging ought to be considered. AprioriHybrid Algorithm Apriori shows improvement over AprioriTid and AprioriTid shows improvement over Apriori in the later passes. FP – Tree calculation examines the database just twice yet it is by all accounts troublesome in gradual and intelligent rule mining. Custom constructed Apriori calculation that is productive and does powerful pattern examination. Another algo Bin Li Wang et al., 2010, proposed another technique to Improvement of Apriori Algorithm Based on Boolean Matrix. It checks exchange database just a single time, subsequently diminishes the framework cost and builds proficiency of data mining.
PROMINENT APPLICATIONS
A result of the fervor about the Web in the previous couple of years has been that Web applications have been created at an a lot quicker rate in the business than research in Web related innovations. A considerable lot of these are based on the utilization of Web mining ideas, despite the fact that the associations that built up these applications, and designed the relating advancements, did not think about it accordingly. We portray a portion of the best applications in this segment. Plainly, understanding that these applications use Web mining is to a great extent a review work out. For every application class talked about underneath, we have chosen an unmistakable delegate, only for praiseworthy purposes. This not the slightest bit suggests that every one of the procedures depicted were created by that association alone. In actuality, by and large the effective procedures were created by a quick 'duplicate and improve' way to deal with one another's thoughts. Customized Customer Experience in B2C E-trade - Amazon.Com, Web Search – Google, Web-wide following – - Digital Library and Autonomous Citation Indexing. Web measurements and estimations From a trial human behaviorist's perspective, the Web is the ideal trial device. Not exclusively does it gives the capacity of estimating human conduct at a small scale level, it wipes out the inclination of the subjects realizing that they are taking an interest in a test, and enables the quantity of members to be numerous requests of size bigger than regular investigations. Be that as it may, we have not yet started to value the genuine effect of a progressive test mechanical assembly for human conduct thinks about. The Web Lab of Amazon is one of the early endeavors toward this path. It is normally utilized to quantify the client effect of different proposed changes - on operational measurements such as site visits and visit/purchase proportions, just as on budgetary measurements, for example, income and benefit - before an arrangement choice is made. For instance, amid Spring hour long investigation on the live site did, including more than one million client sessions, before the choice to change Amazon's logo was made. Research needs to be done in building up the correct arrangement of Web in developing the right set of Web metrics, and their measurement procedures so that various Web phenomena can be studied.
PROCEDURE MINING
Mining of 'advertise container' data, gathered at the purpose of-offer in any store, has been one of the unmistakable triumphs of data mining. Be that as it may, this data gives just the final product of the procedure, and that excessively choices that wound up in item buy. Snap stream data gives the chance to a point by point take a gander at the basic leadership process itself, also, knowledge removed from it very well may be utilized for advancing the procedure, affecting the procedure, and so on. Underhill has definitively demonstrated the estimation of procedure information in understanding clients' conduct in customary shops. Research should be completed in (I) separating process models from utilization data, (ii) seeing how unique portions of the procedure model effect different Web measurements of intrigue, and (iii) how the procedure models change because of different changes that are made, for example evolving upgrades to the client. Transient development of the Web Society's association with the Web is changing the Web just as the way individuals communicate. While putting away the historical backdrop of the majority of this association in one spot is unmistakably as well stunning an errand, in any event the progressions to the Web are being recorded by the spearheading Web Archive venture. Research should be done in removing worldly models of how Web content, Web any rate parts of) utilization data from that point Web destinations. With these wellsprings of data accessible, there is a huge extension of research to create procedures for breaking down of how the Web develops after some time. Web administrations execution enhancement As administrations over the Web keep on developing, there will be a proceeding with need to make them vigorous, versatile and productive. Web mining can be connected to all the more likely comprehend the conduct of these administrations, and the knowledge separated can be valuable for different sorts of enhancements. The effective application ofWeb mining for prescient pre-getting of pages by a program has been exhibited. It is important to do investigation of the Web logs for web administrations execution streamlining. Research is required in creating Web mining procedures to improve different other parts of Web administrations. Misrepresentation and danger examination The secrecy given by theWeb has prompted a huge increment in endeavored misrepresentation, from unapproved utilization of individual Visas to hacking into charge card databases for coercion purposes. One more precedent is closeout misrepresentation, which has been expanding on well known destinations like eBay [USDoJ2002]. Since every one of these cheats are being executed through the Internet, Web mining is the ideal examination method for distinguishing and avoiding them. Research issues incorporate creating systems to perceive known cheats, and portray and after that perceive obscure or novel fakes, and so forth. The issues in digital danger examination and interruption discovery are very comparative in nature.
WEB MINING AND SECURITY
While there are numerous advantages to be picked up from Web mining, an unmistakable disadvantage is the potential for extreme infringement of protection. Open demeanor towards protection is by all accounts practically schizophrenic, for example individuals state a certain something and do a remarkable inverse. For instance, well known cases like appear to show that individuals esteem their security, while involvement with significant internet business entries demonstrates that over 97% surprisingly acknowledge treats without any issues - and the vast majority of them really like the personalization highlights that can be given based on it. Spiekerman et al. have exhibited that individuals were eager to give genuinely close to home information about themselves, which was totally insignificant to the job needing to be done, whenever gave the correct improvement to do as
mind towards protection might be that we have a bi-modular perspective on protection, in particular that "I'd be eager to share information about myself as long as I get a few (substantial or elusive) profits by it, for whatever length of time that there is a verifiable certification that the information won't be manhandled". The research issue produced by this mentality is the need to create approaches, systems what's more, instruments that can be utilized to check and approve that aWeb administration is to be sure utilizing an end-client's information in a way steady with its expressed arrangements.
WEB INFORMATION EXTRACTION
Since web data are semi-organized or even unstructured, which cannot be controlled by customary database methods, it is basic to separate web data to port them into databases for further dealing with. The reason for Web Information Extraction (IE) in our web mining research emotionally supportive network is to extricate a specific bit of web documents helpful for a research venture. A specific web data set can be dispersed among different web has and have different groups. IE takes web documents as information, identifies a center part, and changes that piece into a organized and unambiguous organization. This part depicts the structure of a web IE framework. We _rst present the idea of wrappers. Area 2 gives a diagram of the related work in the present IE frameworks. At that point, a portion of our past work is exhibited and a proposed arrangement is examined. Half and half Information Extraction We planned and actualized a manual wrapper in our past work. The manual wrapper is anything but difficult to create and actualize. Nonetheless, clients must make different wrappers for use with each difference in web documents. Upkeep cost will be unreasonably high for this manual wrapper approach. Also, numerous content like documents exist on the web, for example talk board, news gathering. We can characterize web contents as two sorts: organized/semi-organized content and free content. The first type has data things (e.g., names, SSN, and so on.). Precedent web documents of this sort incorporate on-line insights, tables, and so on. The second sort comprises of free languages, e.g., promotions, messages, and so forth. Methods, for example, wrapper enlistment and modeling-based devices are appropriate with pages of the first type on the grounds that The hybrid information extraction architecture such apparatuses depend on delimiters of data to make extraction rules. NLP methods are based on syntactic and semantic requirements can work with the two sorts. Be that as it may, language examination and learning rules ages are intricate. These strategies are expensive for organized content. A web mining research emotionally supportive network must manage the two kinds, in light of the fact that both contain helpful information for research.
CONCLUSION
In this paper we have examined about the research issues and the downsides of the current procedures. More research work should be done on the web mining domain as it will rule the web soon. Web mining alongside semantic web known as semantic web mining is to be concentrated that is developing which encourages us to defeat the cons of web mining. In spite of the fact that different calculations and systems have been proposed still work must be done in finding new devices to mine the web. As the Web and its utilization keeps on developing, so develops the chance to break down Web data and concentrate all way of helpful knowledge from it. The previous five years have seen the rise of Web mining as a quickly developing territory, because of the endeavors of the research network just as different associations that are rehearsing it. In this paper we have quickly depicted the key software engineering commitments made by the field, various unmistakable applications, and sketched out some encouraging regions of future research. Our expectation is that this review gives a beginning stage to productive exchange. The extraction of an organized/semi-organized web document is as per the following. Right off the bat, the parser makes a parse tree for the document. Furthermore, clients input test patterns which they need to extricate. At that point, extraction heuristics are produced to coordinate the example patterns. Wrappers are made based on extraction heuristics to remove data. On the off chance that the removed data contains free content which needs further extraction, the procedure will be changed to utilize
REFERENCES
Zdravko Markov, Daniel T. Larose (2007). ―Web content structure and Usage‖,Wiley. V. Bharanipriya & V. Kakakshi Prasad ―WEB CONTENT MININGTOOLS:A COMPARITIVE STUDY‖. Rekha Jain and Dr. G.N. (2011). ―Purohit page ranking algorithm for webmining.International journal of computer applications‖,(0975 .8887volume13.No.5,January 2011. Wang and Liu 1998;Moh, Lim and Ng 2000 Srivastava, Cooley, Deshpande, and Tan 2000 Jaideep Srivastavay, Robertcooleyz, Mukund deshpande and pangningtan (2000). web usage mining: discovery and applications of usagepatterns from web data‖ published in ACM SIGKDD Explorations .copyright 2000 ACM SIG KDD, VolumeI, Issue 2, Jan 2000, pp. 12-23. Dr. A. C. Mondal and Sourav Maitra (2010). ―A Study of web mining Research- last few years and the road Ahead‖ publish in ICCS, Burdwan University 2010. Wangbin Hu. Junpeng Yuan and Yuantao Song : ―The Research of a web mining method in Research Areas‖ published in Sixth Wah on international center on E-Business, e-Business Track. Kosala, Raymond; Hendrik Blockeel : ―Wen Mining Research :A Survey‖ SIGKDD Explorations. Mustapasa, A. Karahoca, D. Krahoca and H. Uzunboylu (2011). ―Hello World, Web mining for E-Learning‖, Procedia. Computer science vol 3, 2011. P. Kolari and A. Joshi (2004). Web mining: Research and practice, Comput. Sci. Eng.July/August pp. 42–53. R. Kosala and H. Blockeel (2000). Web mining research: A survey, ACM SIGKDD Explor.2 pp. 1–15. K. Lau, K. Lee, Y. Ho and P. Lam (2004). Mining the web for business intelligence; homepage analysis in the Internet era, J. Database Marketing Customer Strategy Management12(1) pp. 32–54. www.database.cis.nctu.edu.tw/seminars/2003F/TWM/slides/p.ppt. B. Liu (2005). Web content mining, http://www.cs.uic.edu/∼liub/WebContent-Mining.html. C. Liu (2006). Web content mining (29 November 2006), CM SIGKDD Webcast. B. Liu (2007). Web Data Mining: Exploring Hyperlinks, Contents and Usage Data (Springer Verlag Press, 2007), ISBN-13: 978-3-540-37881-5. B. Liu and K. Chang (2004). Editorial: Special issue on web content mining, SIGKDD Explorations6(2) pp. 1–4. Z. Markov and D. T. Larose (2007). New Report ―Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage‖ Features Web Structure Mining, Web Content Mining and Web Usage Mining (John Wiley and Sons, 25 May 2007),http://biz.yahoo.com/bw/070525/20070525005213.html?.v=1. Megaputer Intellegence Inc., Web Analyst architecture (2007), http://www.megputer.com/products/wa/architechture.php3. B. Mobasher (2007). Web data mining for business intelligence, ECT 584, De Paul University, Chicago, IL (2007), http://maya.cs.depaul.edu/∼classes/ect584/papers/mobasher.pdf. B. Mobasher, R. Colley and J. Srivastava (2000). Automatic personalization based on web usage mining, Commun. ACM 43(8) pp. 142–151. B. Mobasher, H. Dai, T. Luo, S. Yuqing and J. Zhu (2000). Integrating web usage and contentmining for more effective personalization, EC-Web. Z. Pabarskaite and A. Raudys (2007). A process of knowledge discovery from web log data: Systematization and critical review, J. Intell. Inform. Syst. 28(1) pp. 79–104. S. Palmer, The semantic web: An introduction (2001), http://infomesh.net/2001/swintro/. D. Pierrakos, G. Paliouras, C. Papatheodorou and C. Spyropoulos (2003). Web usage mining as a tool for personalization: A survey,
QL2 software (2007). http://www.ql2.com, viewed 5 June 2007. A. Scime (2004). Guest Editor‘s Introduction: Special Issue on Web Content Mining: Special Issue on Web Content Mining, J. Intell. Inform. Syst. 22(3) pp. 211–213. A. Scime (2005). Web Mining: Applications and Techniques (Idea Group Publishing, Hershey, P.A., 2005), ISBN: 1591404142. Semantic Web Agreement Group, What is the semantic web? (2001) http://swag.webns.net/WhatIsSW. K. A. Smith and A. Ng (2003). Web page clustering using a self-organizing map of user navigation patterns, Decision Support Syst. 35(2) pp. 245–256. Q. Song and M. Shepperd (2009). Mining web browsing patterns for e-commerce. 34(2) pp. 225–256.
Corresponding Author Rifat Jahan*
Lecturer, Imam Abdul Rahman Bin Faisal University rjahan@iau.edu.sa