Clustered Based User-Interest Ontology Construction for Selecting Seed URLs of Focused Crawler
J. Nisha¹, K. Sundareswari²
With the increasing number of accessible web pages on the Internet, it has become increasingly difficult for users to find the pages that are relevant to their particular needs. Knowledge about computer users is very useful for assisting them and predicting their future actions. Seed URL selection for a focused Web crawler aims to steer the crawler toward relevant, valuable information that meets a user's personal information needs, and thereby to provide more effective information retrieval. In this paper, a seed URL selection approach based on a user-interest ontology is proposed. To enrich semantic queries, we first apply Formal Concept Analysis (FCA) to construct a user-interest concept lattice from the user's log profile. By merging concept lattices, we then construct the user-interest ontology, which describes implicit concepts and the relationships between them more appropriately for semantic representation and query matching. Next, we make full use of the user-interest ontology to extract the user's topic area of interest and to expand user queries, retrieving the most relevant pages as seed URLs, which serve as the entry points of the focused crawler. In particular, we focus on refining the user's topic area using a bipartite directed graph.
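The FCA step mentioned above can be illustrated with a small sketch. It assumes a formal context in which the objects are pages from a user's log and the attributes are keywords observed on them. The page names, keywords, and helper names (`formal_concepts`, `cover_edges`, `expand_query`) are hypothetical. The closure-based query expansion is a standard FCA operation swapped in for illustration, not necessarily the authors' expansion method. Lattice merging, ontology construction, and the bipartite-graph refinement are not shown.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical formal context (G, M, I) built from a user log profile:
# each object is a visited page, each attribute a keyword seen on it.
Context = Dict[str, FrozenSet[str]]
Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def all_attributes(context: Context) -> FrozenSet[str]:
    """The attribute set M."""
    return frozenset().union(*context.values())


def extent_of(context: Context, attrs: FrozenSet[str]) -> FrozenSet[str]:
    """B': the objects that have every attribute in attrs."""
    return frozenset(g for g, intent in context.items() if attrs <= intent)


def intent_of(context: Context, objs: FrozenSet[str]) -> FrozenSet[str]:
    """A': the attributes shared by every object in objs (M if objs is empty)."""
    result = all_attributes(context)
    for g in objs:
        result &= context[g]
    return result


def formal_concepts(context: Context) -> List[Concept]:
    """Enumerate all formal concepts (A, B) with A' = B and B' = A.

    Every concept intent is either M or an intersection of object intents,
    so we close the family of intents under intersection and then derive
    each extent as B'.
    """
    intents: Set[FrozenSet[str]] = {all_attributes(context)}
    for obj_intent in context.values():
        intents |= {obj_intent & b for b in intents}
    concepts = [(extent_of(context, b), b) for b in intents]
    # Most general concept (largest extent) first.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))


def cover_edges(concepts: List[Concept]) -> List[Tuple[int, int]]:
    """Hasse diagram edges (sub, super): super directly covers sub."""
    edges = []
    for i, (a1, _) in enumerate(concepts):
        for j, (a2, _) in enumerate(concepts):
            if a1 < a2 and not any(a1 < a3 < a2 for a3, _ in concepts):
                edges.append((i, j))
    return edges


def expand_query(context: Context, query: Iterable[str]) -> FrozenSet[str]:
    """Hypothetical expansion via the closure Q'': add every keyword shared
    by all pages containing the whole query; unmatched queries are unchanged."""
    q = frozenset(query)
    pages = extent_of(context, q)
    if not pages:
        return q
    return intent_of(context, pages)


# Hypothetical user log profile.
log_profile: Context = {
    "page1": frozenset({"crawler", "focused"}),
    "page2": frozenset({"crawler", "seed"}),
    "page3": frozenset({"ontology", "lattice"}),
}

for extent, intent in formal_concepts(log_profile):
    print(sorted(extent), sorted(intent))
print(sorted(expand_query(log_profile, {"focused"})))  # → ['crawler', 'focused']
```

The intersection-closure enumeration is chosen for brevity; it is fine for a small log profile but grows with the number of concepts, so a real system would more likely use an incremental or NextClosure-style algorithm.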