Keyword and Title Based Clustering (KTBC): An Easy and Effective Way to Dynamically Cluster Web Documents
Keywords:
clustering, web mining, data mining, web clustering, KTBC, mining, KDD.
Abstract
Users of web search engines must often sift through a huge list of web documents
returned by the search engine. With the rapid proliferation of web documents on the
Internet, fast and effective mining of information from these data sources, scattered
all over the world, has become a challenge for the Information Retrieval (IR)
community. The IR community has explored document clustering as an alternative way
of organizing retrieval results, but clustering has yet to be deployed on many search
engines. In this research, an effective clustering approach, the Keyword and Title
Based Clustering (KTBC) algorithm, is proposed. KTBC is a fast, post-retrieval web
document clustering method suitable for use by web search engines. Instead of
presenting an extremely large list of documents, the algorithm returns a smaller
number of clusters, which helps web users find relevant information more easily. We
present the algorithmic methodology, along with a mathematical and logical analysis,
and finally simulation results for the algorithm.