In recent years, the amount of information on the World Wide Web has exploded. Search engines are generally used for web searching; however, robot-type search engines have a few problems. One problem is that it is difficult for a user to formulate a query that returns the search results she or he intends. Moreover, it is difficult for users to grasp the contents of the search results because a robot-type search engine outputs many results as one long list. To solve these problems, many methods have been proposed that classify the results of a robot-type search engine into clusters, which are labeled and then shown to the user. To be effective, a cluster label needs to consist of words that appropriately describe the web sites within the cluster. In this study, we propose a labeling method using concordant document frequencies: the web search results of a query are classified into clusters, and our technique assigns proper labels to those clusters. For a given cluster label, we find the set of web sites returned by an AND-query that combines the original query word with the cluster label. If this set and the members of the cluster have many web sites in common, we say that the concordant document frequency is high, and the cluster label is assigned a high weight. Thus, our proposed cluster-aware method makes it possible to assign an appropriate label. We demonstrate the effectiveness of the proposed method through simulation experiments.
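The weighting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact weighting formula, so the sketch assumes the concordant document frequency is simply the number of pages shared by the cluster and the AND-query results. The names `concordant_document_frequency`, `pick_label`, `toy_search`, and the toy index with its URLs are all hypothetical and stand in for a real search engine.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def concordant_document_frequency(cluster_urls: Iterable[str],
                                  and_query_urls: Iterable[str]) -> int:
    """Count pages that are both cluster members and AND-query results.

    Assumption: a plain overlap count; the paper's actual weight may differ.
    """
    return len(set(cluster_urls) & set(and_query_urls))


def pick_label(query: str,
               cluster_urls: Iterable[str],
               candidate_labels: Iterable[str],
               search: Callable[[str], List[str]]) -> Tuple[str, Dict[str, int]]:
    """Score each candidate label and return the best one with all scores."""
    members = list(cluster_urls)
    scores = {
        # The AND-query is modeled as "original query word + candidate label".
        label: concordant_document_frequency(members, search(f"{query} {label}"))
        for label in candidate_labels
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy index standing in for a real search engine.
TOY_INDEX = {
    "jaguar car": ["a.example/xk", "a.example/dealer", "b.example/review"],
    "jaguar animal": ["z.example/cats", "z.example/habitat"],
    "jaguar price": ["a.example/dealer", "z.example/cats"],
}


def toy_search(and_query: str) -> List[str]:
    """Return the toy result list for an AND-query (empty if unknown)."""
    return TOY_INDEX.get(and_query, [])


cluster = ["a.example/xk", "a.example/dealer", "b.example/review"]
best, scores = pick_label("jaguar", cluster, ["car", "animal", "price"], toy_search)
print(best, scores)
```

In this toy data, the label whose AND-query results coincide most with the cluster members receives the highest score. A normalized measure (for example, dividing the overlap by the cluster size) would be an equally plausible reading of the abstract.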
Published in: International Journal of Intelligent Information Systems (Volume 3, Issue 1)
DOI: 10.11648/j.ijiis.20140301.11
Page(s): 1-7
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: Copyright © The Author(s), 2014. Published by Science Publishing Group
Keywords: Labeling, Clustering, Web Search
APA Style
Masafumi Matsuhara, Toshihiro Yoshida. (2014). An Effective Cluster-Aware Labeling Method for Web Search Results Using Concordant Document Frequencies. International Journal of Intelligent Information Systems, 3(1), 1-7. https://doi.org/10.11648/j.ijiis.20140301.11
ACS Style
Masafumi Matsuhara; Toshihiro Yoshida. An Effective Cluster-Aware Labeling Method for Web Search Results Using Concordant Document Frequencies. Int. J. Intell. Inf. Syst. 2014, 3(1), 1-7. doi: 10.11648/j.ijiis.20140301.11
AMA Style
Masafumi Matsuhara, Toshihiro Yoshida. An Effective Cluster-Aware Labeling Method for Web Search Results Using Concordant Document Frequencies. Int J Intell Inf Syst. 2014;3(1):1-7. doi: 10.11648/j.ijiis.20140301.11
@article{10.11648/j.ijiis.20140301.11,
  author = {Masafumi Matsuhara and Toshihiro Yoshida},
  title = {An Effective Cluster-Aware Labeling Method for Web Search Results Using Concordant Document Frequencies},
  journal = {International Journal of Intelligent Information Systems},
  volume = {3},
  number = {1},
  pages = {1-7},
  doi = {10.11648/j.ijiis.20140301.11},
  url = {https://doi.org/10.11648/j.ijiis.20140301.11},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijiis.20140301.11},
  year = {2014}
}
TY  - JOUR
T1  - An Effective Cluster-Aware Labeling Method for Web Search Results Using Concordant Document Frequencies
AU  - Masafumi Matsuhara
AU  - Toshihiro Yoshida
Y1  - 2014/02/20
PY  - 2014
N1  - https://doi.org/10.11648/j.ijiis.20140301.11
DO  - 10.11648/j.ijiis.20140301.11
T2  - International Journal of Intelligent Information Systems
JF  - International Journal of Intelligent Information Systems
JO  - International Journal of Intelligent Information Systems
SP  - 1
EP  - 7
PB  - Science Publishing Group
SN  - 2328-7683
UR  - https://doi.org/10.11648/j.ijiis.20140301.11
VL  - 3
IS  - 1
ER  -