HyKSS: Hybrid keyword and semantic search

2011

Abstract

Keyword search suffers from a number of issues: ambiguity, synonymy, and an inability to handle semantic constraints. Semantic search helps resolve these issues but is limited by the quality of annotations, which are likely to be incomplete or imprecise. Hybrid search, a search technique that combines the merits of both keyword and semantic search, appears to be a promising solution. In this paper we describe and evaluate HyKSS, a hybrid search system driven by extraction ontologies for both annotation creation and query interpretation. To rank results for display, HyKSS uses a dynamic ranking algorithm. We show that over data sets of short topical documents, the HyKSS ranking algorithm outperforms both keyword and semantic search in isolation, as well as a number of other non-HyKSS hybrid approaches to ranking.

1 Introduction

Keyword search for documents on the web works well, often surprisingly well. Can semantic search, added to keyword search, make the search for relevant documents even better? Clearly, the answer should be yes, and researchers are pursuing this initiative (e.g., [1]). The real question, however, is not whether adding semantic search might help, but rather how we can, in a cost-effective way, identify the semantics both of the documents in the search space and of the free-form queries users wish to ask.

Keyword search has a number of limitations:
(1) Polysemy: Ambiguous keywords may result in the retrieval of irrelevant documents.
(2) Synonymy: Document publishers may use words that are synonymous with, but not identical to, terms in user queries, causing relevant documents to be missed.
(3) Constraint satisfaction: Keyword search is incapable of recognizing semantic constraints.

If a query specifies "Hondas for under 12 grand", a keyword search will treat each word as a keyword (or stopword), even though many, if not most, relevant documents likely do not contain any of these words, not even "Hondas", since the plural is relatively rare in relevant documents. Semantic search can resolve polysemy by placing words in context, synonymy by allowing for alternatives, and constraint satisfaction by recognizing specified conditions. Thus, for example, semantic search can interpret the query "Hondas …
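To make the "Hondas for under 12 grand" contrast concrete, here is a toy Python sketch that compares literal keyword overlap with a hand-written semantic interpretation of the same query. It is not HyKSS and not its ranking algorithm: the make lexicon, the price regular expressions, the sample ads, and every function name (`keyword_terms`, `keyword_overlap`, `find_make`, `interpret_query`, `extract_record`, `satisfies`) are hypothetical stand-ins that only loosely imitate what an extraction ontology does when it recognizes values in documents and constraints in queries.

```python
import re

# Hypothetical mini-lexicon standing in for an extraction ontology's recognizers.
MAKES = {"honda": "Honda", "hondas": "Honda", "toyota": "Toyota", "toyotas": "Toyota"}
STOPWORDS = {"a", "an", "the", "for", "of", "in"}


def keyword_terms(text):
    """Lowercased alphanumeric tokens, minus a tiny stopword list."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def keyword_overlap(query, doc):
    """Plain keyword score: how many query terms literally occur in the doc."""
    return len(keyword_terms(query) & keyword_terms(doc))


def find_make(text):
    """First word in the text that the lexicon maps to a car make, else None."""
    return next((MAKES[w] for w in re.findall(r"[a-z]+", text.lower()) if w in MAKES), None)


def interpret_query(query):
    """Turn the free-form query into (make, max_price) constraints."""
    m = re.search(r"under\s+\$?(\d+(?:,\d{3})*)\s*(grand|k|thousand)?", query.lower())
    max_price = None
    if m:
        max_price = int(m.group(1).replace(",", ""))
        if m.group(2):  # "12 grand", "12k", "12 thousand" all mean 12,000
            max_price *= 1000
    return find_make(query), max_price


def extract_record(doc):
    """Annotate a document with its make and dollar price (None if absent)."""
    m = re.search(r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)", doc)
    price = int(m.group(1).replace(",", "")) if m else None
    return find_make(doc), price


def satisfies(query, doc):
    """True if the document's annotations meet every constraint in the query."""
    make, max_price = interpret_query(query)
    doc_make, doc_price = extract_record(doc)
    if make is not None and doc_make != make:
        return False
    if max_price is not None and (doc_price is None or doc_price >= max_price):
        return False
    return True


query = "Hondas for under 12 grand"
ads = [
    "2009 Honda Civic, 98K miles, $11,500, clean title",
    "2012 Honda Accord, low miles, $15,900",
    "Grand opening sale: over 12 Hondas under one roof!",
]
for ad in ads:
    print(keyword_overlap(query, ad), satisfies(query, ad), ad)
# 0 True 2009 Honda Civic, 98K miles, $11,500, clean title
# 0 False 2012 Honda Accord, low miles, $15,900
# 4 False Grand opening sale: over 12 Hondas under one roof!
```

The relevant ad shares no keywords with the query yet satisfies both constraints, while the irrelevant ad that happens to reuse "grand", "12", "Hondas", and "under" scores highest on keyword overlap, illustrating the constraint-satisfaction and polysemy problems in miniature.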

References

  1. R. Bhagdev, S. Chapman, F. Ciravegna, V. Lanfranchi, and D. Petrelli. Hybrid search: Effectively combining keywords and ontology-based searches. In Proceedings of the 5th European Semantic Web Conference (ESWC'08), pages 554-568, Tenerife, Canary Islands, Spain, June 2008.
  2. D.W. Embley, S.W. Liddle, D.W. Lonsdale, J.S. Park, B.-J. Shin, and A. Zitzelberger. Cross-language hybrid keyword and semantic search. In Proceedings of the 31st International Conference on Conceptual Modeling (ER 2012), pages 190-203, Florence, Italy, October 2012.
  3. D.W. Embley and A. Zitzelberger. Theoretical foundations for enabling a web of knowledge. In Proceedings of the Sixth International Symposium on Foundations of Information and Knowledge Systems (FoIKS'10), pages 211-229, Sofia, Bulgaria, February 2010.
  4. P. Buitelaar, P. Cimiano, P. Haase, and M. Sintek. Towards linguistically grounded ontologies. In Proceedings of the 6th European Semantic Web Conference (ESWC'09), pages 111-125, Heraklion, Greece, May/June 2009.
  5. D.W. Embley. Programming with data frames for everyday data items. In Proceedings of the 1980 National Computer Conference, pages 301-305, Anaheim, California, May 1980.
  6. D.W. Embley, D.M. Campbell, Y.S. Jiang, S.W. Liddle, D.W. Lonsdale, Y.-K. Ng, and R.D. Smith. Conceptual-model-based data extraction from multiple-record web pages. Data & Knowledge Engineering, 31(3):227-251, 1999.
  7. V. Rus. A first evaluation of logic form identification systems. In R. Mihalcea and P. Edmonds, editors, Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, pages 37-40, Barcelona, Spain, March 2004.
  8. C. Tao, D.W. Embley, and S.W. Liddle. FOCIH: Form-based ontology creation and information harvesting. In Proceedings of the 28th International Conference on Conceptual Modeling (ER 2009), pages 346-359, Gramado, Brazil, November 2009.
  9. M. Fernandez, V. Lopez, M. Sabou, V. Uren, D. Vallet, E. Motta, and P. Castells. Semantic search meets the web. In Proceedings of the Second IEEE International Conference on Semantic Computing (ICSC'08), pages 253-260, Santa Clara, California, 2008.
  10. C.D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, New York, New York, July 2008.
  11. T.-Y. Liu. Learning to Rank for Information Retrieval. Springer, Berlin, Germany, 1st edition, 2011.
  12. A.C. Rencher and W.F. Christensen. Methods of Multivariate Analysis. Wiley, 3rd edition, July 2012.
  13. P. Castells, M. Fernandez, and D. Vallet. An adaptation of the vector-space model for ontology-based information retrieval. IEEE Transactions on Knowledge and Data Engineering, 19(2):261-272, February 2007.
  14. B. Fazzinga, G. Gianforme, G. Gottlob, and T. Lukasiewicz. Semantic web search based on ontological conjunctive queries. In Proceedings of the Sixth International Symposium on Foundations of Information and Knowledge Systems (FoIKS'10), pages 153-172, Sofia, Bulgaria, February 2010.
  15. G. Giannopoulos, N. Bikakis, T. Dalamagas, and T.K. Sellis. GoNTogle: A tool for semantic annotation and search. In Proceedings of the Seventh European Semantic Web Conference (ESWC'10), pages 376-380, May/June 2010.
  16. J. Pound, I.F. Ilyas, and G. Weddell. Expressive and flexible access to web-extracted data: A keyword-based structured query language. In Proceedings of the 2010 International Conference on Management of Data (SIGMOD'10), pages 423-434, Indianapolis, Indiana, June 2010.
  17. H. Wang, T. Tran, and C. Liu. CE²: Towards a large scale hybrid search engine with integrated ranking support. In Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM'08), pages 1323-1324, Napa Valley, California, October 2008.
  18. L. Zhang, Q. Liu, J. Zhang, H. Wang, Y. Pan, and Y. Yu. Semplore: An IR approach to scalable hybrid query of semantic web data. In Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference (ISWC/ASWC'07), pages 652-665, Busan, Korea, November 2007.
  19. V. Lopez, V. Uren, M.R. Sabou, and E. Motta. Cross ontology query answering on the semantic web: An initial evaluation. In Proceedings of the Fifth International Conference on Knowledge Capture (K-CAP'09), pages 17-24, Redondo Beach, California, September 2009.
  20. D. Damljanovic, M. Agatonovic, and H. Cunningham. Natural language interfaces to ontologies: Combining syntactic analysis and ontology-based lookup through the user interaction. In Proceedings of the 7th Extended Semantic Web Conference (ESWC'10), pages 106-120, Heraklion, Greece, May/June 2010.
  21. O. Egozi, S. Markovitch, and E. Gabrilovich. Concept-based information retrieval using explicit semantic analysis. ACM Transactions on Information Systems, 29(2):1-34, April 2011.