Enhancing semantic search using n-levels document representation
2008, Semantic Search
Abstract
This research explores the limitations of traditional keyword-centric search technologies and presents an approach to enhancing semantic search through an n-levels document representation. It examines the potential of semantic technologies to match complex user information needs with expressive resource descriptions, and outlines key challenges such as user query translation, metadata extraction, and the handling of vague information. Insights from ongoing projects at Cycorp highlight the importance of semantic indexing in improving access to knowledge in extensive document collections, particularly in addressing the difficulties encountered in rare search scenarios.
SemSearch 2008, CEUR Workshop Proceedings, ISSN 1613-0073, online at CEUR-WS.org/Vol-334/