Extracting Relations from Italian Wikipedia Using Self-Training

Proceedings of the Eighth Italian Conference on Computational Linguistics CliC-it 2021

https://doi.org/10.4000/BOOKS.AACCADEMIA.10849

Abstract

In this paper, we describe a supervised approach for extracting relations from Wikipedia. In particular, we exploit a self-training strategy to enrich a small set of manually labeled triples with new self-labeled examples. We integrate the supervised stage into WikiOIE, an existing framework for unsupervised relation extraction from Wikipedia, and rely on its unsupervised pipeline to extract the initial set of unlabeled triples. An evaluation involving different algorithms and parameters shows that self-training helps to improve performance. Finally, we provide a dataset of about three million triples extracted from the Italian version of Wikipedia and perform a preliminary evaluation on a sample dataset, obtaining promising results.
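
The self-training idea mentioned in the abstract can be illustrated with a short sketch. This is not the WikiOIE implementation: the nearest-centroid classifier, the two-dimensional feature vectors, the "valid"/"invalid" labels, the confidence threshold, and every function name below are assumptions chosen only to keep the example self-contained. The loop trains on the labeled seed, labels the unlabeled pool, moves the confident predictions into the training set, and repeats until nothing new is added.

```python
from math import sqrt


def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroids(X, y):
    """Mean feature vector per label (a toy stand-in for a real classifier)."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_with_confidence(model, features):
    """Return (label, confidence); confidence runs from 0.5 (ambiguous) to 1.0."""
    ranked = sorted((euclidean(features, c), label) for label, c in model.items())
    d_best, best = ranked[0]
    if len(ranked) == 1:
        return best, 1.0
    d_second = ranked[1][0]
    total = d_best + d_second
    return best, (0.5 if total == 0 else d_second / total)


def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.8, max_rounds=10):
    """Grow the labeled set with confident self-labeled examples (hypothetical helper)."""
    X, y = list(X_labeled), list(y_labeled)
    pool = list(X_unlabeled)
    for _ in range(max_rounds):
        model = centroids(X, y)
        remaining, added = [], 0
        for features in pool:
            label, confidence = predict_with_confidence(model, features)
            if confidence >= threshold:
                # Confident prediction: treat it as a new training example.
                X.append(features)
                y.append(label)
                added += 1
            else:
                remaining.append(features)
        pool = remaining
        if added == 0:  # nothing new was learned in this round, so stop
            break
    return centroids(X, y), pool


# Toy data: each triple is assumed to be already encoded as a feature vector.
labeled = [(0.0, 0.0), (1.0, 1.0)]
labels = ["invalid", "valid"]
unlabeled = [(0.1, 0.0), (0.9, 1.0), (0.5, 0.5)]
model, leftover = self_train(labeled, labels, unlabeled)
```

In this toy run the two points close to a seed example are absorbed into the training set, while the equidistant point stays in the unlabeled pool because its confidence never reaches the threshold. The threshold trades coverage against label noise: a lower value adds more self-labeled examples but also more mistakes.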
