CIST System for CL-SciSumm 2016 Shared Task

2016

Abstract

This paper introduces the methods and experiments of the CIST system, which participated in the CL-SciSumm 2016 Shared Task at BIRNDL 2016. Having participated in the TAC 2014 Biomedical Summarization Track, we built this system on that previous work; this time the domain is Computational Linguistics (CL). The training corpus contains 20 topics from Training-Set-2016 and Development-Set-Apr8, published by CL-SciSumm 2016. For Tasks 1A and 1B, we mainly use rule-based methods with various lexicon and similarity features; we have also tried SVM as a machine learning method. For Task 2, we adopt the hLDA topic model for content modeling, which gives us knowledge about sentence clustering (subtopics) and word distributions (abstractiveness) for summarization. We then combine the hLDA knowledge with several other classical features, using different weights and proportions, to evaluate the sentences in the Reference Paper from its cited text spans. Finally, we extract the ...
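The abstract does not specify the exact features, similarity measures, or weights, so the sketch below only illustrates the general shape of the two steps it describes, under stated assumptions. For Task 1A it ranks Reference Paper sentences against a citing sentence with a weighted sum of two similarity features (cosine over term frequencies and Jaccard word overlap are assumed here, not taken from the paper). For Task 2 it scores sentences with a weighted linear combination of per-sentence features and extracts greedily under a word budget. The feature names `hlda` and `position`, all weights, and the helper function names are hypothetical; the SVM variant and the hLDA model itself are not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphanumeric tokens; a simplification, not the CIST preprocessing.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_sim(a, b):
    # Cosine similarity between raw term-frequency vectors of two sentences.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def jaccard_sim(a, b):
    # Word-overlap (Jaccard) similarity between the two token sets.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def rank_reference_sentences(citance, rp_sentences, w_cos=0.5, w_jac=0.5, top_k=1):
    # Task 1A-style rule (hypothetical weights): score every Reference Paper
    # sentence against the citing sentence and return the indices of the top_k.
    scores = [w_cos * cosine_sim(citance, s) + w_jac * jaccard_sim(citance, s)
              for s in rp_sentences]
    order = sorted(range(len(rp_sentences)), key=lambda i: (-scores[i], i))
    return order[:top_k]


def extract_summary(sentences, features, weights, max_words):
    # Task 2-style extraction: weighted sum of per-sentence feature values
    # (feature names are hypothetical), then greedy selection under a word budget.
    scores = [sum(weights[name] * feats.get(name, 0.0) for name in weights)
              for feats in features]
    chosen, used = [], 0
    for i in sorted(range(len(sentences)), key=lambda i: (-scores[i], i)):
        n = len(tokenize(sentences[i]))
        if used + n <= max_words:
            chosen.append(i)
            used += n
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    rp = ["We propose a topic model for summarization.",
          "The corpus contains news articles.",
          "Results show improved ROUGE scores."]
    print(rank_reference_sentences("They use a topic model for summarization", rp))  # [0]

    feats = [{"hlda": 0.9, "position": 1.0},
             {"hlda": 0.2, "position": 0.5},
             {"hlda": 0.5, "position": 0.0}]
    print(extract_summary(["a b c", "d e", "f g h i"], feats,
                          {"hlda": 0.7, "position": 0.3}, max_words=5))  # ['a b c', 'd e']
```

In the second call, the highest-scoring sentence (3 words) is taken first, the next one (4 words) would exceed the 5-word budget and is skipped, and the lowest-scoring 2-word sentence still fits. Tuning the weights in either function corresponds to the "different weights and proportions" the abstract mentions.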
