Towards Automatic Generation of Questions from Long Answers
2020
Abstract
Automatic question generation (AQG) has broad applicability in domains such as tutoring systems, conversational agents, healthcare literacy, and information retrieval. Existing efforts at AQG have been limited to short answer lengths of up to two or three sentences. However, several real-world applications require question generation from answers that span several sentences. Therefore, we propose a novel evaluation benchmark to assess the performance of existing AQG systems for long-text answers. We leverage the large-scale open-source Google Natural Questions dataset to create the aforementioned long-answer AQG benchmark. We empirically demonstrate that the performance of existing AQG methods significantly degrades as the length of the answer increases. Transformer-based methods outperform other existing AQG methods on long answers in terms of both automatic and human evaluation. However, we still observe degradation in the performance of our best-performing models with increasing answer length.
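As an illustration of the benchmark-construction step described above, the sketch below filters question-answer pairs so that only answers spanning several sentences remain. This is not the paper's code: the JSON-lines export with `question` and `answer` text fields, the NLTK sentence splitter, and the four-sentence cutoff are all assumptions made for this example.

```python
# Minimal sketch (not from the paper) of carving a long-answer AQG benchmark
# out of question-answer pairs such as those in Google Natural Questions.
# Assumptions: the data has already been exported to a JSON-lines file with
# "question" and "answer" text fields, NLTK's sentence splitter defines
# sentence boundaries, and the four-sentence cutoff is purely illustrative.
import json

import nltk
from nltk.tokenize import sent_tokenize

nltk.download("punkt", quiet=True)  # sentence-splitting model used by sent_tokenize

MIN_SENTENCES = 4  # assumed threshold for calling an answer "long"


def filter_long_answers(in_path: str, out_path: str) -> int:
    """Keep only pairs whose answer spans several sentences; return how many were kept."""
    kept = 0
    with open(in_path, encoding="utf-8") as fin, open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            record = json.loads(line)
            answer, question = record["answer"], record["question"]
            if len(sent_tokenize(answer)) >= MIN_SENTENCES:
                # The long answer becomes the model input; the question is the generation target.
                fout.write(json.dumps({"source": answer, "target": question}) + "\n")
                kept += 1
    return kept


if __name__ == "__main__":
    # Example (hypothetical) file names.
    print(filter_long_answers("nq_pairs.jsonl", "nq_long_answer_benchmark.jsonl"))
```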