Novelty Detection: A Perspective from Natural Language Processing
Ghosal et al.
Computational Linguistics, Volume 47, Number 4
https://doi.org/10.1162/COLI_A_00429
Abstract
The quest for new information is an inborn human trait and has always been essential to human survival and progress. Novelty drives curiosity, which in turn drives innovation. In Natural Language Processing (NLP), Novelty Detection refers to finding text that offers new information with respect to what has been seen or known before. The exponential growth of information across the web brings with it an accompanying menace of redundancy. A considerable portion of web content is duplicated, and we need efficient mechanisms to retain new information and filter out redundant information. However, detecting redundancy at the semantic level and identifying novel text is not straightforward, because a text may have little lexical overlap with earlier text yet convey the same information. On top of that, the non-novel (redundant) information in a document may have been assimilated from multiple source documents, not just one. The problem is compounded when the subject of the discourse is documents, and nu...
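The two difficulties named in the abstract (low lexical overlap despite shared meaning, and redundancy assembled from several sources) can be made concrete with a minimal lexical baseline. The sketch below is an illustrative assumption, not the authors' method: it scores each sentence of a target document by its highest Jaccard word overlap with any sentence drawn from a pool of source documents, so redundancy stitched together from several sources is still caught. The function names (`tokens`, `split_sentences`, `sentence_novelty`, `document_novelty`), the crude tokenizer and sentence splitter, and the 0.5 threshold are all hypothetical choices.

```python
from __future__ import annotations

import re


def tokens(text: str) -> set[str]:
    """Lower-cased set of alphanumeric word tokens (deliberately crude)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(doc: str) -> list[str]:
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sentence_novelty(sentence: str, sources: list[str]) -> float:
    """1 minus the best Jaccard overlap with any sentence of ANY source.

    Pooling sentences from all sources means a target assembled from
    several documents is still judged redundant, not only a copy of one.
    """
    t = tokens(sentence)
    pool = [tokens(s) for doc in sources for s in split_sentences(doc)]
    best = max((jaccard(t, s) for s in pool), default=0.0)
    return 1.0 - best


def document_novelty(target: str, sources: list[str], threshold: float = 0.5) -> float:
    """Fraction of target sentences whose novelty exceeds `threshold`."""
    sentences = split_sentences(target)
    if not sentences:
        return 0.0
    novel = sum(sentence_novelty(s, sources) > threshold for s in sentences)
    return novel / len(sentences)


# Usage: two source documents and a target stitched together from both.
sources = [
    "The company reported record profits in March.",
    "Its chief executive resigned last week.",
]
target = ("The company reported record profits in March. "
          "Its chief executive resigned last week.")
print(document_novelty(target, sources))  # 0.0: every sentence is redundant

# A rough paraphrase of the first source shares only the word "in" with it.
paraphrase = "Earnings hit an all-time high in spring."
print(round(sentence_novelty(paraphrase, sources), 2))  # 0.93: judged novel
```

The paraphrase case shows why the abstract calls semantic-level redundancy detection hard: the sentence roughly restates the profit claim, yet with almost no shared words a purely lexical score treats it as novel. Catching such cases requires comparing meaning rather than surface tokens.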