Patent Document Summarization Using Conceptual Graphs

International Journal on Natural Language Computing

https://doi.org/10.5121/IJNLC.2017.6302

Abstract

This paper proposes a methodology to mine concepts from documents and to use these concepts to generate an objective summary of the claims section of patent documents. The Conceptual Graph (CG) formalism proposed by Sowa (Sowa 1984) is used to represent the concepts and their relationships. Automatic identification of concepts and conceptual relations from text documents is a challenging task. This work focuses on the analysis of patent documents, mainly on the claims section (Claim). The writing style of these documents is complex because they are technical as well as legal. We observe that the general in-depth parsers available in the open domain fail to parse the sentences of the claims section. This failure motivated us to develop a methodology that extracts CGs using other resources: shallow parsing, named entity recognition (NER), and a machine learning technique are used to extract concepts and conceptual relationships from sentences in the claims section. The paper thus discusses (i) the generation of CGs, a semantic network, and (ii) the generation of an abstractive summary of the claims section of a patent. The aim is to generate a summary that is 30% of the size of the whole claims section. We use Restricted Boltzmann Machines (RBMs), a deep learning technique, to extract CGs automatically. We tested the methodology on a corpus of 5000 patent documents from the electronics domain. The results are encouraging and comparable with state-of-the-art systems.
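To make the two outputs named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a CG can be stored as concept nodes linked by named relations and rendered in Sowa's linear notation, and that the 30% target is measured in words. The class `ConceptualGraph`, the function `summary_budget`, the relation label `coupled_to`, and the example claim fragment are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptualGraph:
    """A conceptual graph: concept nodes linked through named relations."""
    concepts: set = field(default_factory=set)
    # Each relation is stored as (relation_name, source_concept, target_concept).
    relations: list = field(default_factory=list)

    def add_relation(self, name, source, target):
        """Register both concepts and the relation that connects them."""
        self.concepts.update((source, target))
        self.relations.append((name, source, target))

    def linear_form(self):
        """Render each relation in linear notation: [A] -> (rel) -> [B]."""
        return [f"[{s}] -> ({r}) -> [{t}]" for r, s, t in self.relations]


def summary_budget(claim_words, ratio=0.30):
    """Word budget for a summary that is `ratio` of the claims section.

    Measuring the 30% target in words is an assumption of this sketch.
    """
    return max(1, round(claim_words * ratio))


# Hypothetical claim fragment: "a sensor coupled to a processor"
cg = ConceptualGraph()
cg.add_relation("coupled_to", "sensor", "processor")
print(cg.linear_form())
print(summary_budget(1200))
```

In this sketch the extraction step (shallow parsing, NER, and the RBM) is left out entirely; only the target representation and the length constraint are shown.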

References (37)

  1. Amati, G. and Ounis, I., (2000) "Conceptual Graphs and First Order Logic", The Computer Journal, 43(1):1-12.
  2. Athanasios, K., Margarita, K., and Marinos, K., (2004) "Geographic Knowledge Representation Using Conceptual Graphs", In the Proceedings of 7th AGILE Conference on Geographic Information Science, Greece.
  3. Blum, N., (2001) "A Simplified Realization of the Hopcroft-Karp Approach to Maximum Matching in General Graphs", Tech. Rep. 85232-CS, Computer Science Department, Univ. of Bonn.
  4. Bezdek, J.C., (1981) "Pattern Recognition with Fuzzy Objective Function Algorithms", Plenum Press, New York.
  5. Brin, S. and Page, L., (1998) "The anatomy of a large-scale hypertextual Web search engine", Computer Networks and ISDN Systems, Vol 30, pp. 1-7.
  6. Brill, E., (1994) "Some Advances in Transformation-Based Part of Speech Tagging", In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), Seattle, WA.
  7. Mollá, D. and van Zaanen, M., (2005) "Learning of Graph Rules for Question Answering", In the Proceedings of the Australasian Language Technology Workshop 2005, Sydney, Australia, pp. 15-23.
  8. Erkan, G. and Radev, D., (2004) "LexPageRank: Prestige in multi-document text summarization", In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain.
  9. Hinton, G. and Salakhutdinov, R., (2006) "Reducing the dimensionality of data with neural networks", Science, 313(5786):504-507.
  10. George Giannakopoulos, Jeff Kubina, John M Conroy, Josef Steinberger, Benoit Favre, Mijail Kabadjov, Udo Kruschwitz, Massimo Poesio, (2015) "MultiLing 2015: Multilingual Summarization of Single and Multi-Documents, On-line Fora, and Call-center Conversations", In Proceedings of SIGDIAL, Prague, pp. 270-274.
  11. Hopcroft, J.E., Karp, R.M., (1973) "An n^{5/2} algorithm for maximum matchings in bipartite graphs", SIAM Journal on Computing, Vol 2(4), pp. 225-231, doi:10.1137/0202019.
  12. Sowa, J.F., (1976) "Conceptual Graphs for a Data Base Interface", IBM Journal of Research and Development, 20(4):336-357.
  13. Sowa, J.F., (1984) "Conceptual Structures: Information Processing in Mind and Machine", Addison-Wesley.
  14. Lin, C.Y. and Hovy, E.H., (2002) "From Single to Multi-document Summarization: A Prototype System and its Evaluation", In Proceedings of ACL-2002.
  15. Lin, C.Y., (2004) "ROUGE: A package for automatic evaluation of summaries", In Proceedings of the Workshop on Text Summarization Branches Out, Barcelona, Spain.
  16. Lin, C.Y. and Hovy, E., (2003) "Automatic evaluation of summaries using n-gram co-occurrence", In Proceedings of 2003 Language Technology Conference (HLT-NAACL 2003), Edmonton, Canada.
  17. Luhn, H.P., (1958) "The automatic creation of literature abstracts", IBM Journal of Research and Development, Vol 2(2), pp. 159-165.
  18. Mani, I., (2001) "Summarization Evaluation: An Overview", In Proceedings of NTCIR.
  19. Malarkodi C.S and Sobha Lalitha Devi, (2012) "A Deeper Look into Features for NE Resolution in Indian Languages", In Proceedings of Workshop on Indian Language Data: Resources and Evaluation, LREC 2012, Istanbul.
  20. McKeown, K. and Radev, D., (1995) "Generating summaries of multiple news articles", In Proceedings of the 18th Annual International ACM, Seattle, WA, pp. 74-82.
  21. Mihalcea, R., (2004) "Graph-based ranking algorithms for sentence extraction, applied to text summarization", In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004), Barcelona, Spain.
  22. Mihalcea, R. and Tarau, P., (2004) "TextRank: Bringing order into texts", In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), Barcelona, Spain.
  23. Mihalcea, R., Tarau, P., and Figa, E., (2004) "PageRank on semantic networks, with application to word sense disambiguation", In Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland.
  24. Lafferty, J., McCallum, A., and Pereira, F., (2001) "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data", In Proceedings of the 18th International Conference on Machine Learning (ICML-2001), pp. 282-289.
  25. Menaka, S., Pattabhi, R.K.R. and Sobha, L.D., (2011) "Automatic Identification of Cause-Effect Relations in Tamil Using CRFs", In A. Gelbukh (ed.), Springer LNCS, Volume 6608/2011, pp. 316-327.
  26. Kudo, T., (2002) "TinySVM Tool Kit", http://chasen.org/~taku/software/TinySVM
  27. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean., (2013) "Efficient Estimation of Word Representations in Vector Space", In Proceedings of Workshop at ICLR.
  28. Miller, G.A., (1995) "WordNet: A Lexical Database", Communications of the ACM, 38(11):39-41.
  29. Mineau, G., Missaoui, R. and Godin, R., (2000) "Conceptual modeling for data and knowledge management", Data & Knowledge Engineering, 33:137-168.
  30. Montes-y-Gomez, M., Lopez-Lopez, A. and Gelbukh, A., (2000) "Information Retrieval with Conceptual Graph Matching", In LNCS, Volume 1873, Springer-Verlag.
  31. Montes-y-Gomez, M., Gelbukh, A., Lopez-Lopez, A., and Baeza-Yates, R., (2001) "Flexible Comparison of Conceptual Graphs", In LNCS, Volume 2113, Springer-Verlag.
  32. Ngai, G., and Florian, R., (2001) "Transformation-Based Learning in the Fast Lane", In Proceedings of the NAACL'2001, Pittsburgh, PA, 40-47.
  33. Pattabhi RK Rao, Sobha Lalitha Devi and Paolo Rosso., (2013) "Automatic Identification of Concepts and Conceptual relations from Patents Using Machine Learning Methods", In Proceedings of 10th International Conference on Natural Language Processing (ICON 2013), Noida, India.
  34. Tanveer J. S., Umashanker, Tiwary., (2006) "A Hybrid Model to Improve Relevance in Document Retrieval", Journal of Digital Information Management, Vol. 4(1). 73-81.
  35. Cohn, T. and Blunsom, P., (2005) "Semantic Role Labeling with Conditional Random Fields", In the Proceedings of CoNLL.
  36. Shih-Yao Y. and Von-Wun S., (2012) "Extract Conceptual Graphs from Plain Texts in Patent Claims", Engineering Applications of Artificial Intelligence, 25(4):874-887.
  37. Srivastava, N., Salakhutdinov, R. R. and Hinton, G. E., (2013) "Modeling Documents with a Deep Boltzmann Machine", In Proceedings of Conference on Uncertainty in Artificial Intelligence (UAI).