Text Classification Techniques in Oil Industry Applications

2014, Advances in Intelligent Systems and Computing

https://doi.org/10.1007/978-3-319-01854-6_22

Abstract

The development of automatic methods for producing usable structured information from unstructured text sources is extremely valuable to the oil and gas industry. A structured resource would allow researchers and industry professionals to write relatively simple queries that retrieve all the information regarding the transcription of any accident. Instead of the thousands of abstracts returned by querying the unstructured corpus, queries on a structured corpus would return a few hundred well-formed results. In this paper we propose and evaluate information extraction techniques for the occupational health control process, in particular for the automatic detection of accidents from unstructured texts. Our proposal divides the problem into subtasks such as text analysis, recognition and classification of failed occupational health controls, and resolving accidents.
