
Quality of Word Embeddings on Sentiment Analysis Tasks

Springer, 2017. NLDB 2017: 22nd International Conference on Applications of Natural Language to Information Systems, Liège, Belgium

https://doi.org/10.1007/978-3-319-59569-6_42

Abstract

Word embeddings, or distributed representations of words, are used in applications such as machine translation, sentiment analysis, and topic identification. The quality of word embeddings, and the performance of the applications that use them, depends on several factors, including the training method and the size and relevance of the training corpus. In this study, we compare the performance of a dozen pretrained word embedding models on lyrics sentiment analysis and movie review polarity tasks. According to our results, the model trained on Twitter tweets performs best on lyrics sentiment analysis, whereas the Google News and Common Crawl models are the top performers on movie review polarity analysis. Models trained with GloVe slightly outperform those trained with Skip-gram. Factors such as topic relevance and corpus size also significantly affect the quality of the models. When medium- or large-sized text collections are available, obtaining word embeddings from the same training dataset is usually the best choice.
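The abstract does not say how each pretrained model is turned into a sentiment classifier. One common baseline, assumed here and not necessarily the authors' setup, represents a text as the average of its word vectors and labels it with a nearest-centroid rule. Comparing embedding models then amounts to rerunning the same pipeline with a different vector table. The two-dimensional vectors, the toy examples, and the names `MODEL_A`, `MODEL_B`, `doc_vector`, `train_centroids`, and `accuracy` are hypothetical. Real pretrained models such as GloVe or Skip-gram vectors have hundreds of dimensions and much larger vocabularies.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]
Embeddings = Dict[str, Vector]
Dataset = List[Tuple[str, str]]  # (text, label) pairs


def doc_vector(text: str, emb: Embeddings) -> Optional[Vector]:
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    vecs = [emb[tok] for tok in text.lower().split() if tok in emb]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(data: Dataset, emb: Embeddings) -> Dict[str, Vector]:
    """Nearest-centroid 'training': one mean document vector per label."""
    grouped: Dict[str, List[Vector]] = {}
    for text, label in data:
        vec = doc_vector(text, emb)
        if vec is not None:
            grouped.setdefault(label, []).append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def accuracy(train: Dataset, test: Dataset, emb: Embeddings) -> float:
    """Fraction of test texts whose closest centroid (by cosine) has the true label."""
    centroids = train_centroids(train, emb)
    correct = 0
    for text, label in test:
        vec = doc_vector(text, emb)
        if vec is None:
            continue  # fully out-of-vocabulary text: counted as an error
        pred = max(centroids, key=lambda lab: cosine(vec, centroids[lab]))
        correct += pred == label
    return correct / len(test)


# Hypothetical toy embeddings: dimension 0 ~ positive, dimension 1 ~ negative.
MODEL_A: Embeddings = {
    "great": [0.9, 0.1], "happy": [0.8, 0.2], "lovely": [0.85, 0.15],
    "sad": [0.1, 0.9], "awful": [0.2, 0.8],
    "song": [0.5, 0.5], "tune": [0.5, 0.5],
}
# Same vectors, smaller vocabulary (a stand-in for a model with lower coverage).
MODEL_B: Embeddings = {w: MODEL_A[w] for w in ("happy", "sad", "song", "tune")}

TRAIN: Dataset = [
    ("great happy song", "pos"), ("sad awful song", "neg"),
    ("lovely happy tune", "pos"), ("awful sad tune", "neg"),
]
TEST: Dataset = [
    ("happy lovely", "pos"), ("sad awful", "neg"),
    ("great tune", "pos"), ("awful song", "neg"),
]

if __name__ == "__main__":
    for name, emb in [("MODEL_A", MODEL_A), ("MODEL_B", MODEL_B)]:
        print(name, accuracy(TRAIN, TEST, emb))
    # prints "MODEL_A 1.0" then "MODEL_B 0.75"
```

Here `MODEL_B` is `MODEL_A` restricted to a smaller vocabulary, so "awful song" reduces to the neutral vector for "song" and is misclassified. The toy setup only illustrates how vocabulary coverage can affect downstream accuracy; it does not reproduce the paper's results.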
