Sentiment Analysis of Financial News: Mechanics and Statistics
2021, Data Science for Economics and Finance
https://doi.org/10.1007/978-3-030-66891-4_9
Abstract
This chapter describes the basic mechanics of building a forecasting model that takes as input sentiment indicators derived from textual data. Because our prediction targets are financial time series, we also present a set of stylized empirical facts describing the statistical properties of lexicon-based sentiment indicators extracted from financial market news. The modeling methods and statistical hypothesis tests are illustrated on real data. The general goal is to give financial practitioners guidelines for properly constructing and interpreting their own time-dependent numerical information representing public perception of companies, stock prices, and financial markets in general.