Research Challenges in Financial Data Modeling and Analysis

2017, Big Data

https://doi.org/10.1089/BIG.2016.0074

Abstract

Significant research challenges must be addressed in the cleaning, transformation, integration, modeling, and analytics of Big Data sources for finance. This article surveys the progress made so far in this direction and the obstacles yet to be overcome. These issues are of interest to data-driven financial institutions in both corporate finance and consumer finance, as well as to the legal profession and to regulators. The discussion is also relevant to technology firms that support the growing field of FinTech.
