A bibliometric approach to tracking big data research trends
2017, Journal of Big Data
https://doi.org/10.1186/S40537-017-0088-1
Abstract
The explosive growth of data from mobile devices, social media, the Internet of Things and other applications has highlighted the emergence of big data. This paper aims to determine worldwide research trends in the field of big data and its most relevant research areas. A bibliometric approach was used to analyse a total of 6572 papers, including 28 highly cited papers; only papers published in the Web of Science™ Core Collection database from 1980 to 19 March 2015 were selected. The results were restricted to the Web of Science categories relevant to computer science, and the bibliometric information for all the papers was then obtained. Microsoft Excel 2013 was used to analyse the general concentration, dispersion and movement of the pool of data from the papers. The t test and ANOVA were used to test the hypotheses statistically and to characterize the relationships among the variables. A comprehensive analysis of publication trends is provided by document type and language, year of publication, contribution of countries, journals, research areas, Web of Science categories, authors, author keywords and KeyWords Plus. In addition, the novelty of this study is that it provides a formula, derived from multiple regression analysis, for citation analysis based on the number of authors, number of pages and number of references.
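To make the regression step concrete, the following is a minimal sketch of how a multiple linear regression of citation counts on the number of authors, pages and references could be fitted. It is not the formula reported in the paper: the sample rows, the coefficients in `TRUE_COEFS`, and the helper names `fit_ols` and `predict` are all hypothetical and chosen only for illustration. The fit uses ordinary least squares via the normal equations, written in plain Python so that it runs without third-party packages.

```python
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares: returns [intercept, b1, ..., bk]."""
    # Prepend a constant 1.0 to every row so the model has an intercept.
    rows = [[1.0, *map(float, r)] for r in X]
    k = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coefs: Sequence[float], features: Sequence[float]) -> float:
    """Evaluate intercept + sum(b_i * x_i) for one paper."""
    return coefs[0] + sum(c * f for c, f in zip(coefs[1:], features))


# Hypothetical coefficients (intercept, authors, pages, references),
# used only to generate synthetic data. NOT the paper's results.
TRUE_COEFS = [2.0, 1.5, 0.5, 0.1]

# Hypothetical papers: (number of authors, number of pages, number of references).
papers = [
    (1, 8, 20), (2, 10, 35), (3, 12, 30), (4, 9, 50),
    (5, 15, 42), (2, 20, 60), (6, 11, 25),
]
# Synthetic, noise-free citation counts generated from TRUE_COEFS.
citations = [predict(TRUE_COEFS, p) for p in papers]

coefs = fit_ols(papers, citations)
print("fitted coefficients:", [round(c, 4) for c in coefs])
print("predicted citations for (3 authors, 10 pages, 40 refs):",
      round(predict(coefs, (3, 10, 40)), 2))
```

Because the synthetic data contain no noise, the fit should recover the generating coefficients up to rounding error. With real bibliometric records, the coefficients would need to be reported alongside significance tests such as those named in the abstract.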