Abstract
The dataset contains quality and source code metrics for 60 versions across 10 repositories. Metrics are extracted at three levels: (1) class, (2) method, and (3) package. The dataset was created by analyzing 9,420,246 lines of code and 173,237 classes. It consists of one quality_attributes folder and three associated files: repositories.csv, versions.csv, and attribute-details.csv. The first file, repositories.csv, contains general information (repository name, repository URL, number of commits, stars, forks, etc.) that helps characterize each repository's size, popularity, and maintainability. The file versions.csv contains general information (version unique ID, number of classes, packages, external classes, external packages, and version repository link) that gives an overview of the versions and of how each repository grows over time. The file attribute-details.csv contains detailed information (attribute name, attribute short form, category, and description) about extracted ...
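As a minimal sketch of how the three associated files might be read, the Python snippet below loads each CSV into a list of row dictionaries keyed by the file's own header row. The function names `load_table` and `load_dataset` are hypothetical, and the snippet assumes the three files sit together in one directory and are UTF-8 encoded with a header line; the actual column names are not listed here, so the code does not rely on any of them.

```python
import csv
from pathlib import Path

# File stems taken from the abstract; the directory layout is an assumption.
TABLE_NAMES = ("repositories", "versions", "attribute-details")


def load_table(path):
    """Read one CSV file into a list of dicts keyed by its header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_dataset(root):
    """Load the three associated CSV files found directly under `root`.

    Returns a dict mapping each file stem to its list of rows.
    """
    root = Path(root)
    return {name: load_table(root / f"{name}.csv") for name in TABLE_NAMES}
```

Calling `load_dataset` with the path of the downloaded dataset directory would return a dictionary whose `"versions"` entry holds one row per analyzed version, provided the files follow the assumed layout.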