Using Data Mining to Teach Applied Statistics and Correlation
https://doi.org/10.1177/0098628316636292…
5 pages
1 file
Sign up for access to the world's latest research
Abstract
This article describes two class activities that introduce the concept of data mining and very basic data mining analyses. Assessment data suggest that students learned some of the conceptual basics of data mining, understood some of the ethical concerns related to the practice, and were able to perform correlations via the Statistical Package for the Social Sciences (SPSS, Version 20).
Related papers
2014
A growing number of students are completing undergraduate degrees in statistics and entering the workforce as data analysts. In these positions, they are expected to understand how to utilize databases and other data warehouses, scrape data from Internet sources, program solutions to complex problems in multiple languages, and think algorithmically as well as statistically. These data science topics have not traditionally been a major component of undergraduate programs in statistics. Consequently, a curricular shift is needed to address additional learning outcomes. The goal of this paper is to motivate the importance of data science proficiency and to provide examples and resources for instructors to implement data science in their own statistics curricula. We provide case studies from seven institutions. These varied approaches to teaching data science demonstrate curricular innovations to address new needs. Also included here are examples of assignments designed for courses that foster engagement of undergraduates with data and data science.
The American Statistician, 2015
A growing number of students are completing undergraduate degrees in statistics and entering the workforce as data analysts. In these positions, they are expected to understand how to utilize databases and other data warehouses, scrape data from Internet sources, program solutions to complex problems in multiple languages, and think algorithmically as well as statistically. These data science topics have not traditionally been a major component of undergraduate programs in statistics. Consequently, a curricular shift is needed to address additional learning outcomes. The goal of this paper is to motivate the importance of data science proficiency and to provide examples and resources for instructors to implement data science in their own statistics curricula. We provide case studies from seven institutions. These varied approaches to teaching data science demonstrate curricular innovations to address new needs. Also included here are examples of assignments designed for courses that foster engaging of undergraduates with data and data science.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2013
Data mining is the process of discovering "hidden messages," patterns and knowledge within large amounts of data and of making predictions for outcomes or behaviors. This chapter discusses in detail the theoretical and practical aspects of data mining and provides a case study of its application to college transfer data.
Wiley interdisciplinary reviews. Cognitive science, 2015
An emerging field of educational data mining (EDM) is building on and contributing to a wide variety of disciplines through analysis of data coming from various educational technologies. EDM researchers are addressing questions of cognition, metacognition, motivation, affect, language, social discourse, etc. using data from intelligent tutoring systems, massive open online courses, educational games and simulations, and discussion forums. The data include detailed action and timing logs of student interactions in user interfaces such as graded responses to questions or essays, steps in rich problem solving environments, games or simulations, discussion forum posts, or chat dialogs. They might also include external sensors such as eye tracking, facial expression, body movement, etc. We review how EDM has addressed the research questions that surround the psychology of learning with an emphasis on assessment, transfer of learning and model discovery, the role of affect, motivation and...
The amount of data being generated and stored is growing exponentially, owed in part to the continuing advances in computer technology. These data present tremendous opportunities in data mining, a burgeoning field in computer science that focuses on the development of methods that can extract knowledge from data. Recent studies have noted the rise of data mining as a career path with increasing opportunities for graduates. These opportunities are not only available in the private sector; the U.S. government has recently invested $200 million in “big data” research. These suggest the importance for us to teach the tools and techniques that are used in this field. Data mining introduces new challenges for faculty in universities who teach courses in this area. Some of these challenges include: providing access to large real world data for students, selection of tools and languages used to learn data mining tasks, and reducing the vast pool of topics in data mining to those that are c...
The American Statistician, 2018
Demand for data science education is surging and traditional courses offered by statistics departments are not meeting the needs of those seeking training. This has led to a number of opinion pieces advocating for an update to the Statistics curriculum. The unifying recommendation is computing should play a more prominent role. We strongly agree with this recommendation, but advocate the main priority is to bring applications to the forefront as proposed by Nolan and Speed (1999). We also argue that the individuals tasked with developing data science courses should not only have statistical training, but also have experience analyzing data with the main objective of solving real-world problems. Here, we share a set of general principles and offer a detailed guide derived from our successful experience developing and teaching a graduate-level, introductory data science course centered entirely on case studies. We argue for the importance of statistical thinking , as defined by Wild and Pfannkuck (1999) and describe how our approach teaches students three key skills needed to succeed in data science, which we refer to as creating , connecting , and computing. This guide can also be used for statisticians wanting to gain more practical knowledge about data science before embarking on teaching an introductory course.
Data Mining is used to discover patterns and relationships in data, with an emphasis on large observational data bases. It sits at the common frontiers of several elds including Data Base Management, Artiicial Intelligence , Machine Learning, Pattern Recognition, and Data Visualization. From a statistical perspective it can be viewed as computer automated exploratory data analysis of (usually) large complex data sets. In spite of (or perhaps because of) the somewhat exaggerated hype, this eld is having a major impact in business, industry, and science. It also aaords enormous research opportunities for new methodological developments. Despite the obvious connections between data mining and statistical data analysis, most of the methodologies used in Data Mining have so far originated in elds other than Statistics. This paper explores some of the reasons for this, and why statisticians should have an interest in Data Mining. It is argued that Statistics can potentially have a major innuence on Data Mining, but in order to do so some of our basic paradigms and operating principles mayhave to be modiied.
2014
Statistics students need to develop the capacity to make sense of the staggering amount of information collected in our increasingly data-centered world. Data science is an important part of modern statistics, but our introductory and second statistics courses often neglect this fact. This paper discusses ways to provide a practical foundation for students to learn to "compute with data" as defined by Nolan and Temple Lang (2010), as well as develop "data habits of mind" (Finzer, 2013). We describe how introductory and second courses can integrate two key precursors to data science: the use of reproducible analysis tools and access to large databases. By introducing students to commonplace tools for data management, visualization, and reproducible analysis in data science and applying these to real-world scenarios, we prepare them to think statistically in the era of big data.

Loading Preview
Sorry, preview is currently unavailable. You can download the paper by clicking the button above.