Using Data Mining to Teach Applied Statistics and Correlation

Jessica L. Hartnett

doi:10.1177/0098628316636292

Outline

Title

Abstract

Data Mining: Teaching Technique

Data Mining: Teaching Ethics

Study 1 Method Participants

Procedure

Pre-Posttest Data

Study 2 Method Participants

Results and Discussion

5 pages

Abstract

This article describes two class activities that introduce the concept of data mining and very basic data mining analyses. Assessment data suggest that students learned some of the conceptual basics of data mining, understood some of the ethical concerns related to the practice, and were able to perform correlations via the Statistical Package for the Social Sciences (SPSS, Version 20).
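The abstract reports that students ran correlations in SPSS. For readers without SPSS, the following is a minimal Python sketch of the same statistic (Pearson's r). It is not the procedure used in the article: the function name `pearson_r` and the two small data lists are hypothetical, invented purely for illustration.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, not taken from the article
hours_online = [1, 2, 3, 4, 5]
items_bought = [2, 4, 5, 4, 5]

r = pearson_r(hours_online, items_bought)
print(f"r = {r:.3f}")
```

The coefficient ranges from -1 to 1. A value near either extreme indicates a strong linear association, which by itself says nothing about causation.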

Data Science in the Statistics Curricula: Preparing Students to "Think with Data"

Nicholas Horton

2014

A growing number of students are completing undergraduate degrees in statistics and entering the workforce as data analysts. In these positions, they are expected to understand how to utilize databases and other data warehouses, scrape data from Internet sources, program solutions to complex problems in multiple languages, and think algorithmically as well as statistically. These data science topics have not traditionally been a major component of undergraduate programs in statistics. Consequently, a curricular shift is needed to address additional learning outcomes. The goal of this paper is to motivate the importance of data science proficiency and to provide examples and resources for instructors to implement data science in their own statistics curricula. We provide case studies from seven institutions. These varied approaches to teaching data science demonstrate curricular innovations to address new needs. Also included here are examples of assignments designed for courses that foster engagement of undergraduates with data and data science.

Data Science in Statistics Curricula: Preparing Students to “Think with Data”

Nicholas Horton

The American Statistician, 2015

Data mining in education

eren aksoy

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2013

Data mining is the process of discovering "hidden messages," patterns and knowledge within large amounts of data and of making predictions for outcomes or behaviors. This chapter discusses in detail the theoretical and practical aspects of data mining and provides a case study of its application to college transfer data.

Data mining and education

Zachary Pardos

Wiley interdisciplinary reviews. Cognitive science, 2015

An emerging field of educational data mining (EDM) is building on and contributing to a wide variety of disciplines through analysis of data coming from various educational technologies. EDM researchers are addressing questions of cognition, metacognition, motivation, affect, language, social discourse, etc. using data from intelligent tutoring systems, massive open online courses, educational games and simulations, and discussion forums. The data include detailed action and timing logs of student interactions in user interfaces such as graded responses to questions or essays, steps in rich problem solving environments, games or simulations, discussion forum posts, or chat dialogs. They might also include external sensors such as eye tracking, facial expression, body movement, etc. We review how EDM has addressed the research questions that surround the psychology of learning with an emphasis on assessment, transfer of learning and model discovery, the role of affect, motivation and...

Teaching Data Mining in the Era of Big Data

Ashwin Satyanarayana

The amount of data being generated and stored is growing exponentially, owed in part to the continuing advances in computer technology. These data present tremendous opportunities in data mining, a burgeoning field in computer science that focuses on the development of methods that can extract knowledge from data. Recent studies have noted the rise of data mining as a career path with increasing opportunities for graduates. These opportunities are not only available in the private sector; the U.S. government has recently invested $200 million in “big data” research. These suggest the importance for us to teach the tools and techniques that are used in this field. Data mining introduces new challenges for faculty in universities who teach courses in this area. Some of these challenges include: providing access to large real world data for students, selection of tools and languages used to learn data mining tasks, and reducing the vast pool of topics in data mining to those that are c...

A Guide to Teaching Data Science

koby mike

The American Statistician, 2018

Demand for data science education is surging and traditional courses offered by statistics departments are not meeting the needs of those seeking training. This has led to a number of opinion pieces advocating for an update to the Statistics curriculum. The unifying recommendation is that computing should play a more prominent role. We strongly agree with this recommendation, but advocate the main priority is to bring applications to the forefront as proposed by Nolan and Speed (1999). We also argue that the individuals tasked with developing data science courses should not only have statistical training, but also have experience analyzing data with the main objective of solving real-world problems. Here, we share a set of general principles and offer a detailed guide derived from our successful experience developing and teaching a graduate-level, introductory data science course centered entirely on case studies. We argue for the importance of statistical thinking, as defined by Wild and Pfannkuch (1999) and describe how our approach teaches students three key skills needed to succeed in data science, which we refer to as creating, connecting, and computing. This guide can also be used for statisticians wanting to gain more practical knowledge about data science before embarking on teaching an introductory course.

Data Mining Curriculum: A Proposal (Version 1.0) Intensive Working Group of ACM SIGKDD Curriculum Committee

Dennys Prasetya

Data Mining and Statistics: What's the Connection

Anh Vũ Ngọc

Data Mining is used to discover patterns and relationships in data, with an emphasis on large observational data bases. It sits at the common frontiers of several fields including Data Base Management, Artificial Intelligence, Machine Learning, Pattern Recognition, and Data Visualization. From a statistical perspective it can be viewed as computer automated exploratory data analysis of (usually) large complex data sets. In spite of (or perhaps because of) the somewhat exaggerated hype, this field is having a major impact in business, industry, and science. It also affords enormous research opportunities for new methodological developments. Despite the obvious connections between data mining and statistical data analysis, most of the methodologies used in Data Mining have so far originated in fields other than Statistics. This paper explores some of the reasons for this, and why statisticians should have an interest in Data Mining. It is argued that Statistics can potentially have a major influence on Data Mining, but in order to do so some of our basic paradigms and operating principles may have to be modified.

Teaching precursors to data science in introductory and second courses in statistics

Nicholas Horton

2014

Statistics students need to develop the capacity to make sense of the staggering amount of information collected in our increasingly data-centered world. Data science is an important part of modern statistics, but our introductory and second statistics courses often neglect this fact. This paper discusses ways to provide a practical foundation for students to learn to "compute with data" as defined by Nolan and Temple Lang (2010), as well as develop "data habits of mind" (Finzer, 2013). We describe how introductory and second courses can integrate two key precursors to data science: the use of reproducible analysis tools and access to large databases. By introducing students to commonplace tools for data management, visualization, and reproducible analysis in data science and applying these to real-world scenarios, we prepare them to think statistically in the era of big data.

siva ganesh

2002

Teaching of statistics involves developing and adapting robust procedures for understanding statistical concepts, and for the management and analysis of statistical data. The field of statistics is constantly challenged by problems that arise from science, industry and business. Traditionally, the statistics curriculum deals with data often collected to answer specific questions. However, in the modern 'information' age, vast amounts of data are collected, often automatically, with the advent of powerful computers. Data Mining is the process of extracting knowledge from large volumes of data. Since 'computation' plays a major role in this process, computer scientists have a significant claim over the ownership of data mining. Nevertheless, data mining techniques, in general, have a statistical base; and statisticians are beginning to show a significant interest in the area, including offering tertiary courses in 'statistical' data mining.

A sample study on applying data mining research techniques in educational science: Developing a more meaning of data

Ergin Erginer

Procedia - Social and Behavioral Sciences, 2011

The purpose of this research is to present a sample study analyzing data gathered from an educational study using data mining techniques appropriate for processing these data. In order to achieve this aim, a “Computer Self-efficiency Scale” used in educational sciences was selected and this scale was applied in a study group. Data were analyzed using descriptive statistics (t test and analysis of variance), and the data mining techniques of decision tree, dependency networks and clustering. The descriptive statistics used were calculated not using common statistical software packages, but by running a program written in Delphi 2009 programming language on Microsoft SQL Server 2008. Microsoft SQL Server 2008 was directly used for the data mining techniques of dependency networks and clustering. Some of the findings of the research, which cannot be obtained by common statistical techniques but can be obtained by data mining methods, were as follows: “those who think they are competent with computer terms and concepts believe they have a special talent in using computers”; “those who believe they have a special talent in using computers feel as if the computer is part of their body”, and “students who have been using computers for more than six years believe they have a special talent in using computers”.

A teaching experience on a data mining module

Francesco Maiorana

2012 Federated Conference on Computer Science and Information Systems (FedCSIS), 2012

Data mining is recognized as an important field where one has the possibility to become accustomed both to analysis techniques and methods and to a state of mind. By means of data mining it is possible to develop critical skills that are essential in today's information technology. We present our experience in teaching a data mining module, within an Information System course, centered around a few key aspects: a convergence of theoretical Information Systems aspects and computing skills through programming a complete data mining analysis in Matlab; a project centered learning experience; a sharing of resources that are commented on both by the teacher and by peers facilitating the flow of information and the development of critical skills; a guided inquiry process where the students, when needed, are guided through appropriate questions in the right direction; and finally special attention to requiring motivation of each decision and step undertaken. As a case study we pres...

A Data Science Course for Undergraduates: Thinking With Data

Benjamin Baumer

The American Statistician, 2015

Data science is an emerging interdisciplinary field that combines elements of mathematics, statistics, computer science, and knowledge in a particular application domain for the purpose of extracting meaningful information from the increasingly sophisticated array of data available in many settings. These data tend to be non-traditional, in the sense that they are often live, large, complex, and/or messy. A first course in statistics at the undergraduate level typically introduces students to a variety of techniques to analyze small, neat, and clean data sets. However, whether they pursue more formal training in statistics or not, many of these students will end up working with data that are considerably more complex, and will need facility with statistical computing techniques. More importantly, these students require a framework for thinking structurally about data. We describe an undergraduate course in a liberal arts environment that provides students with the tools necessary to apply data science. The course emphasizes modern, practical, and useful skills that cover the full data analysis spectrum, from asking an interesting question to acquiring, managing, manipulating, processing, querying, analyzing, and visualizing data, as well as communicating findings in written, graphical, and oral forms.

Supporting Data Science in the Statistics Curriculum

Shonda Kuiper

Journal of Statistics Education

This article describes a collaborative project across three institutions to develop, implement, and evaluate a series of tutorials and case studies that highlight fundamental tools of data science (such as visualization, data manipulation, and database usage) that instructors at a wide range of institutions can incorporate into existing statistics courses. The resulting materials are flexible enough to serve both introductory and advanced students, and aim to provide students with the skills to experiment with data, find their own patterns, and ask their own questions. In this article, we discuss a tutorial on data visualization and a case study synthesizing data wrangling and visualization skills in detail, and provide references to additional class-tested materials. R and R Markdown are used for all of the activities.

Developments in Educational Data mining Introduction to Data mining

International Research Journal Commerce arts science

Educational Data Mining (EDM) is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to understand the students better. In this paper, we studied the developments in the field of Educational Data Mining.

Guide to Teaching Data Science

koby mike

Springer eBooks, 2023

Demand for data science education is surging and traditional courses offered by statistics departments are not meeting the needs of those seeking training. This has led to a number of opinion pieces advocating for an update to the Statistics curriculum. The unifying recommendation is that computing should play a more prominent role. We strongly agree with this recommendation, but advocate the main priority is to bring applications to the forefront as proposed by Nolan and Speed (1999). We also argue that the individuals tasked with developing data science courses should not only have statistical training, but also have experience analyzing data with the main objective of solving real-world problems. Here, we share a set of general principles and offer a detailed guide derived from our successful experience developing and teaching a graduate-level, introductory data science course centered entirely on case studies. We argue for the importance of statistical thinking, as defined by Wild and Pfannkuch (1999) and describe how our approach teaches students three key skills needed to succeed in data science, which we refer to as creating, connecting, and computing. This guide can also be used for

A case study report on integrating statistics, problem-based learning, and computerized data analysis

Thomas Hewett

Behavior Research Methods, Instruments, & Computers, 1999

This paper addresses the pedagogical advantages of teaching statistics not as a stand-alone subject in itself, but rather as a topic integrated into teaching hands-on, problem-based computer-assisted data analysis. For over 10 years, such a two-term course has been taught at Drexel University in lieu of the usual statistics courses formerly taken by undergraduate majors in psychology and sociology. One virtue of the courses as currently implemented is that students seem to learn not just how to perform statistical procedures but how to apply them on their own. How should statistics be taught? On the basis of a comprehensive review of educational literature on the teaching of statistics, Garfield has suggested that the standard classroom approach, based on textbook and lectures, is not the most effective. On the basis of Garfield's review, the things that do seem to help statistics students are activity-based courses, the use of small group learning experiences, substantial frequent feedback, and the use of software that allows students to interact with real data. Garfield ends her review with a call for statistics teachers to somehow implement these principles.

Cited by

Use of Peer Mentoring, Interdisciplinary Collaboration, and Archival Datasets for Engaging Undergraduates in Publishable Research

Jonathan J Hammersley

Frontiers in Psychology

We agree wholeheartedly with Dr. Sharon Brehm, the 2007 President of the American Psychological Association, who stated: "I believe that undergraduate research is one of the three most valuable experiences that colleges and universities can offer their undergraduate students (Keynote Address, 24th Annual Mid-America Undergraduate Psychology Research Conference)." We would add that engaging in undergraduate research can be enjoyable and rewarding, for students as well as their faculty mentors. There is nothing quite like observing students becoming interested and engaged in research, planning and carrying their own projects, getting excited to analyze their data, and then experiencing the pride of presenting or publishing their project. This is perhaps one of the best aspects of being a psychology faculty member.
