An educator's perspective of the tidyverse
2021, eScholarship (California Digital Library)
https://doi.org/10.48550/ARXIV.2108.03510
Abstract
Computing makes up a large and growing component of data science and statistics courses. Many of those courses, especially when taught by faculty who are statisticians by training, teach R as the programming language. A number of instructors have opted to build much of their teaching around use of the tidyverse. The tidyverse, in the words of its developers, "is a collection of R packages that share a high-level
References (91)
- Bache, S. M. & Wickham, H. (2020), magrittr: A Forward-Pipe Operator for R. R package version 2.0.1. URL: https://CRAN.R-project.org/package=magrittr
- Bashir, S. & Eddelbuettel, D. (2018), 'Getting started in R: Tinyverse edition', tinyverse.org. URL: https://eddelbuettel.github.io/gsir-te/Getting-Started-in-R.pdf
- Baumer, B., Çetinkaya-Rundel, M., Bray, A., Loi, L. & Horton, N. J. (2014), 'R markdown: Integrating a reproducible analysis tool into introductory statistics', Technology Innovations in Statistics Education 8(1). URL: https://escholarship.org/uc/item/90b2f5xh
- Baumer, B. S., Kaplan, D. T. & Horton, N. J. (2021), Modern Data Science with R, 2nd edn, Chapman and Hall/CRC Press: Boca Raton. URL: https://www.routledge.com/Modern-Data-Science-with-R/Baumer-Kaplan-Horton/p/book/9780367191498
- BBC Visual and Data Journalism (2019), 'How the BBC visual and data journalism team works with graphics in R', Medium.com. URL: https://medium.com/bbc-visual-and-data-journalism/how-the-bbc-visual-and-data-journalism-team-works-with-graphics-in-r-ed0b35693535
- Beckman, M. D., Çetinkaya-Rundel, M., Horton, N. J., Rundel, C. W., Sullivan, A. J. & Tackett, M. (2021), 'Implementing version control with git and GitHub as a learning objective in statistics and data science courses', Journal of Statistics and Data Science Education 29(sup1), S132-S144.
- Bion, R., Chang, R. & Goodman, J. (2018), 'How R helps AirBnB make the most of its data', The American Statistician 72(1), 46-52. URL: https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1392362
- Bray, A., Ismay, C., Chasnovski, E., Couch, S., Baumer, B. & Çetinkaya-Rundel, M. (2021), infer: Tidy Statistical Inference. https://github.com/tidymodels/infer, https://infer.tidymodels.org/.
- Burr, W., Chevalier, F., Collins, C., Gibbs, A., Ng, R. & Wild, C. (2021), 'Computational skills by stealth in introductory data science teaching', Teaching Statistics 43, S34-S51. URL: https://onlinelibrary.wiley.com/doi/10.1111/test.12277
- Carver, R., Everson, M., Gabrosek, J., Horton, N. J., Lock, R. H., Mocko, M., Rossman, A., Rowell, G. H., Velleman, P., Witmer, J. A. & Wood, B. (2016), Guidelines for Assessment and Instruction in Statistics Education: College Report 2016, American Statistical Association: Alexandria, VA. URL: https://commons.erau.edu/publication/1083
- Çetinkaya-Rundel, M., Diez, D., Bray, A., Kim, A. Y., Baumer, B., Ismay, C., Paterno, N. & Barr, C. (2021), openintro: Data Sets and Supplemental Functions from OpenIntro Textbooks and Labs. R package version 2.2.0. URL: https://CRAN.R-project.org/package=openintro
- Çetinkaya-Rundel, M. & Ellison, V. (2021), 'A fresh look at introductory data science', Journal of Statistics and Data Science Education pp. 1-11. URL: https://doi.org/10.1080/10691898.2020.1804497
- Çetinkaya-Rundel, M. & Hardin, J. (2021), Introduction to Modern Statistics, OpenIntro: Los Angeles. URL: https://www.openintro.org/book/ims/
- Codd, E. F. (1970), 'A relational model of data for large shared data banks', Communications of the ACM 13(6), 377-387. URL: https://doi.org/10.1145/362384.362685
- Cook, I. (2021), tidyquery: Query 'R' Data Frames with 'SQL'. R package version 0.2.3. URL: https://CRAN.R-project.org/package=tidyquery
- Data@Urban (2019), 'Building an R community at the Urban Institute', Medium.com. URL: https://urban-institute.medium.com/building-an-r-community-at-the-urban-institute-b66739aaaaa7
- Dierker, L. (2021), Passion-Driven Statistics: A Supportive, Multidisciplinary, Project-Based, Introductory Course, Wesleyan University: Middletown, CT. URL: https://passiondrivenstatistics.wescreates.wesleyan.edu/e-book
- Eddelbuettel, D. (2018), '#17: Dependencies', dirk.eddelbuettel.com. URL: http://dirk.eddelbuettel.com/blog/2018/02/28/#017_dependencies
- Fergusson, A. & Pfannkuch, M. (2021), 'Introducing teachers who use GUI-driven tools for the randomization test to code-driven tools', Mathematical Thinking and Learning pp. 1-21. URL: https://www.tandfonline.com/doi/full/10.1080/10986065.2021.1922856
- Flowers, A. (2016), FiveThirtyEight's data journalism workflow with R, in 'useR! 2016'. URL: https://user2016.sched.com/event/7BZZ/fivethirtyeights-data-journalism-workflow-with-r
- Garnier, S. (2021), viridis: Colorblind-Friendly Color Maps for R. R package version 0.6.1. URL: https://CRAN.R-project.org/package=viridis
- Gehrke, M., Kistler, T., Lübke, K., Markgraf, N., Krol, B. & Sauer, S. (2021), 'Statistics education from a data-centric perspective', Teaching Statistics 43, S201-S215. URL: https://onlinelibrary.wiley.com/doi/10.1111/test.12264
- Google (2021), Google's R Style Guide, Google: Mountain View, CA. URL: https://google.github.io/styleguide/Rguide.html
- Guzman, L. M., Pennell, M. W., Nikelski, E. & Srivastava, D. S. (2019), 'Successful integration of data science in undergraduate biostatistics courses using cognitive load theory', CBE-Life Sciences Education 18(4), 1-10. URL: https://doi.org/10.1187/cbe.19-02-0041
- Henry, L. & Wickham, H. (2020), purrr: Functional Programming Tools. R package version 0.3.4. URL: https://CRAN.R-project.org/package=purrr
- Henry, L. & Wickham, H. (2021), lifecycle: Manage the Life Cycle of your Package Functions. R package version 1.0.0. URL: https://CRAN.R-project.org/package=lifecycle
- Hermans, F. & Aldewereld, M. (2017), Programming is writing is programming, in 'Companion to the First International Conference on the Art, Science and Engineering of Programming', Programming '17, Association for Computing Machinery, New York, NY, USA. URL: https://doi.org/10.1145/3079368.3079413
- Hermans, F., Swidan, A. & Aivaloglou, E. (2018), Code phonology: An exploration into the vocalization of code, in '2018 ACM/IEEE 26th International Conference on Program Comprehension'. URL: https://doi.org/10.1145/3196321.3196355
- Horton, N. J., Baumer, B. S. & Wickham, H. (2015), 'Taking a chance in the classroom: Setting the stage for data science: Integration of data management skills in introductory and second courses in statistics', Chance 28(2), 40-50. URL: https://doi.org/10.1080/09332480.2015.1042739
- Horton, N. J. & Hardin, J. S. (2021), 'Integrating computing in the statistics and data science curriculum: Creative structures, novel skills and habits, and ways to teach computational thinking', Journal of Statistics and Data Science Education 29(sup1), S1-S3. URL: https://doi.org/10.1080/10691898.2020.1870416
- Hyndman, R. J. & Athanasopoulos, G. (2021), Forecasting: Principles and Practice, 3rd edn, OTexts: Melbourne, Australia. URL: https://otexts.com/fpp3/
- Ismay, C. & Kim, A. Y. (2019), Statistical Inference via Data Science: A ModernDive into R and the Tidyverse, Chapman and Hall/CRC Press: Boca Raton. URL: https://moderndive.com/
- Kleinman, K. & Horton, N. J. (2009), SAS and R: Data management, statistical analysis, and graphics, Chapman and Hall/CRC: New York. URL: https://doi.org/10.1201/9781420070590
- Kling, R. (1977), 'The organizational context of user-centered software designs', MIS Quarterly 1(4), 41-52. URL: https://doi.org/10.2307/249021
- Kuhn, M. & Wickham, H. (2021), tidymodels: Easily Install and Load the Tidymodels Packages. R package version 0.1.3. URL: https://CRAN.R-project.org/package=tidymodels
- Leek, J. (2016), 'Why I don't use ggplot2', Simply Statistics. URL: https://simplystatistics.org/posts/2016-02-11-why-i-dont-use-ggplot2/
- Lovett, M. C. & Greenhouse, J. B. (2000), 'Applying cognitive theory to statistics instruction', The American Statistician 54(3), 196-206. URL: http://www.tandfonline.com/doi/abs/10.1080/00031305.2000.10474545
- Matloff, N. (2020), 'Tidyverse skeptic: An alternate view of the tidyverse "dialect" of the R language, and its promotion by RStudio', GitHub. URL: https://github.com/matloff/TidyverseSkeptic/blob/master/READMEFull.md
- McNamara, A. (2015), Bridging the Gap Between Tools for Learning and for Doing Statistics, PhD thesis, University of California, Los Angeles. URL: https://www.proquest.com/docview/1694580439
- McNamara, A. (2019), 'Key attributes of a modern statistical computing tool', The American Statistician 73(4), 375-384. URL: https://doi.org/10.1080/00031305.2018.1482784
- McNamara, A. (2020), Speaking R, useR! The International R Users Conference. keynote presentation. URL: https://www.youtube.com/watch?v=ckW9sSdIVAc&t=676s
- McNamara, A. (2021a), 'R syntax cheatsheet'. URL: https://osf.io/2k8fw/
- McNamara, A. (2021b), 'Reading R code for "An educator's perspective of the tidyverse"'. URL: https://osf.io/r8mez/
- McNamara, A. & Horton, N. J. (2018), 'Wrangling categorical data in R', The American Statistician 72(1), 97-104. URL: https://doi.org/10.1080/00031305.2017.1356375
- McNamara, A., Zieffler, A., Beckman, M., Legacy, C., Butler Basner, E., delMas, R. & Rao, V. V. (2021), Computing in the statistics curriculum: Lessons learned from the educational sciences. United States Conference on Teaching Statistics (USCOTS).
- McNicholas, P. D. & Tait, P. (2019), Data Science with Julia, CRC Press: Boca Raton. URL: https://www.routledge.com/Data-Science-with-Julia/McNicholas-Tait/p/book/9781138499980
- Myint, L., Hadavand, A., Jager, L. & Leek, J. (2020), 'Comparison of beginning R students' perceptions of peer-made plots created in two plotting systems: A randomized experiment', Journal of Statistics Education 28(1), 98-108. URL: https://doi.org/10.1080/10691898.2019.1695554
- National Academies of Sciences, Engineering, and Medicine (2018), Data Science for Undergraduates: Opportunities and Options, National Academies Press: Washington, DC. Accessed: 2020-06-07. URL: https://nas.edu/envisioningds
- Nolan, D. & Perrett, J. (2016), 'Teaching and learning data visualization: Ideas and assignments', The American Statistician 70(3), 260-269. URL: https://doi.org/10.1080/00031305.2015.1123651
- Nolan, D. & Temple Lang, D. (2010), 'Computing in the statistics curriculum', The American Statistician 64(2), 97-107. URL: https://doi.org/10.1198/tast.2010.09132
- Nolis, H. & Nolis, J. (2020), We're hitting R a million times a day so we made a talk about it, in 'rstudio::conf 2020'. URL: https://www.rstudio.com/resources/rstudioconf-2020/we-re-hitting-r-a-million-times-a-day-so-we-made-a-talk-about-it/
- Norman, D. A. & Draper, S. W. (1986), User Centered System Design; New Perspectives on Human-Computer Interaction, 1st edn, L. Erlbaum Associates Inc.: Hillsdale, NJ, USA. URL: https://dl.acm.org/doi/10.5555/576915
- Postel, J. (1980), 'DoD standard internet protocol', ACM SIGCOMM Computer Communication Review 10(4), 12-51. URL: https://datatracker.ietf.org/doc/html/rfc760
- Pruim, R., Kaplan, D. T. & Horton, N. J. (2017), 'The mosaic Package: Helping Students to 'Think with Data' Using R', The R Journal 9(1), 77-102. URL: https://doi.org/10.32614/RJ-2017-024
- R Core Team (2021), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. URL: https://www.R-project.org/
- Roback, P. & Legler, J. (2021), Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models In R, Chapman and Hall Texts in Statistical Science, 1st edn, CRC Press: Boca Raton. URL: https://bookdown.org/roback/bookdown-BeyondMLR/
- Robinson, D. (2015), 'Exploring careers data with sqlstackr, dplyr, and ggplot2: Internal Stack Overflow tutorial', RPubs. URL: https://rpubs.com/dgrtwo/190325
- Robinson, D., Hayes, A. & Couch, S. (2021), broom: Convert Statistical Objects into Tidy Tibbles. R package version 0.7.8. URL: https://CRAN.R-project.org/package=broom
- RStudio PBC (2022), 'RStudio customer stories'. URL: https://www.rstudio.com/about/customer-stories/
- RStudio Team (2020), RStudio: Integrated Development Environment for R, RStudio, PBC, Boston, MA. URL: http://www.rstudio.com/
- Sarkar, D. (2021), lattice: Trellis Graphics for R. R package version 0.20-44. URL: http://lattice.r-forge.r-project.org/
- Smith, D. (2016), 'Welcome to the Tidyverse'. URL: https://blog.revolutionanalytics.com/2016/09/tidyverse.html
- Soloway, E., Guzdial, M. & Hay, K. E. (1994), 'Learner-Centered Design: The Challenge for HCI in the 21st Century', Interactions 1(2), 36-48. URL: https://doi.org/10.1145/174809.174813
- Swidan, A. & Hermans, F. (2019), The effect of reading code aloud on comprehension: An empirical study with school students, in 'Proceedings of the ACM Conference on Global Computing Education', CompEd '19, Association for Computing Machinery, New York, NY, USA, pp. 178-184. URL: https://doi.org/10.1145/3300115.3309504
- Thoma, S., Deitrick, E. & Wilkerson, M. (2018), '"It didn't really go very well": Epistemological framing and the complexity of interdisciplinary computing activities', International Society of the Learning Sciences, Inc. URL: https://repository.isls.org/bitstream/1/574/1/249.pdf
- Tucker, M., Shaw, S., Son, J. & Stigler, J. (2021), Integrating R in a college statistics course improves student attitudes toward programming, in 'Annual Meeting of the American Educational Research Association', Orlando, Florida. accepted.
- Van Rossum, G. & Drake Jr, F. L. (1995), Python tutorial, Vol. 620, Centrum voor Wiskunde en Informatica: Amsterdam. URL: https://fossies.org/linux/misc/python-3.9.5-docs-pdf-a4.tar.bz2/docs-pdf/tutorial.pdf
- Voss, S. E. (2019), 'Resource review', Ear and Hearing 40(6), 1481. URL: https://doi.org/10.1097/aud.0000000000000790
- Wang, X., Rush, C. & Horton, N. J. (2017), 'Data visualization on day one: Bringing big ideas into intro stats early and often', Technology Innovations in Statistics Education 10(1). URL: https://escholarship.org/uc/item/84v3774z
- Watson, B. (2019), R at the ACLU: Joining tables to to reunite families, in 'rstudio::conf 2019'. URL: https://www.rstudio.com/resources/rstudioconf-2019/r-at-the-aclu-joining-tables-to-to-reunite-families/
- Wickham, H. (2014), 'Tidy data', Journal of Statistical Software 59(10), 1-23. URL: http://dx.doi.org/10.18637/jss.v059.i10
- Wickham, H. (2015), 'I'm Hadley Wickham, Chief Scientist at RStudio and creator of lots of R packages (incl. ggplot2, dplyr, and devtools). I love R, data analysis/science, visualisation: ask me anything!', reddit.com. URL: https://www.reddit.com/r/dataisbeautiful/comments/3mp9r7/im_hadley_wickham_chief_scientist
- Wickham, H. (2016), Towards a grammar of interactive graphics. useR! The International R Users Conference.
- Wickham, H. (2018), reshape: Flexibly Reshape Data. R package version 0.8.8. URL: http://had.co.nz/reshape
- Wickham, H. (2019), 'please help me figure out good names for the new pivot verbs in tidyr by filling out this (very short!) survey: https://forms.gle/vvygbw1ewhk69ga17 #rstats', Twitter. URL: https://twitter.com/hadleywickham/status/1109132826631421952
- Wickham, H. (2020a), plyr: Tools for Splitting, Applying and Combining Data. R package version 1.8.6. URL: https://CRAN.R-project.org/package=plyr
- Wickham, H. (2020b), reshape2: Flexibly Reshape Data: A Reboot of the Reshape Package. R package version 1.4.4. URL: https://github.com/hadley/reshape
- Wickham, H. (2021a), forcats: Tools for Working with Categorical Variables (Factors). R package version 0.5.1. URL: https://CRAN.R-project.org/package=forcats
- Wickham, H. (2021b), 'Maintaining the house the tidyverse built'. URL: https://www.rstudio.com/resources/rstudioglobal-2021/maintaining-the-house-the-tidyverse-built/
- Wickham, H. (2021c), tidyr: Tidy Messy Data. R package version 1.1.3. URL: https://CRAN.R-project.org/package=tidyr
- Wickham, H. (2021d), The tidyverse style guide, bookdown. URL: https://style.tidyverse.org
- Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., Takahashi, K., Vaughan, D., Wilke, C., Woo, K. & Yutani, H. (2019), 'Welcome to the tidyverse', Journal of Open Source Software 4(43), 1686. URL: https://doi.org/10.21105/joss.01686
- Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., Woo, K., Yutani, H. & Dunnington, D. (2021), ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. R package version 3.3.5. URL: https://CRAN.R-project.org/package=ggplot2
- Wickham, H., Chang, W., Henry, L., Pedersen, T., Takahashi, K., Wilke, C., Woo, K., Yutani, H. & Dewey, D. (2021), Using ggplot2 in packages. URL: https://ggplot2.tidyverse.org/articles/ggplot2-in-packages.html
- Wickham, H., François, R., Henry, L. & Müller, K. (2021a), dplyr: A Grammar of Data Manipulation. R package version 1.0.7. URL: https://CRAN.R-project.org/package=dplyr
- Wickham, H., François, R., Henry, L. & Müller, K. (2021b), Programming with dplyr. URL: https://dplyr.tidyverse.org/articles/programming.html
- Wickham, H., Girlich, M. & Ruiz, E. (2021), dbplyr: A dplyr Back End for Databases. R package version 2.1.1. URL: https://CRAN.R-project.org/package=dbplyr
- Wickham, H. & Grolemund, G. (2016), R for data science: import, tidy, transform, visualize, and model data, O'Reilly Media, Inc.: Sebastopol, CA. URL: https://r4ds.had.co.nz/
- Wickham, H., Navarro, D. & Pedersen, T. L. (2021), ggplot2: Elegant graphics for data analysis, 3rd edn, Springer: New York. URL: https://ggplot2-book.org/
- Wilkinson, L. (2012), The grammar of graphics, in 'Handbook of Computational Statistics', Springer, pp. 375-414. URL: https://doi.org/10.1007/978-3-642-21551-3_13
- Zieffler, A., Garfield, J., Alt, S., Dupuis, D., Holleque, K. & Chang, B. (2008), 'What does research suggest about the teaching and learning of introductory statistics at the college level? a review of the literature', Journal of Statistics Education 16(2). URL: http://jse.amstat.org/v16n2/zieffler.html