Papers by Benjamin Baumer
Predictive modeling
Chapman and Hall/CRC eBooks, Apr 13, 2021
arXiv (Cornell University), Jan 21, 2020
We present a programmatic approach to incorporating ethics into an undergraduate major in statistical and data sciences. We discuss departmental-level initiatives designed to meet the National Academy of Sciences recommendation for integrating ethics into the curriculum from top-to-bottom as our majors progress from our introductory courses to our senior capstone course, as well as from side-to-side through co-curricular programming. We also provide six examples of data science ethics modules used in five different courses at our liberal arts college, each focusing on a different ethical consideration. The modules are designed to be portable such that they can be flexibly incorporated into existing courses at different levels of instruction with minimal disruption to syllabi. We connect our efforts to a growing body of literature
Graphics
Chapman and Hall/CRC eBooks, Nov 19, 2018
Complement to 'Modern Data Science with R' [R package mdsr version 0.2.3]

The Annals of Applied Statistics, Dec 1, 2018
Statistical applications in sports have long centered on how to best separate signal (e.g., team talent) from random noise. However, most of this work has concentrated on a single sport, and the development of meaningful cross-sport comparisons has been impeded by the difficulty of translating luck from one sport to another. In this manuscript we develop Bayesian state-space models using betting market data that can be uniformly applied across sporting organizations to better understand the role of randomness in game outcomes. These models can be used to extract estimates of team strength, the between-season, within-season and game-to-game variability of team strengths, as well as each team's home advantage. We implement our approach across a decade of play in each of the National Football League (NFL), National Hockey League (NHL), National Basketball Association (NBA) and Major League Baseball (MLB), finding that the NBA demonstrates both the largest dispersion in talent and the largest home advantage, while the NHL and MLB stand out for their relative randomness in game outcomes. We conclude by proposing new metrics for judging competitiveness across sports leagues, both within the regular season and using traditional postseason tournament formats. Although we focus on sports, we discuss a number of other situations in which our generalizable models might be usefully applied.
A grammar for graphics
Chapman and Hall/CRC eBooks, Apr 13, 2021
Database querying using SQL
Chapman and Hall/CRC eBooks, Apr 13, 2021
Data Package for the 2016 United States Federal Elections [R package fec16 version 0.1.3]
Retrieve Data from MacLeish Field Station [R package macleish version 0.3.6]
2. The Growth and Application of Baseball Analytics Today
Evaluation of Batters and Base Runners
Extract-Transform-Load Framework for Medium Data [R package etl version 0.4.0]
Complement to 'Modern Data Science with R' [R package mdsr version 0.2.4]
Using a Database to Compute Park Factors

Developmental Biology, Apr 1, 2020
Research in the life sciences has traditionally relied on the analysis of clear morphological phenotypes, which are often revealed using increasingly powerful microscopy techniques analyzed as maximum intensity projections (MIPs). However, as biology turns towards the analysis of more subtle phenotypes, MIPs and qualitative approaches are failing to adequately describe these phenotypes. To address these limitations and quantitatively analyze the three-dimensional (3D) spatial relationships of biological structures, we developed the computational method and program called ΔSCOPE (Changes in Spatial Cylindrical Coordinate Orientation using PCA Examination). Our approach uses the fluorescent signal distribution within a 3D data set and reorients the fluorescent signal to a relative biological reference structure. This approach enables quantification and statistical analysis of spatial relationships and signal density in 3D multichannel signals that are positioned around a well-defined structure contained in a reference channel. We validated the application of ΔSCOPE by analyzing normal axon and glial cell guidance in the zebrafish forebrain and by quantifying the commissural phenotypes associated with abnormal Slit guidance cue expression in the forebrain. Despite commissural phenotypes that display disruptions to the reference structure, ΔSCOPE was able to detect subtle, previously uncharacterized changes in zebrafish forebrain midline crossing axons and glia. This method has been developed as a user-friendly, open source program. We propose that ΔSCOPE is an innovative approach to advancing the state of image quantification in the field of high resolution microscopy, and that the techniques presented here have broad applications in the life sciences.
eScholarship (California Digital Library), Aug 7, 2021
Computing makes up a large and growing component of data science and statistics courses. Many of those courses, especially when taught by faculty who are statisticians by training, teach R as the programming language. A number of instructors have opted to build much of their teaching around use of the tidyverse. The tidyverse, in the words of its developers, "is a collection of R packages that share a high-level
Iteration
Chapman and Hall/CRC eBooks, Apr 13, 2021
Modern Data Science with R: Second Edition
We feature a series of complex, real-world extended case studies and examples from a broad range of application areas, including politics, transportation, sports, environmental science, public health, social media, and entertainment. These rich data sets require the use of sophisticated data extraction techniques, modern data visualization approaches, and refined computational approaches. Context is king for such questions, and we have structured the book to foster the parallel developments of statistical thinking, data-related skills, and communication. Each chapter focuses on a different extended example with diverse applications, while exercises allow for the development and refinement of the skills learned in that chapter. (From publisher