Papers by Elizabeth Mannshardt

Journal of the American Statistical Association, Oct 9, 2019
People are increasingly concerned with understanding their personal environment, including possible exposure to harmful air pollutants. In order to make informed decisions on their day-to-day activities, they are interested in real-time information on a localized scale. Publicly available, fine-scale, high-quality air pollution measurements acquired using mobile monitors represent a paradigm shift in measurement technologies. A methodological framework utilizing these increasingly fine-scale measurements to provide real-time air pollution maps and short-term air quality forecasts on a fine-resolution spatial scale could prove to be instrumental in increasing public awareness and understanding. The Google Street View study provides a unique source of data with spatial and temporal complexities, with the potential to provide information about commuter exposure and hot spots within city streets with high traffic. We develop a computationally efficient spatiotemporal model for these data and use the model to make short-term forecasts and high-resolution maps of current air pollution levels. We also show via an experiment that mobile networks can provide more nuanced information than an equally sized fixed-location network. This modeling framework has important real-world implications in understanding citizens' personal environments, as data production and real-time availability continue to be driven by the ongoing development and improvement of mobile measurement technologies.
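The abstract does not specify the model's form, but the general idea of predicting pollutant levels at unmonitored locations and times from scattered mobile readings can be illustrated with a minimal space-time kriging sketch. This is not the paper's method; the separable covariance, range parameters, and toy data below are all assumptions for illustration.

```python
# Minimal sketch of space-time kriging for mobile air quality readings.
# NOT the paper's model: a separable exponential covariance and simple
# kriging are assumed purely for illustration.
import numpy as np

def sep_cov(s1, t1, s2, t2, sigma2=1.0, range_s=0.5, range_t=1.0):
    """Separable exponential covariance in space (km) and time (hours)."""
    ds = np.linalg.norm(s1 - s2)
    dt = abs(t1 - t2)
    return sigma2 * np.exp(-ds / range_s) * np.exp(-dt / range_t)

def krige(obs_s, obs_t, obs_y, pred_s, pred_t, nugget=0.1):
    """Simple kriging prediction at (pred_s, pred_t) from observations."""
    n = len(obs_y)
    K = np.array([[sep_cov(obs_s[i], obs_t[i], obs_s[j], obs_t[j])
                   for j in range(n)] for i in range(n)])
    K += nugget * np.eye(n)                      # measurement-error nugget
    k = np.array([sep_cov(pred_s, pred_t, obs_s[i], obs_t[i])
                  for i in range(n)])
    w = np.linalg.solve(K, k)                    # kriging weights
    return w @ obs_y                             # predicted concentration

# Toy usage: three mobile readings, predict nearby half an hour later.
s = np.array([[0.0, 0.0], [0.3, 0.1], [0.1, 0.4]])   # km coordinates
t = np.array([0.0, 0.2, 0.4])                        # hours
y = np.array([22.0, 25.0, 19.0])                     # e.g., NO2 (ppb)
print(krige(s, t, y, np.array([0.2, 0.2]), 0.9))
```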

Significance, Oct 1, 2018
The wildfires that raged this summer in Mendocino County, California are the largest in the state's recent history, according to the California Department of Forestry and Fire Protection (bit.ly/2wdWecs). At the time of writing, flames had scorched 459 000 acres of land, destroying 280 structures and causing a single death. They also gave rise to vast amounts of smoke, lowering the quality of air that Californians had to breathe. Concerned residents in the vicinity of the fires could track the level of air pollutants on an hourly basis using data collected by the US Environmental Protection Agency (EPA). The agency's Air Quality Index (AQI) indicates how clean or polluted ambient outdoor air is, translating data into numbers and colours that help people understand when to take action to protect their health. The accompanying figure shows an example of AQI data, sourced from the EPA's AirNow.gov website. Graphics of this sort offer an immediate visual guide to local conditions. But the simplicity of the presentation belies the complex statistical work going on behind the scenes, with methods including spatial interpolation, data fusion, prediction and forecasting all working to provide the public with up-to-date air quality information. Statistics, modelling, and analytical assessments play a prominent role in the work of the EPA and its mission to protect human health and the environment, as this article will explain.
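The AQI itself is computed by piecewise-linear interpolation between published concentration breakpoints. A minimal sketch follows; the breakpoint values shown are the pre-2024 24-hour PM2.5 table as best recalled, and EPA revises these periodically, so treat the numbers as assumptions and consult airnow.gov for current values.

```python
# Sketch of the EPA's piecewise-linear AQI formula:
#   I = (I_hi - I_lo) / (C_hi - C_lo) * (C - C_lo) + I_lo
# Breakpoints are assumed pre-2024 24-hour PM2.5 values (ug/m3);
# the top two EPA bands are collapsed into one here for brevity.
PM25_BREAKPOINTS = [
    (0.0,    12.0,    0,  50),   # Good (green)
    (12.1,   35.4,   51, 100),   # Moderate (yellow)
    (35.5,   55.4,  101, 150),   # Unhealthy for Sensitive Groups (orange)
    (55.5,  150.4,  151, 200),   # Unhealthy (red)
    (150.5, 250.4,  201, 300),   # Very Unhealthy (purple)
    (250.5, 500.4,  301, 500),   # Hazardous (maroon; simplified)
]

def pm25_aqi(conc: float) -> int:
    """Convert a 24-hour average PM2.5 concentration to an AQI value."""
    c = round(conc, 1)  # PM2.5 is reported to 0.1 ug/m3 before lookup
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            return round((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo)
    raise ValueError("concentration outside AQI breakpoint table")

print(pm25_aqi(42.3))  # a smoke-affected day -> AQI in the orange range
```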

American Journal of Climate Change, 2013
One of the more critical issues in a changing climate is the behavior of extreme weather events, such as severe tornadic storms as seen recently in Moore and El Reno, Oklahoma. It is generally thought that such events would increase under a changing climate. How to evaluate this extreme behavior is a topic currently under much debate and investigation. One approach is to look at the behavior of large-scale indicators of severe weather. The use of the generalized extreme value distribution for annual maxima is explored for a combination product of convective available potential energy and wind shear. Results from this initial study show successful modeling and high quantile prediction using extreme value methods. Predicted large-scale values are consistent across different extreme value modeling frameworks, and a general increase over time in predicted values is indicated. A case study utilizing this methodology considers the large-scale atmospheric indicators for the…
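Fitting a generalized extreme value (GEV) distribution to annual maxima and reading off a high quantile (return level) is standard and can be sketched briefly. The data below are simulated, not the study's CAPE-shear product; note that scipy parameterizes the GEV shape as c = -ξ relative to the usual sign convention.

```python
# Sketch: fit a GEV to annual maxima of a severe-weather indicator
# (simulated stand-in for the CAPE x wind shear product) and estimate
# a high quantile. scipy's genextreme uses shape c = -xi.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
annual_max = stats.genextreme.rvs(c=-0.1, loc=2000, scale=400,
                                  size=50, random_state=rng)

# Maximum likelihood fit of the GEV to the block (annual) maxima.
c_hat, loc_hat, scale_hat = stats.genextreme.fit(annual_max)

# 50-year return level = the 1 - 1/50 quantile of the fitted GEV.
rl_50 = stats.genextreme.ppf(1 - 1 / 50, c_hat, loc=loc_hat, scale=scale_hat)
print(f"shape={-c_hat:.3f}, 50-year return level={rl_50:.0f}")
```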
A Data Quality Scorecard to Assess a Data Source’s Fitness for Use

Baseball scouting reports via a marked point process for pitch types
Statistics and baseball have always gone hand in hand. In baseball's earliest days, newspapers printed a summary of baseball games that showed the number of runs scored by each player and the number of runs each team scored in each inning (Schwarz, 2004). Over the years, what we now know as a box score has evolved to include all of the various outcomes of every pitcher and batter that takes part in the game. Today box scores are just one set of statistical information that is kept in an attempt to summarize a game of baseball. In addition to box scores, fans can now find complete play-by-play data sets as well as data sets containing pitch trajectory data for every Major League Baseball game. This abundance of data is used by teams and fans alike in an attempt to explain what they see on the ball field. In the past, when individual pitch data was collected, it was done by tracking each pitch by eye and marking it by hand in a scout's notebook. Alternatively, all of the pitches could be videotaped and plotted by hand later. This type of data collection led to large amounts of measurement error for the individual pitch data. However, a new age for baseball data collection is beginning thanks in part to the PITCHf/x system (Fast, 2010). This recently developed system uses cameras within every Major League ballpark to track the speed, movement, and location of every pitch thrown and provides detailed data about pitch location. Until recently the use of spatial statistical techniques for baseball scouting reports has been fairly restricted. However, with the PITCHf/x system providing vast amounts of detailed data regarding pitch location, statistical techniques for…
A Survey of Spatial Extremes
DOAJ (DOAJ: Directory of Open Access Journals), Apr 1, 2012

Journal of Climate, Oct 13, 2015
Several climate modeling groups have recently generated ensembles of last-millennium climate simulations under different forcing scenarios. These experiments represent an ideal opportunity to establish the baseline feasibility of using proxy-based reconstructions of late-Holocene climate as out-of-calibration tests of the fidelity of the general circulation models used to project future climate. This paper develops a formal statistical model for assessing the agreement between members of an ensemble of climate simulations and the ensemble of possible climate histories produced from a hierarchical Bayesian climate reconstruction. As the internal variabilities of the simulated and reconstructed climate are decoupled from one another, the comparison is between the two latent, or unobserved, forced responses. Comparisons of the spatial average of a 600-yr high northern latitude temperature reconstruction to suites of last-millennium climate simulations from the GISS-E2 and CSIRO models, respectively, suggest that the proxy-based reconstructions are able to discriminate only between the crudest features of the simulations within each ensemble. Although one of the three volcanic forcing scenarios used in the GISS-E2 ensemble results in superior agreement with the reconstruction, no meaningful distinctions can be made between simulations performed with different estimates of solar forcing or land cover changes. In the case of the CSIRO model, sequentially adding orbital, greenhouse gas, solar, and volcanic forcings to the simulations generally improves overall consensus with the reconstruction, though the distinctions are not individually significant.

Quaternary Science Reviews, Mar 1, 2012
Reconstructing a climate process in both space and time from incomplete instrumental and climate proxy time series is a problem with clear societal relevance that poses both scientific and statistical challenges. These challenges, along with the interdisciplinary nature of the reconstruction problem, point to the need for greater cooperation between the earth science and statistics communities, a sentiment echoed in recent parliamentary reports. As a step in this direction, it is prudent to formalize what is meant by the paleoclimate reconstruction problem using the language and tools of modern statistics. This article considers the challenge of inferring, with uncertainties, a climate process through space and time from overlapping instrumental and climate sensitive proxy time series that are assumed to be well dated, an assumption that is likely only reasonable for certain proxies over at most the last few millennia. Within a unifying, hierarchical space-time modeling framework for this problem, the modeling assumptions made by a number of published methods can be understood as special cases, and the distinction between modeling assumptions and analysis or inference choices becomes more transparent. The key aims of this article are to 1) establish a unifying modeling and notational framework for the paleoclimate reconstruction problem that is transparent to both the climate science and statistics communities; 2) describe how currently favored methods fit within this framework; 3) outline and distinguish between scientific and statistical challenges; 4) indicate how recent advances in the statistical modeling of large space-time data sets, as well as advances in statistical computation, can be brought to bear upon the problem; 5) offer, in broad strokes, some suggestions for model construction and how to perform the required statistical inference; and 6) identify issues that are important to both the climate science and applied statistics communities, and encourage greater collaboration between the two.
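A hierarchical framework of the kind described factors naturally into data, process, and prior stages. The schematic below uses notation chosen here for illustration, not the article's own:

```latex
% Schematic three-stage hierarchy for paleoclimate reconstruction
% (notation illustrative, not the article's own):
\begin{align*}
\text{Data stage:}    &\quad [\,W, Z \mid Y, \theta_{\mathrm{obs}}\,]
  &&\text{instrumental } W \text{ and proxies } Z \text{ given climate } Y,\\
\text{Process stage:} &\quad [\,Y \mid \theta_{\mathrm{clim}}\,]
  &&\text{space--time model for the latent climate process},\\
\text{Prior stage:}   &\quad [\,\theta_{\mathrm{obs}}, \theta_{\mathrm{clim}}\,]
  &&\text{priors on observation and climate parameters},
\end{align*}
% with inference targeting the posterior
\[
  [\,Y, \theta \mid W, Z\,] \;\propto\;
  [\,W, Z \mid Y, \theta\,]\,[\,Y \mid \theta\,]\,[\,\theta\,].
\]
```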
Paleoclimate reconstruction: looking backwards to look forward
Chapman and Hall/CRC eBooks, Jan 15, 2019

Climatic Change, Sep 5, 2012
Many analyses of the paleoclimate record include conclusions about extremes, with a focus on the unprecedented nature of recent climate events. While the use of extreme value theory is becoming common in the analysis of the instrumental climate record, applications of this framework to the spatio-temporal analysis of paleoclimate records remain limited. This article develops a Bayesian hierarchical model to investigate spatially varying trends and dependencies in the parameters characterizing the distribution of extremes of a proxy data set, and applies it to the site-wise decadal maxima and minima of a gridded network of temperature-sensitive tree ring density time series over northern North America. The statistical analysis reveals significant spatial associations in the temporal trends of the location parameters of the generalized extreme value distributions: maxima are increasing as a function of time, with stronger increases in the north and east of North America; minima are significantly increasing in the west, possibly decreasing in the east, and…
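A schematic of the kind of GEV layer described, with a spatially varying trend in the location parameter, might read as follows (notation chosen here, not the article's):

```latex
% Schematic GEV layer with a spatially varying location trend
% (notation illustrative, not the article's own):
\begin{align*}
y_t(s) &\sim \mathrm{GEV}\big(\mu_t(s),\, \sigma(s),\, \xi(s)\big)
  &&\text{decadal maximum at site } s,\ \text{decade } t,\\
\mu_t(s) &= \mu_0(s) + \mu_1(s)\, t
  &&\mu_1(s) \text{ a spatially varying trend},\\
\mu_1(\cdot) &\sim \mathrm{GP}\big(m,\, C(\cdot,\cdot)\big)
  &&\text{Gaussian process prior inducing spatial dependence}.
\end{align*}
```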

Carolina Digital Repository (University of North Carolina at Chapel Hill), 2012
We survey the current practice of analyzing spatial extreme data, which lies at the intersection of extreme value theory and geostatistics. Characterizations of multivariate max-stable distributions typically assume specific univariate marginal distributions, and their statistical applications generally require capturing the tail behavior of the margins and describing the tail dependence among the components. We review current methodology for spatial extremes analysis, discuss the extension of the finite-dimensional extremes framework to spatial processes, review spatial dependence metrics for extremes, survey current modeling practice for the task of modeling marginal distributions, and then examine max-stable process models and copula approaches for modeling residual spatial dependence after accounting for marginal effects.
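For readers new to the area, the standard definition underlying max-stable process models (textbook background, not quoted from the survey) is:

```latex
% Standard background (not quoted from the survey): a process Z with
% unit Frechet margins, P(Z(s) <= z) = exp(-1/z), is max-stable when
% rescaled pointwise maxima of iid copies leave the law unchanged:
\[
  \Pr\Big( \max_{i=1,\dots,n} \tfrac{1}{n}\, Z_i(s) \le z(s),\ s \in S \Big)
    = \Pr\big( Z(s) \le z(s),\ s \in S \big).
\]
% Arbitrary GEV(mu, sigma, xi) margins are standardized to unit Frechet
% before modeling residual dependence, via the transformation
\[
  z^{\ast} = \Big( 1 + \xi\, \tfrac{z - \mu}{\sigma} \Big)_{+}^{1/\xi}.
\]
```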
Bulletin of the American Meteorological Society, Mar 1, 2013
What: Approximately 60 statisticians, mathematicians, and climate scientists from academia and governmental institutions met to discuss the issues surrounding uncertainty quantification in the context of climate observations.
Google Street View Air Quality Data: Oakland CA NO2 2015-2016
Other extreme value analyses: In the article, a block maxima modeling approach is employed. Two other approaches were also considered, contrasted in the sketch below. 1. The running-maxima approach. Instead of calculating one maximum for each block, the running maxima of block length B are calculated over the entire time period. While this approach generates more maximum values to study, it also introduces dependence between the maxima, and therefore necessitates a more involved model. 2. The points-over-threshold approach, also known as the peaks-over-thresholds approach, is based on examining all the observations over a given threshold. Under certain conditions, the distribution of excess values follows the generalized Pareto distribution (GPD).
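The sketch below contrasts block maxima extraction (GEV fit) with the points-over-threshold scheme (GPD fit to excesses); the daily series, block length, and threshold are simulated placeholders, not the article's data.

```python
# Sketch contrasting block maxima with peaks over threshold on a
# simulated daily series (block = one year, threshold = 95th pct).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
daily = rng.gumbel(loc=20.0, scale=5.0, size=365 * 30)  # 30 "years"

# Block maxima: one maximum per year -> fit a GEV.
block_max = daily.reshape(30, 365).max(axis=1)
gev_params = stats.genextreme.fit(block_max)

# Peaks over threshold: all exceedances of a high threshold -> fit a
# GPD to the excesses (floc=0: excesses start at zero by construction).
u = np.quantile(daily, 0.95)
excesses = daily[daily > u] - u
gpd_shape, _, gpd_scale = stats.genpareto.fit(excesses, floc=0)

print(f"GEV fit to {len(block_max)} block maxima: {gev_params}")
print(f"GPD fit to {len(excesses)} excesses over u={u:.1f}: "
      f"shape={gpd_shape:.3f}, scale={gpd_scale:.3f}")
```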

Splicing of Multi-Scale Downscaler Air Quality Surfaces
The United States Environmental Protection Agency (EPA) makes use of a suite of statistical data fusion techniques that combine ambient monitoring data with air quality model results to characterize pollutant concentrations for use in various policy assessments. Data fusion can overcome some of the spatial limitations of monitoring networks and benefit from the spatial and temporal coverage of air quality modeling. The current EPA air pollution prediction model uses a downscaler (DS) model to estimate pollutant concentrations on a national surface. Of interest are ways to improve the performance of the DS in certain areas of the continental US, particularly those with sparse monitor representation. The current methodology utilizes the same spatial range parameter across the continental United States. In order to capitalize on the strengths of spatial modeling capabilities, we consider predictions run on a regional scale. We do this by allowing for a flexible spatial range parameter…
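In published EPA work the downscaler is a spatially varying coefficient regression of monitor observations on gridded model output (after Berrocal, Gelfand, and Holland). A schematic of that form, with notation chosen here rather than taken from this abstract:

```latex
% Schematic downscaler-style fusion regression (after Berrocal,
% Gelfand & Holland; notation illustrative):
\[
  Y(s) \;=\; \beta_0 + \tilde{\beta}_0(s)
        \;+\; \big( \beta_1 + \tilde{\beta}_1(s) \big)\, x(s)
        \;+\; \varepsilon(s),
\]
% Y(s): monitor observation at site s; x(s): air quality model output
% for the grid cell containing s; the tilde terms are mean-zero
% Gaussian processes whose covariances carry the spatial range
% parameter that this work allows to vary by region; epsilon(s) is
% iid Gaussian measurement error.
```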
Spatial and Temporal Statistics for Validating Aquarius Sea Surface Salinity Measurements with Argo in Situ Observations

Atmospheric Environment, 2021
Daily maximum 8-hour average (MDA8) ozone (O3) concentrations are well known to be influenced by local meteorological conditions, which vary across both daily and seasonal temporal scales. Previous studies have adjusted long-term trends in O3 concentrations for meteorological effects using various statistical and mathematical methods in order to get a better estimate of the long-term changes in O3 concentrations due to changes in precursor emissions such as nitrogen oxides (NOX) and volatile organic compounds (VOCs). In this work, the authors present improvements to the current method used by the United States Environmental Protection Agency (US EPA) to adjust O3 trends for meteorological influences by making refinements to the input data sources and by allowing the underlying statistical model to vary locally using a variable selection procedure. The current method is also expanded by using a quantile regression model to adjust trends in the 90th and 98th percentiles of the distribution of MDA8 O3 concentrations, allowing for a better understanding of the effects of local meteorology on peak O3 levels in addition to seasonal average concentrations. The revised method is used to adjust trends in the May to September mean, 90th percentile, and 98th percentile MDA8 O3 concentrations at over 700 monitoring sites in the U.S. for years 2000 to 2016. The utilization of variable selection and quantile regression allows for a more in-depth understanding of how weather conditions affect O3 levels in the U.S. This represents a fundamental advancement in our ability to understand how interannual variability in weather conditions in the U.S. may impact attainment of the O3 National Ambient Air Quality Standards (NAAQS).
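Quantile regression of MDA8 ozone on meteorology at the 90th and 98th percentiles can be sketched briefly. The covariate names, data frame, and input file below are hypothetical placeholders, not the paper's actual model specification.

```python
# Sketch: meteorological adjustment of upper MDA8 ozone quantiles via
# quantile regression (statsmodels). Covariates and the input file are
# hypothetical placeholders, not the paper's model.
import pandas as pd
import statsmodels.formula.api as smf

# df: one row per day at a single monitoring site, with columns
# mda8 (ppb), tmax (deg C), rh (%), wind (m/s), year.
df = pd.read_csv("site_mda8_met.csv")  # hypothetical input file

for q in (0.90, 0.98):
    # Linear trend in the q-th conditional quantile of MDA8 ozone,
    # adjusted for same-day meteorology.
    fit = smf.quantreg("mda8 ~ year + tmax + rh + wind", df).fit(q=q)
    # The 'year' coefficient is the met-adjusted trend (ppb per year)
    # in that quantile of the ozone distribution.
    print(f"q={q}: adjusted trend = {fit.params['year']:.3f} ppb/yr")
```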

Atmosphere, 2020
The United States Environmental Protection Agency (EPA) has implemented a Bayesian spatial data fusion model called the Downscaler (DS) model to generate daily air quality surfaces for PM2.5 across the contiguous U.S. Previous implementations of DS relied on monitoring data from EPA's Air Quality System (AQS) network, which is largely concentrated in urban areas. In this work, we introduce to the DS modeling framework an additional PM2.5 input dataset from the Interagency Monitoring of Protected Visual Environments (IMPROVE) network, located mainly in remote sites. In the western U.S., where IMPROVE sites are relatively dense (compared to the eastern U.S.), the inclusion of IMPROVE PM2.5 data in the DS model runs reduces predicted annual averages and 98th percentile concentrations by as much as 1.0 and 4 μg m⁻³, respectively. Some urban areas in the western U.S., such as Denver, Colorado, had moderate increases in the predicted annual average concentrations, which led to a sharpening…
