Papers by Elias T Krainski

arXiv (Cornell University), Feb 14, 2022
This paper describes the methodology used by the team RedSea in the data competition organized for the EVA 2021 conference. We develop a novel two-part model to jointly describe the wildfire count data and burnt area data provided by the competition organizers, together with covariates. Our proposed methodology relies on the integrated nested Laplace approximation combined with the stochastic partial differential equation (INLA-SPDE) approach. In the first part, a binary non-stationary spatio-temporal model is used to describe the underlying process that determines whether or not there is a wildfire at a specific time and location. In the second part, we consider a non-stationary model based on log-Gaussian Cox processes for positive wildfire count data, and a non-stationary log-Gaussian model for positive burnt area data. Dependence between the positive count data and positive burnt area data is captured by a shared spatio-temporal random effect. Our two-part modeling approach performs well in terms of the prediction score criterion chosen by the data competition organizers. Moreover, our model results show that surface pressure is the most influential driver for the occurrence of a wildfire, whilst surface net solar radiation and surface pressure are the key drivers for large numbers of wildfires, and temperature and evaporation are the key drivers of large burnt areas.
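
A minimal Python sketch of the two-part (hurdle) idea for the count data, for illustration only. It assumes a Bernoulli occurrence probability `p` and, conditional on a known intensity `lam`, a zero-truncated Poisson distribution for positive counts; the latent spatio-temporal fields, the burnt-area part, and the INLA-SPDE machinery are all omitted, and the function name `hurdle_loglik` is hypothetical.

```python
import math

def hurdle_loglik(y, p, lam):
    """Log-likelihood of one count under a simple two-part (hurdle) model.

    y:   observed wildfire count (non-negative integer)
    p:   probability that at least one wildfire occurs (binary part)
    lam: Poisson intensity for the positive part (assumed > 0)
    """
    if y == 0:
        # A zero can only come from the binary part: no wildfire occurred.
        return math.log(1.0 - p)
    # Positive count: occurrence, then a Poisson count truncated at zero.
    log_pois = y * math.log(lam) - lam - math.lgamma(y + 1)
    log_p_positive = math.log(1.0 - math.exp(-lam))  # P(Y > 0) under the Poisson
    return math.log(p) + log_pois - log_p_positive
```

Because zeros arise only from the first part, occurrence and positive counts can depend on covariates through separate linear predictors.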

Jornal Brasileiro De Pneumologia, May 30, 2023
Objective: Children are an important demographic group for understanding the epidemiology of tuberculosis in general, and monitoring childhood tuberculosis is essential for adequate prevention. This study sought to characterize the spatial distribution of childhood tuberculosis notification rates in mainland Portugal, to identify high-risk areas, and to evaluate the association between childhood tuberculosis notification rates and socioeconomic deprivation. Methods: Using Bayesian hierarchical spatial models, we analyzed the geographical distribution of pediatric tuberculosis notification rates in 278 municipalities between 2016 and 2020 and identified high- and low-risk areas. We used the Portuguese version of the European Deprivation Index to estimate the association between childhood tuberculosis and socioeconomic deprivation in each area. Results: Notification rates ranged from 1.8 to 13.15 per 100,000 children aged < 5 years. We identified seven high-risk areas, whose relative risk was significantly higher than the study-area average. All seven high-risk areas were located in the metropolitan areas of Porto and Lisbon. There was a significant association between socioeconomic deprivation and pediatric tuberculosis notification rates (relative risk = 1.16; 95% credible interval: 1.05-1.29). Conclusions: Areas identified as high-risk and socioeconomically deprived should be target areas for tuberculosis control, and these data should be integrated with other risk factors to define more precise criteria for BCG vaccination.
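
The crude quantities that such Bayesian models smooth can be illustrated with a short Python sketch: a notification rate per 100,000 and a raw standardized ratio (observed over expected cases, with expected counts taken from the overall study-area rate, i.e. indirect standardization by population only). The function names are hypothetical, and the Bayesian spatial smoothing itself is not shown.

```python
def notification_rate(cases, population, per=100_000):
    """Crude notification rate per `per` people."""
    return cases / population * per

def standardized_ratios(cases, populations):
    """Crude observed/expected ratio for each area.

    Expected counts apply the overall study-area rate to each area's
    population (indirect standardization without age strata).
    """
    overall_rate = sum(cases) / sum(populations)
    return [c / (overall_rate * n) for c, n in zip(cases, populations)]
```

A ratio above 1 marks an area with more notifications than the study-area average would predict; a hierarchical spatial model typically shrinks these noisy ratios, borrowing strength from neighbouring areas, before areas are flagged as high-risk.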

arXiv (Cornell University), Mar 11, 2022
Modeling longitudinal and survival data jointly offers many advantages, such as addressing measurement error and missing data in the longitudinal processes, understanding and quantifying the association between the longitudinal markers and the survival events, and predicting the risk of events based on the longitudinal markers. A joint model involves multiple submodels (one for each longitudinal or survival outcome), usually linked together through correlated or shared random effects. Their estimation is computationally expensive, particularly because the likelihood must be integrated over the multidimensional random-effects distribution, so inference methods rapidly become intractable; this restricts applications of joint models to a small number of longitudinal markers and/or random effects. We introduce a Bayesian approximation based on the Integrated Nested Laplace Approximation algorithm implemented in the R package R-INLA to alleviate the computational burden and allow the estimation of multivariate joint models with fewer restrictions. Our simulation studies show that R-INLA substantially reduces the computation time and the variability of the parameter estimates compared to alternative estimation strategies. We further apply the methodology to analyze 5 longitudinal markers (3 continuous, 1 count, and 1 binary, with 16 random effects) and competing risks of death and transplantation in a clinical trial on primary biliary cholangitis. R-INLA provides a fast and reliable inference technique for applying joint models to the complex multivariate data encountered in health research.

Computational Statistics & Data Analysis, May 1, 2023
Integrated Nested Laplace Approximations (INLA) has been a successful approximate Bayesian inference framework since its proposal by [38]. Its increased computational efficiency and accuracy, compared with sampling-based methods for Bayesian inference such as MCMC, are among the contributors to its success. Ongoing research in the INLA methodology and its implementation in the R package R-INLA ensures continued relevance for practitioners and improved performance and applicability of INLA. The era of big data and some recent research developments present an opportunity to reformulate some aspects of the classic INLA formulation to achieve even faster inference, improved numerical stability, and scalability. The improvement is especially noticeable for data-rich models. We demonstrate the efficiency gains with various examples of data-rich models, such as Cox's proportional hazards model, an item-response theory model, a spatial model including prediction, and a 3-dimensional model for fMRI data.
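
The core idea behind INLA's Laplace approximations can be shown in one dimension. This Python sketch assumes a single Poisson count `y` with log-mean `x` and a Gaussian prior x ~ N(0, s2), approximates the log marginal likelihood by a Gaussian centred at the mode (found with Newton's method), and checks it against brute-force quadrature. It is an illustration only, not R-INLA's algorithm or the reformulation described in the paper, and all names are hypothetical.

```python
import math

def log_joint(x, y, s2):
    """log p(y | x) + log p(x) for y ~ Poisson(exp(x)) and x ~ N(0, s2)."""
    return (y * x - math.exp(x) - math.lgamma(y + 1)
            - 0.5 * math.log(2 * math.pi * s2) - 0.5 * x * x / s2)

def laplace_log_marginal(y, s2, iters=50):
    """Laplace approximation to log p(y), the log of the integral of exp(log_joint) over x."""
    x = 0.0
    for _ in range(iters):
        grad = y - math.exp(x) - x / s2    # first derivative of log_joint
        hess = -math.exp(x) - 1.0 / s2     # second derivative (always negative)
        x -= grad / hess                   # Newton step toward the mode
    hess = -math.exp(x) - 1.0 / s2
    # Gaussian integral around the mode: f(x*) + 0.5*log(2*pi) - 0.5*log(-f''(x*))
    return log_joint(x, y, s2) + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(-hess)

def quadrature_log_marginal(y, s2, lo=-10.0, hi=10.0, n=20000):
    """Brute-force midpoint-rule reference value for the same integral."""
    h = (hi - lo) / n
    total = sum(math.exp(log_joint(lo + (i + 0.5) * h, y, s2)) for i in range(n))
    return math.log(total * h)
```

Even with a non-Gaussian likelihood the two values agree closely in this example, which hints at why Laplace-type approximations can stand in for sampling in latent Gaussian models.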

Environmetrics, Jul 29, 2020
In air pollution studies, dispersion models provide estimates of concentration at grid level covering the entire spatial domain, which are then calibrated against measurements from monitoring stations. However, these different data sources are misaligned in space and time, and if misalignment is not considered, it can bias the predictions. We aim to demonstrate how combining multiple data sources, such as dispersion model outputs, ground observations, and covariates, leads to more accurate predictions of air pollution at grid level. We consider nitrogen dioxide (NO₂) concentration in Greater London and its surroundings for the years 2007-2011 and combine two different dispersion models. Different sets of spatial and temporal effects are included in order to obtain the best predictive capability. Our proposed model sits between calibration and Bayesian melding techniques for data fusion. Unlike other examples, we jointly model the response (concentration level at monitoring stations) and the dispersion model outputs on different scales, accounting for the different sources of uncertainty. Our spatio-temporal model allows us to reconstruct the latent fields of each model component and to predict daily pollution concentrations. We compare the predictive capability of our proposed model with other established methods to account for misalignment (e.g., bilinear interpolation), showing that in our case study the joint model is a better alternative.
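
Bilinear interpolation, the established comparator mentioned above, can be sketched in a few lines of Python. The sketch assumes dispersion-model output on a regular grid with origin `(x0, y0)` and spacings `dx`, `dy`, and a station location inside the grid; the function name is hypothetical. Unlike the joint model, it simply maps grid values to a point and propagates no uncertainty.

```python
import math

def bilinear(grid, x0, y0, dx, dy, x, y):
    """Bilinearly interpolate a regular grid at the point (x, y).

    grid[j][i] holds the value at (x0 + i * dx, y0 + j * dy); the point is
    assumed to lie inside the grid.
    """
    fx = (x - x0) / dx
    fy = (y - y0) / dy
    # Lower-left corner of the enclosing cell, clamped so that points on the
    # right or top edge still use the last full cell.
    i = min(math.floor(fx), len(grid[0]) - 2)
    j = min(math.floor(fy), len(grid) - 2)
    tx, ty = fx - i, fy - j  # fractional position inside the cell
    return (grid[j][i] * (1 - tx) * (1 - ty)
            + grid[j][i + 1] * tx * (1 - ty)
            + grid[j + 1][i] * (1 - tx) * ty
            + grid[j + 1][i + 1] * tx * ty)
```

For a field that is linear in x and y, such as grid values `i + 2 * j`, the interpolation reproduces the field exactly inside each cell.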

Geospatial Health, Nov 7, 2017
Spatial inequalities in old-age survival exist in Portugal and might be associated with factors pertaining to three distinct domains: socioeconomic, physical environmental, and healthcare. We evaluated the contribution of these factors to old-age survival across Portuguese municipalities, deriving a surrogate measure of life expectancy: a 10-year survival rate that expresses the proportion of the population aged 75-84 years who reached 85-94 years. As covariates we used two internationally comparable multivariate indexes: the European deprivation index and the multiple physical environmental deprivation index. A national index was developed to evaluate access to healthcare. Smoothed rates and odds ratios (OR) were estimated using Bayesian spatial models. Socioeconomic deprivation was found to be the most relevant factor influencing old-age survival in Portugal [women: least deprived areas OR=1.132 (1.064-1.207); men OR=1.044 (1.001-1.094)] and explained a sizable amount of the spatial variance in survival, especially among women. Access to healthcare was associated with old-age survival in the univariable model only; results lost significance after adjustment for socioeconomic circumstances [women: higher access to healthcare OR=1.020 (0.973-1.072); men OR=1.021 (0.989-1.060)]. Physical environmental deprivation was unrelated to old-age survival. In conclusion, socioeconomic deprivation was the most important determinant in explaining spatial disparities in old-age survival in Portugal, which indicates that policy makers should direct their efforts to tackling socioeconomic differentials between regions.
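
The surrogate measure is a ratio of two population counts ten years apart. This Python sketch treats the later count as survivors of the earlier cohort, i.e. it ignores migration (an assumption of this illustration, not a statement about the study), and adds an odds-ratio helper to show what an OR on this scale compares; both function names are hypothetical.

```python
def ten_year_survival_rate(pop_75_84_start, pop_85_94_end):
    """Share of the 75-84 population at the start counted as 85-94 ten years later.

    Treats the later count as survivors of the earlier cohort, i.e. it
    ignores migration (an assumption of this sketch).
    """
    if pop_75_84_start <= 0:
        raise ValueError("baseline population must be positive")
    return pop_85_94_end / pop_75_84_start

def odds_ratio(p1, p2):
    """Odds of survival in group 1 relative to group 2."""
    return (p1 / (1 - p1)) / (p2 / (1 - p2))
```

An OR above 1 means higher odds of reaching 85-94 in the first group; in the study, ORs come from Bayesian spatial models with covariates rather than from two crude rates.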

Wiley Interdisciplinary Reviews: Computational Statistics, Jul 5, 2018
Coming up with Bayesian models for spatial data is easy, but performing inference with them can be challenging. Writing fast inference code for a complex spatial model with realistically sized datasets from scratch is time-consuming, and if changes are made to the model, there is little guarantee that the code performs well. The key advantages of R-INLA are the ease with which complex models can be created and modified, without the need to write complex code, and the speed at which inference can be done even for spatial problems with hundreds of thousands of observations. R-INLA handles latent Gaussian models, where fixed effects and structured and unstructured Gaussian random effects are combined linearly in a linear predictor, and the elements of the linear predictor are observed through one or more likelihoods. The structured random effects can be both standard areal models, such as the Besag and BYM models, and geostatistical models from a subset of the Matérn Gaussian random fields. In this review, we discuss the large success of spatial modelling with R-INLA and the types of spatial models that can be fitted, give an overview of recent developments for areal models, and give an overview of the stochastic partial differential equation (SPDE) approach and some of the ways it can be extended beyond the assumptions of isotropy and separability. In particular, we describe how slight changes to the SPDE approach lead to straightforward approaches for non-stationary spatial models and non-separable space-time models.
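
The standard areal models mentioned above can be made concrete. The intrinsic Besag model has precision matrix Q = τ(D - W), where D is diagonal with the number of neighbours of each area and W is the 0/1 adjacency matrix. A minimal pure-Python sketch, assuming a symmetric neighbour list; the function name is hypothetical, and R-INLA, which builds this from a graph internally, is not used here.

```python
def besag_precision(neighbours, tau=1.0):
    """Precision matrix tau * (D - W) of the intrinsic Besag (ICAR) model.

    neighbours: neighbours[i] lists the areas adjacent to area i
                (the adjacency is assumed to be symmetric).
    """
    n = len(neighbours)
    Q = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbours):
        Q[i][i] = tau * len(nbrs)   # D: number of neighbours on the diagonal
        for j in nbrs:
            Q[i][j] = -tau          # -W: one entry per neighbouring pair
    return Q
```

Every row sums to zero, so Q is singular and the model is improper; this is why a sum-to-zero constraint is commonly imposed when fitting it.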

Plant Pathology, Sep 4, 2018
Leprosis is caused by the Citrus leprosis virus cytoplasmic type and is vectored by the mite Brevipalpus yothersi. Miticide applications, which cost $54 million annually, are based on inspection for the presence of mites. The aim of the present study was to characterize the spatial patterns of B. yothersi-infested trees and trees with leprosis symptoms to further improve sampling and disease control. The presence of mites and the occurrence of leprosis were assessed over two years in 1160 Valencia trees and 720 Natal trees in a commercial sweet orange grove in São Paulo State, Brazil. To assess the natural growth and dispersal of mites and leprosis, mite populations were not controlled during the experimental period. Maps of mite-infested trees and trees with leprosis symptoms were analysed at three levels of spatial hierarchy using complementary methods: among adjacent trees within and across rows, within quadrats, and in terms of the strength and orientation of aggregation among quadrats. The spatial patterns of virus-infected and mite-infested trees differed: trees with leprosis symptoms showed a strong aggregation pattern that increased over time, whereas B. yothersi showed randomness or weak aggregation at all three spatial hierarchical levels. Disease incidence increased steadily in plots of both cultivars, unlike the incidence of mite-infested trees, which fluctuated over time. These results have important implications for developing better management strategies for leprosis: sampling methods and action thresholds for mite control should consider primary disease inoculum in addition to the incidence of mites.
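
One common way to test aggregation among adjacent trees within a row is ordinary runs analysis, which counts runs of consecutive trees in the same state. The abstract does not name the exact methods, so this Python sketch is an assumption about the kind of analysis involved, not the authors' code. It uses the standard Wald-Wolfowitz expectation and variance, and the function name is hypothetical.

```python
import math

def ordinary_runs(row):
    """Ordinary runs analysis for one row of trees (1 = affected, 0 = healthy).

    Returns (observed runs, expected runs, z). A strongly negative z means
    fewer runs than expected under randomness, i.e. affected trees cluster.
    """
    N = len(row)
    m = sum(row)
    runs = 1 + sum(1 for a, b in zip(row, row[1:]) if a != b)
    expected = 1 + 2 * m * (N - m) / N
    var = 2 * m * (N - m) * (2 * m * (N - m) - N) / (N * N * (N - 1))
    if var <= 0:
        raise ValueError("row too short or lacks both affected and healthy trees")
    z = (runs - expected) / math.sqrt(var)
    return runs, expected, z
```

Under a normal approximation, z below about -1.64 suggests aggregation at the 5% level (one-sided); the approximation is rough for short rows.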

arXiv (Cornell University), Jun 29, 2023
This paper aims to extend the Besag model, a widely used Bayesian spatial model in disease mapping, to a non-stationary spatial model for irregular lattice-type data. The goal is to improve the model's ability to capture complex spatial dependence patterns and to increase interpretability. The proposed model uses multiple precision parameters, accounting for different intensities of spatial dependence in different sub-regions. We derive a joint penalized complexity prior for the flexible local precision parameters to prevent overfitting and ensure contraction to the stationary model at a user-defined rate. The proposed methodology can serve as a basis for developing other non-stationary effects over other domains, such as time. An accompanying R package, fbesag, equips the reader with the necessary tools for immediate use and application. We illustrate the proposal by modeling the risk of dengue in Brazil, where the stationarity assumption fails and interesting risk profiles are estimated when accounting for spatial non-stationarity.

arXiv (Cornell University), Jun 8, 2020
Gaussian random fields with Matérn covariance functions are popular models in spatial statistics and machine learning. In this work, we develop a spatio-temporal extension of the Gaussian Matérn fields formulated as solutions to a stochastic partial differential equation. The spatially stationary subset of the models has marginal spatial Matérn covariances, and the model also extends to Whittle-Matérn fields on curved manifolds and to more general non-stationary fields. In addition to the parameters of the spatial dependence (variance, smoothness, and practical correlation range), it has parameters controlling the practical correlation range in time, the smoothness in time, and the type of non-separability of the spatio-temporal covariance. Through the separability parameter, the model also allows for separable covariance functions. We provide a sparse representation based on a finite element approximation that is well suited for statistical inference and is implemented in the R-INLA software. The flexibility of the model is illustrated in an application to spatio-temporal modeling of global temperature data.
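
For reference, the spatial Matérn covariance has closed forms when the smoothness ν is a half-integer. This Python sketch uses the κ parameterisation common in the SPDE literature, with practical range ρ = sqrt(8ν)/κ, at which the correlation is roughly 0.14 for these values of ν. It covers only the marginal spatial covariance, not the spatio-temporal model, and the function name is hypothetical.

```python
import math

def matern(h, sigma2, rho, nu):
    """Matérn covariance at distance h for nu in {0.5, 1.5, 2.5}.

    sigma2: marginal variance; rho: practical range, with kappa = sqrt(8 * nu) / rho.
    """
    kappa = math.sqrt(8 * nu) / rho
    r = kappa * h
    if nu == 0.5:
        corr = math.exp(-r)                          # exponential covariance
    elif nu == 1.5:
        corr = (1 + r) * math.exp(-r)
    elif nu == 2.5:
        corr = (1 + r + r * r / 3) * math.exp(-r)
    else:
        raise ValueError("closed form only for nu = 0.5, 1.5, 2.5")
    return sigma2 * corr
```

Other values of ν need the modified Bessel function K_ν (for example `scipy.special.kv`), which this standard-library-only sketch avoids.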

arXiv (Cornell University), Dec 4, 2022
This tutorial shows, in a clear, legible, and comprehensible manner, how various Bayesian survival models can be fitted using the integrated nested Laplace approximation with the INLA and INLAjoint R packages. These models include accelerated failure time, proportional hazards, mixture cure, competing risks, multi-state, frailty, and joint models of longitudinal and survival data, originally presented in the article "Bayesian survival analysis with BUGS". In addition, we illustrate the implementation of a new joint model for a longitudinal semicontinuous marker, recurrent events, and a terminal event. Our aim is to provide the reader with syntax examples for implementing survival models using a fast and accurate approximate Bayesian inferential approach.
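
The survival functions behind two of these model classes can be written down directly. This Python sketch is not INLA or INLAjoint syntax (those are R packages); it shows a Weibull proportional-hazards survival function and a mixture cure model in which a fraction `cure_prob` never experiences the event. The parameterisation and function names are assumptions for illustration.

```python
import math

def weibull_survival(t, shape, scale, lin_pred=0.0):
    """Weibull proportional-hazards survival: exp(-(t / scale)**shape * exp(lin_pred))."""
    return math.exp(-((t / scale) ** shape) * math.exp(lin_pred))

def mixture_cure_survival(t, cure_prob, shape, scale, lin_pred=0.0):
    """Population survival when a fraction cure_prob is never at risk of the event."""
    return cure_prob + (1 - cure_prob) * weibull_survival(t, shape, scale, lin_pred)
```

The cure model's survival curve levels off at `cure_prob` instead of falling to zero, which is what distinguishes it from a standard proportional hazards fit.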

International Journal of Public Health, Feb 26, 2018
Objectives: To analyze the association between socioeconomic deprivation and old-age survival in Europe, and to investigate whether it varies by country and gender. Methods: Our study incorporated five countries (Portugal, Spain, France, Italy, and England). A 10-year survival rate expressing the proportion of the population aged 75-84 years who reached 85-94 years was calculated at area level for 2001-2011. To estimate associations, we used Bayesian spatial models and a transnational measure of deprivation. Attributable/prevention fractions were calculated. Results: Overall, there was a significant association between deprivation and survival in both genders. In England, that association was stronger and followed a dose-response relation. Although smaller in magnitude, significant associations were observed in Spain and Italy, whereas in France and Portugal they were even weaker. Eliminating socioeconomic differences between areas would increase survival by 7.1%, and even a small reduction in socioeconomic differences would lead to a 1.6% increase. Conclusions: Socioeconomic deprivation was associated with survival among older adults at the ecological level, although with varying magnitude across countries. Reasons for such cross-country differences should be sought. Our results emphasize the importance of reducing socioeconomic differences between areas.

International Journal of Tuberculosis and Lung Disease, Jul 1, 2017
To analyse the geographical distribution of tuberculosis (TB) in Portugal and estimate the association between TB and socioeconomic deprivation. METHODS: An ecological study at the municipality level was conducted using TB notifications for 2010-2014. Spatial Bayesian models were used to calculate smoothed standardised notification rates, identify high- and low-risk areas, and estimate the association between TB notification and the European Deprivation Index (EDI) for Portugal and its component variables. RESULTS: Standardised notification rates ranged from 4.41 to 76.44 notifications per 100 000 population. Forty-one high-risk and 156 low-risk municipalities were identified. There was no statistically significant...

arXiv (Cornell University), Feb 18, 2018
Coming up with Bayesian models for spatial data is easy, but performing inference with them can be challenging. Writing fast inference code for a complex spatial model with realistically sized datasets from scratch is time-consuming, and if changes are made to the model, there is little guarantee that the code performs well. The key advantages of R-INLA are the ease with which complex models can be created and modified, without the need to write complex code, and the speed at which inference can be done even for spatial problems with hundreds of thousands of observations. R-INLA handles latent Gaussian models, where fixed effects and structured and unstructured Gaussian random effects are combined linearly in a linear predictor, and the elements of the linear predictor are observed through one or more likelihoods. The structured random effects can be both standard areal models, such as the Besag and BYM models, and geostatistical models from a subset of the Matérn Gaussian random fields. In this review, we discuss the large success of spatial modelling with R-INLA and the types of spatial models that can be fitted, give an overview of recent developments for areal models, and give an overview of the stochastic partial differential equation (SPDE) approach and some of the ways it can be extended beyond the assumptions of isotropy and separability. In particular, we describe how slight changes to the SPDE approach lead to straightforward approaches for non-stationary spatial models and non-separable space-time models.

arXiv (Cornell University), Jun 8, 2020
The Matérn field is the best-known family of covariance functions used for Gaussian processes in spatial models. We build upon the original research of Whittle (1953, 1964) and develop the diffusion-based extension of the Matérn field to space-time (DEMF). We argue that this diffusion-based extension is the natural extension of these processes, owing to its strong physical interpretation. The corresponding non-separable spatio-temporal Gaussian process is a spatio-temporal analogue of the Matérn field, with range parameters in space and time, smoothness parameters in space and time, and a separability parameter. We provide a sparse representation based on finite element methods that is well suited for statistical inference.

Statistical Analysis of Space-time Data: New Models and Applications

arXiv (Cornell University), Mar 27, 2023
Bayesian inference tasks continue to pose a computational challenge, especially in spatial-temporal modeling, where high-dimensional latent parameter spaces are ubiquitous. The methodology of integrated nested Laplace approximations (INLA) provides a framework for performing Bayesian inference applicable to a large subclass of additive Bayesian hierarchical models. In combination with the stochastic partial differential equations (SPDE) approach, it gives rise to an efficient method for spatial-temporal modeling. In this work, we build on the INLA-SPDE approach by putting forward a performant distributed-memory variant, INLA_DIST, for large-scale applications. To perform the arising computational kernel operations, consisting of Cholesky factorizations, solving linear systems, and selected matrix inversions, we present two numerical solver options: a sparse CPU-based library and a novel blocked GPU-accelerated approach which we propose. We leverage the recurring nonzero block structure in the arising precision (inverse covariance) matrices, which allows us to employ dense subroutines within a sparse setting. Both versions of INLA_DIST are highly scalable, capable of performing inference on models with millions of latent parameters. We demonstrate their accuracy and performance on synthetic as well as real-world climate dataset applications.
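
Two of the kernel operations named above, Cholesky factorization and solving linear systems, plus the log-determinant that Gaussian densities require, are sketched below in dense pure Python on a small tridiagonal precision matrix. This is a textbook illustration under that assumption, not the sparse or blocked GPU solvers described above. Selected inversion is not shown, and the function names are hypothetical.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def solve_spd(A, b):
    """Solve A x = b: forward substitution (L y = b), then back substitution (L^T x = y)."""
    L = cholesky(A)
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def log_det_spd(A):
    """log det(A) = 2 * sum(log L_ii) for the Cholesky factor L."""
    L = cholesky(A)
    return 2.0 * sum(math.log(L[i][i]) for i in range(len(A)))
```

For a tridiagonal matrix such as `[[4, 2, 0], [2, 5, 1], [0, 1, 3]]`, the zero in the bottom-left corner stays zero in the factor: no fill-in occurs outside the band, a small instance of the sparsity that sparse and block-structured solvers exploit.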

Trends of Anthropogenic Dissolved Inorganic Carbon in the Northwest Atlantic Ocean Estimated Using a State Space Model
Journal of Geophysical Research: Oceans
The northwest Atlantic Ocean is an important sink for carbon dioxide produced by anthropogenic activities. However, the strong seasonal variability in the surface waters, paired with sparse and summer-biased observations of ocean carbon, makes it difficult to capture a full picture of its temporal variations throughout the water column. We aim to improve the estimation of temporal trends of dissolved inorganic carbon (DIC) due to anthropogenic sources using a new statistical approach: a time series generalization of the extended multiple linear regression (eMLR) method. The anthropogenic increase of northwest Atlantic DIC in the surface waters is hard to quantify because of the strong, natural seasonal variations of DIC. We address this by separating DIC into its seasonal, natural, and anthropogenic components. Ocean carbon data are often collected in the summer, creating a summer bias; however, using monthly averaged data made our results less susceptible to the strong summer bias in the ava...

Extremes
This paper describes the methodology used by the team RedSea in the data competition organized for the EVA 2021 conference. We develop a novel two-part model to jointly describe the wildfire count data and burnt area data provided by the competition organizers, together with covariates. Our proposed methodology relies on the integrated nested Laplace approximation combined with the stochastic partial differential equation (INLA-SPDE) approach. In the first part, a binary non-stationary spatio-temporal model is used to describe the underlying process that determines whether or not there is a wildfire at a specific time and location. In the second part, we consider a non-stationary model based on log-Gaussian Cox processes for positive wildfire count data, and a non-stationary log-Gaussian model for positive burnt area data. Dependence between the positive count data and positive burnt area data is captured by a shared spatio-temporal random effect. Our two-part modeling approach performs well in terms of the prediction score criterion chosen by the data competition organizers. Moreover, our model results show that surface pressure is the most influential driver for the occurrence of a wildfire, whilst surface net solar radiation and surface pressure are the key drivers for large numbers of wildfires, and temperature and evaporation are the key drivers of large burnt areas.

Revista Gaúcha de Enfermagem, 2021
Objective: To verify the association between the qualification of nursing professionals and the occurrence of adverse events in neonatal and pediatric intensive care units. Method: Cross-sectional evaluation study conducted in six intensive care units of five public hospitals in the state of Paraná, Brazil. Data were collected from April 2017 to January 2018 through a questionnaire completed by 143 nursing professionals and a retrospective analysis of 79 medical records using the Neonatal Trigger Tool and Pediatric Trigger Tool instruments. The prognostic factors were professional training and the existence, or not, of a continuing education service; analysis was performed by logistic regression. Results: Thirty adverse events were detected in 22 of the medical records analyzed, predominantly infection (n = 12; 40%) and skin damage (n = 9; 30%). Among the prognostic factors, continuing education was identified as a protective factor against adverse events (p≤0.05). Con...