Papers by Kristofer Bouchard

PLOS computational biology/PLoS computational biology, Apr 26, 2024
The brain produces diverse functions, from perceiving sounds to producing arm reaches, through the collective activity of populations of many neurons. Determining if and how the features of these exogenous variables (e.g., sound frequency, reach angle) are reflected in population neural activity is important for understanding how the brain operates. Often, high-dimensional neural population activity is confined to low-dimensional latent spaces. However, many current methods fail to extract latent spaces that are clearly structured by exogenous variables. This has contributed to a debate about whether brains should be thought of as dynamical systems or representational systems. Here, we developed a new latent-process Bayesian regression framework, the orthogonal stochastic linear mixing model (OSLMM), which introduces an orthogonality constraint amongst time-varying mixture coefficients, and we provide Markov chain Monte Carlo inference procedures. We demonstrate superior performance of OSLMM on latent trajectory recovery in synthetic experiments and show superior computational efficiency and prediction performance on several real-world benchmark data sets. We primarily focus on demonstrating the utility of OSLMM in two neural data sets: μECoG recordings from rat auditory cortex during presentation of pure tones and multi-single-unit recordings from monkey motor cortex during complex arm reaching. We show that OSLMM achieves superior or comparable predictive accuracy of neural data and decoding of external variables (e.g., reach velocity). Most importantly, in both experimental contexts, we demonstrate that OSLMM latent trajectories directly reflect features of the sounds and reaches, demonstrating that neural dynamics are structured by neural representations. Together, these results demonstrate that OSLMM will be useful for the analysis of diverse, large-scale biological time-series datasets.

Research Square (Research Square), Mar 29, 2024
Brain computations emerge from the collective dynamics of distinct neural populations. Behaviors such as reaching and speech are well explained by principles of feedback control. However, whether feedback control explains neural population dynamics is unknown. We created dimensionality reduction methods that identify subspaces of neural population data that are most feed-forward controllable (FFC) vs. feedback controllable (FBC). We showed that FBC and FFC subspaces diverge for dynamics generated by neuroanatomical connectivity. In neural recordings from monkey M1/S1 during reaching, FBC subspaces were better decoders of reach kinematics. Compared to FFC subspaces, FBC subspaces emerged from collective interactions of a population of neurons with distinct activity profiles. Finally, we revealed that FBC subspaces emphasize rotational dynamics due to enhanced system stability, while FFC subspaces emphasize scaling dynamics. These results demonstrate that feedback controllability is a novel, normative theory of neural population dynamics, and they connect distinct neuronal populations to differing regimes of emergent dynamics carrying out distinct computations.
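The FFC notion above builds on the classical controllability Gramian of a linear dynamical system: the state-space directions that inputs can reach most cheaply are the Gramian's leading eigenvectors. The paper's actual FFC/FBC metrics are more involved; the sketch below only illustrates the standard finite-horizon Gramian computation, and the dynamics `A` and input matrix `B` are arbitrary illustrative assumptions.

```python
import numpy as np

def controllability_gramian(A, B, horizon=50):
    # Finite-horizon discrete-time controllability Gramian:
    # W = sum_{k=0}^{T-1} A^k B B^T (A^T)^k
    W = np.zeros((A.shape[0], A.shape[0]))
    Ak = np.eye(A.shape[0])
    for _ in range(horizon):
        W += Ak @ B @ B.T @ Ak.T
        Ak = Ak @ A
    return W

rng = np.random.default_rng(0)
n = 6
A = 0.9 * np.linalg.qr(rng.standard_normal((n, n)))[0]  # stable toy dynamics
B = rng.standard_normal((n, 2))                          # 2 input channels
W = controllability_gramian(A, B)

# The most feed-forward-controllable directions are W's top eigenvectors
evals, evecs = np.linalg.eigh(W)
top_subspace = evecs[:, -2:]
```

Decoding or projecting population activity onto `top_subspace` then isolates the most input-reachable part of the dynamics; the FBC analysis contrasts this with a feedback-based notion of reachability.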

Studies in classification, data analysis, and knowledge organization, Dec 31, 2022
This paper presents an efficient variational inference framework for a family of structured Gaussian process regression network (SGPRN) models. We incorporate auxiliary inducing variables in the latent functions and jointly treat both the distributions of the inducing variables and the hyper-parameters as variational parameters. We then take advantage of the collapsed representation of the model and propose structured variational distributions, which enable the decomposition of a tractable variational lower bound and lead to stochastic optimization. Our inference approach can model data in which outputs do not share a common input set and, with a computational complexity independent of the size of the inputs and outputs, can easily handle datasets with missing values. Finally, we illustrate our approach on both synthetic and real data.

Scientific Reports, Nov 30, 2023
With the TRACK-TBI Investigators. Traumatic brain injury (TBI) affects how the brain functions in the short and long term. Resulting patient outcomes across physical, cognitive, and psychological domains are complex and often difficult to predict. Major challenges to developing personalized treatment for TBI include distilling large quantities of complex data and increasing the precision with which patient outcome predictions (prognoses) can be rendered. We developed and applied interpretable machine learning methods to TBI patient data. We show that complex data describing TBI patients' intake characteristics and outcome phenotypes can be distilled to smaller sets of clinically interpretable latent factors. We demonstrate that 19 clusters of TBI outcomes can be predicted from intake data, a ~6× improvement in precision over clinical standards. Finally, we show that 36% of the outcome variance across patients can be predicted. These results demonstrate the importance of interpretable machine learning applied to deeply characterized patients for data-driven distillation and precision prognosis. The collection of ever larger and more detailed biomedical datasets brings with it the promise of personalized treatments and interventions for a diversity of diseases and disorders [1]. Extraction of clinically interpretable insights from such large, complex datasets is challenging and creates an impediment to better understanding and hence treatment. Current medical frameworks typically group patients with a given condition into a small number of classes, obfuscating the individual nature of their biology and ailments [2].
A critical first step towards personalized treatments is to increase the precision with which we describe patients and their outcomes, and with which we predict those outcomes from the socioeconomic, demographic, biomarker, and medical variables available at initial clinical presentation, which we refer to as "intake" data [2,3]. Here, we addressed this gap by developing and applying interpretable machine learning techniques for data distillation and precision prognoses in the context of traumatic brain injury (TBI). Traumatic brain injury is damage to the brain resulting from any external force or object. According to 2020 estimates, 2.8 million people sustain a TBI annually in the United States (US), of which 64,000 die, 223,000 are hospitalized, and 2.5 million (~90%) are treated and released from an emergency department [4]. TBI is a contributing factor to one-third of all injury-related deaths in the US and has complex relationships with polytrauma [5]. Direct medical costs and indirect costs of TBI, such as lost productivity, cost the world economy ~$400 billion

Journal of Neuroscience Methods, Dec 1, 2015
To dissect the intricate workings of neural circuits, it is essential to gain precise control over subsets of neurons while retaining the ability to monitor larger-scale circuit dynamics. This requires the ability to both evoke and record neural activity simultaneously, with high spatial and temporal resolution. In this paper we present approaches that address this need by combining micro-electrocorticography (μECoG) with optogenetics in ways that avoid photovoltaic artifacts. We demonstrate that variations of this approach are broadly applicable across three commonly studied mammalian species (mouse, rat, and macaque monkey) and that the recorded μECoG signal shows complex spectral and spatio-temporal patterns in response to optical stimulation. While optogenetics provides the ability to excite or inhibit neural subpopulations in a targeted fashion, large-scale recording of the resulting neural activity remains challenging. Recent advances in optical physiology, such as genetically encoded Ca2+ indicators, are promising but currently do not allow simultaneous recordings from extended cortical areas due to limitations in optical imaging hardware. We demonstrate techniques for the large-scale simultaneous interrogation of cortical circuits in three commonly used mammalian species.

NeuroGPU: Accelerating multi-compartment, biophysically detailed neuron simulations on GPUs
Journal of Neuroscience Methods, 2022
BACKGROUND: The membrane potential of individual neurons depends on a large number of interacting biophysical processes operating on spatial-temporal scales spanning several orders of magnitude. The multi-scale nature of these processes dictates that accurate prediction of membrane potentials in specific neurons requires detailed simulations. Unfortunately, constraining parameters within biologically detailed neuron models can be difficult, leading to poor model fits. This obstacle can be partially overcome by numerical optimization or detailed exploration of parameter space. However, these processes, which currently rely on central processing unit (CPU) computation, often incur orders-of-magnitude increases in computing time for marginal improvements in model behavior. As a result, model quality is often compromised to accommodate compute resources. NEW METHOD: Here, we present a simulation environment, NeuroGPU, that takes advantage of the inherently parallel structure of the graphics processing unit (GPU) to accelerate neuronal simulation. RESULTS & COMPARISON WITH EXISTING METHODS: NeuroGPU can simulate most biologically detailed models 10-200 times faster than NEURON running on a single CPU core and 5 times faster than existing GPU simulators (CoreNEURON). NeuroGPU is designed for model parameter tuning and performs best when the GPU is fully utilized by running multiple (>100) instances of the same model with different parameters. When using multiple GPUs, NeuroGPU can reach a speed-up of 800-fold compared to single-core simulations, especially when simulating the same model morphology with different parameters. We demonstrate the power of NeuroGPU through large-scale parameter exploration to reveal the response landscape of a neuron. Finally, we accelerate numerical optimization of biophysically detailed neuron models to achieve highly accurate fitting of models to simulated and experimental data. CONCLUSIONS: Thus, NeuroGPU is the fastest available platform enabling rapid simulation of multi-compartment, biophysically detailed neuron models on commonly used computing systems accessible to many scientists.

arXiv (Cornell University), Mar 23, 2020
Despite the fact that the loss functions of deep neural networks are highly non-convex, gradient-based optimization algorithms converge to approximately the same performance from many random initial points. One thread of work has focused on explaining this phenomenon by characterizing the local curvature near critical points of the loss function, where the gradients are near zero, and by demonstrating that neural network losses enjoy a no-bad-local-minima property and an abundance of saddle points. We report here that the methods used to find these putative critical points suffer from a bad-local-minima problem of their own: they often converge to or pass through regions where the gradient norm has a stationary point. We call these gradient-flat regions, since they arise when the gradient is approximately in the kernel of the Hessian, such that the loss is locally approximately linear, or flat, in the direction of the gradient. We describe how the presence of these regions necessitates care both in interpreting past results that claimed to find critical points of neural network losses and in designing second-order methods for optimizing neural networks.
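As a toy illustration of the definition (an illustrative sketch, not the paper's experiments): for f(x, y) = x + y**2, the point (3, 0) is gradient-flat, because the gradient lies in the kernel of the Hessian and the loss is therefore locally linear along the gradient direction, yet the point is not a critical point.

```python
import numpy as np

def grad(p):
    # Gradient of the toy loss f(x, y) = x + y**2
    x, y = p
    return np.array([1.0, 2.0 * y])

def hess(p):
    # Hessian of f: constant, with a zero eigenvalue along x
    return np.array([[0.0, 0.0], [0.0, 2.0]])

def gradient_flatness(p):
    # rho = ||H g|| / ||g||: near zero when g lies in the kernel of H,
    # i.e. the loss is locally linear ("flat") along the gradient.
    g = grad(p)
    return np.linalg.norm(hess(p) @ g) / np.linalg.norm(g)

p = np.array([3.0, 0.0])
rho = gradient_flatness(p)        # 0: a gradient-flat point
gnorm = np.linalg.norm(grad(p))   # 1: yet not a critical point
```

A second-order critical-point finder that minimizes the squared gradient norm would stall at such a point even though the loss is still strictly decreasing along the gradient.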

Improved inference in coupling, encoding, and decoding models and its consequence for neuroscientific interpretation
Journal of Neuroscience Methods, Jul 1, 2021
BACKGROUND: A central goal of systems neuroscience is to understand the relationships amongst constituent units in neural populations, and their modulation by external factors, using high-dimensional and stochastic neural recordings. Parametric statistical models (e.g., coupling, encoding, and decoding models) play an instrumental role in accomplishing this goal. However, extracting conclusions from a parametric model requires that it is fit using an inference algorithm capable of selecting the correct parameters and properly estimating their values. Traditional approaches to parameter inference have been shown to suffer from failures in both selection and estimation. The recent development of algorithms that ameliorate these deficiencies raises the question of whether past work relying on such inference procedures has produced inaccurate systems neuroscience models, thereby impairing their interpretation. NEW METHOD: We used algorithms based on Union of Intersections (UoI), a statistical inference framework grounded in stability principles and capable of improved selection and estimation. COMPARISON: We fit functional coupling, encoding, and decoding models across a battery of neural datasets using both UoI and baseline inference procedures (e.g., ℓ1-penalized GLMs), and compared the structure of their fitted parameters. RESULTS: Across recording modality, brain region, and task, we found that UoI inferred models with increased sparsity, improved stability, and qualitatively different parameter distributions, while maintaining predictive performance. We obtained highly sparse functional coupling networks with substantially different community structure, more parsimonious encoding models, and decoding models that relied on fewer single units. CONCLUSIONS: Together, these results demonstrate that improved parameter inference, achieved via UoI, reshapes interpretation in diverse neuroscience contexts.

arXiv (Cornell University), Mar 3, 2022
Unsupervised learning plays an important role in many fields, such as artificial intelligence, machine learning, and neuroscience. Compared to static data, methods for extracting low-dimensional structure from dynamic data are lagging. We developed a novel information-theoretic framework, Compressed Predictive Information Coding (CPIC), to extract useful representations from dynamic data. CPIC selectively projects the past (input) into a linear subspace that is predictive about the compressed data projected from the future (output). The key insight of our framework is to learn representations by minimizing the compression complexity and maximizing the predictive information in the latent space. We derive variational bounds on the CPIC loss that induce the latent space to capture information that is maximally predictive. Our variational bounds are tractable because they leverage bounds on mutual information. We find that introducing stochasticity in the encoder robustly contributes to better representations. Furthermore, variational approaches perform better in mutual information estimation than estimates made under a Gaussian assumption. We demonstrate that CPIC is able to recover the latent space of noisy dynamical systems with low signal-to-noise ratios, and extracts features predictive of exogenous variables in neuroscience data.
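For intuition, the "Gaussian assumption" baseline that the abstract contrasts with has a closed form: for jointly Gaussian past and future, predictive information is a log-determinant ratio of covariances, I(past; future) = 1/2 (log det Σ_p + log det Σ_f - log det Σ_joint). Below is a minimal numpy sketch on a synthetic AR(1) series; the function name and the AR(1) example are illustrative assumptions, and this is the Gaussian estimator, not CPIC itself, which uses variational bounds.

```python
import numpy as np

def gaussian_predictive_info(past, future):
    # Mutual information between past and future windows under a
    # joint-Gaussian assumption (in nats).
    joint = np.hstack([past, future])
    d = past.shape[1]
    S = np.cov(joint, rowvar=False)
    _, ld_joint = np.linalg.slogdet(S)
    _, ld_p = np.linalg.slogdet(S[:d, :d].reshape(d, d))
    _, ld_f = np.linalg.slogdet(S[d:, d:].reshape(-1, S.shape[0] - d))
    return 0.5 * (ld_p + ld_f - ld_joint)

# AR(1) latent process: each sample is strongly predictive of the next
rng = np.random.default_rng(1)
T = 20000
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.95 * z[t - 1] + rng.standard_normal()

pi = gaussian_predictive_info(z[:-1, None], z[1:, None])
```

For this AR(1) process the true value is -1/2 log(1 - 0.95**2), roughly 1.16 nats, so the estimate should land near that.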

Sparse, Predictive, and Interpretable Functional Connectomics with UoILasso
Network formation from neural activity is a foundational problem in systems neuroscience. Functional networks, after downstream analysis, can provide key insights into the nature of neurobiological structure and computation. The validity of such insights hinges on accurate selection and estimation of the edges connecting nodes. However, commonly used statistical inference procedures generally fail to identify the correct features, and further introduce consequential bias in the estimates. To address these issues, we developed Union of Intersections (UoI), a flexible, modular, and scalable framework for enhanced statistical feature selection and estimation. Methods based on UoI perform feature selection and feature estimation through intersection and union operations, respectively. In the context of linear regression (specifically UoILasso), we summarize extensive numerical investigation on synthetic data demonstrating tight control of false positives and false negatives in feature selection, with low-bias and low-variance estimates of the selected parameters, while maintaining high-quality prediction accuracy. We demonstrate, with UoILasso, the extraction of sparse, predictive, and interpretable functional networks from human electrocorticography recordings during speech production and the inference of parsimonious coupling models from non-human primate single-unit recordings during reaching tasks. Our results establish that UoILasso generates interpretable and predictive functional connectivity networks.
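Schematically, UoI's two operations can be sketched in a few lines: selection intersects candidate supports across bootstrap resamples (suppressing false positives), and estimation aggregates low-bias fits restricted to that support (suppressing estimation bias). For brevity this sketch substitutes a simple correlation-threshold selector for the Lasso path that UoILasso actually uses; the data, threshold, and bootstrap counts are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]                  # 3 true features, 7 noise
y = X @ beta + 0.5 * rng.standard_normal(n)

def select_support(Xb, yb, thresh=0.1):
    # Stand-in selector (UoILasso proper sweeps a Lasso regularization path)
    corr = np.abs(Xb.T @ yb) / (np.linalg.norm(Xb, axis=0) * np.linalg.norm(yb))
    return corr > thresh

# Selection: INTERSECTION of supports across bootstrap resamples
support = np.ones(p, dtype=bool)
for _ in range(20):
    idx = rng.integers(0, n, n)
    support &= select_support(X[idx], y[idx])

# Estimation: aggregate unpenalized OLS fits on fresh bootstraps,
# restricted to the intersected support (a union-style combination)
fits = []
for _ in range(20):
    idx = rng.integers(0, n, n)
    fits.append(np.linalg.lstsq(X[idx][:, support], y[idx], rcond=None)[0])
coefs = np.zeros(p)
coefs[support] = np.mean(fits, axis=0)
```

The intersection step is what gives the tight false-positive control described above: a noise feature must be selected in every resample to survive.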

The concept of sparsity has proven useful to understanding elementary neural computations in sensory systems. However, the role of sparsity in motor regions is poorly understood. Here, we investigated the functional properties of sparse structure in neural activity collected with high-density electrocorticography (ECoG) from the ventral sensorimotor cortex (vSMC) in neurosurgical patients. Using independent component analysis (ICA), we found individual components corresponding to individual major oral articulators (i.e., coronal tongue, dorsal tongue, lips), which were selectively activated during utterances that engaged that articulator on single trials. Some of the components corresponded to spatially sparse activations. Components with similar properties were also extracted using convolutional sparse coding (CSC), which required less data pre-processing. Finally, individual utterances could be accurately decoded from vSMC ECoG recordings using linear classifiers trained on the high-dimensional sparse codes generated by CSC. Together, these results suggest that sparse coding may be an important framework and tool for understanding sensory-motor activity generating complex behaviors, and may be useful for brain-machine interfaces.

PLOS ONE, Dec 13, 2019
Studying the biology of sleep requires the accurate assessment of the state of experimental subjects, and manual analysis of relevant data is a major bottleneck. Recently, deep learning applied to electroencephalogram and electromyogram data has shown great promise as a sleep scoring method, approaching the limits of inter-rater reliability. As with any machine learning algorithm, the inputs to a sleep scoring classifier are typically standardized in order to remove distributional shift caused by variability in the signal collection process. However, in scientific data, experimental manipulations introduce variability that should not be removed. For example, in sleep scoring, the fraction of time spent in each arousal state can vary between control and experimental subjects. We introduce a standardization method, mixture z-scoring, that preserves this crucial form of distributional shift. Using both a simulated experiment and mouse in vivo data, we demonstrate that a common standardization method used by state-of-the-art sleep scoring algorithms introduces systematic bias, but that mixture z-scoring does not. We present a free, open-source user interface that uses a compact neural network and mixture z-scoring to allow for rapid sleep scoring with accuracy that compares well to contemporary methods. This work provides a set of computational tools for the robust automation of sleep scoring.
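The idea behind mixture z-scoring can be sketched in a few lines: rather than standardizing each recording by its own mean and variance (which shift when a manipulation changes state occupancy), standardize by class-conditional moments from a labeled reference recording combined with fixed mixture weights. This is a minimal numpy sketch of that idea, not the authors' released implementation; the two-state setup, weights, and names are assumptions.

```python
import numpy as np

def mixture_z_score(x, ref_features, ref_labels, weights):
    # Moments of a reference mixture with FIXED state weights, so a shift
    # in a subject's state occupancy cannot shift the normalization.
    classes = sorted(weights)
    mu = sum(weights[c] * ref_features[ref_labels == c].mean() for c in classes)
    var = sum(weights[c] * ((ref_features[ref_labels == c] - mu) ** 2).mean()
              for c in classes)
    return (x - mu) / np.sqrt(var)

# Reference recording with two arousal "states" at different feature levels
rng = np.random.default_rng(3)
ref_feats = np.concatenate([rng.normal(0, 1, 5000), rng.normal(5, 1, 5000)])
ref_labels = np.concatenate([np.zeros(5000, int), np.ones(5000, int)])
weights = {0: 0.5, 1: 0.5}   # fixed reference occupancy (assumption)

z = mixture_z_score(np.array([2.5]), ref_feats, ref_labels, weights)
```

Because `weights` are fixed, an experimental subject that spends far more time in one state is normalized on exactly the same scale as a control subject, which is the bias-avoiding property described above.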

2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Nov 1, 2021
Measuring electrical potentials in the extracellular space of the brain is a popular technique because it can detect action potentials from putative individual neurons. Electrophysiology is undergoing a transformation in which the number of recording channels, and thus the number of neurons detected, is growing at a dramatic rate. This rapid scaling is paving the way for both new discoveries and commercial applications; however, as the number of channels increases there will be an increasing need to make these systems more power efficient. One area ripe for optimization is the set of signal acquisition specifications needed to detect and sort action potentials (i.e., "spikes") to putative single-neuron sources. In this work, we take existing recordings collected using Intan hardware and modify them in a way that corresponds to reduced recording performance. The spike sorting accuracy of these degraded recordings, using MountainSort4, is evaluated by comparing against expert labels. We show that despite reducing signal specifications by a factor of 2 or more, spike sorting accuracy does not change substantially. Specifically, reducing both the sample rate and the bit depth from 30 kHz and 16 bits to 12 kHz and 12 bits resulted in only a 3% drop in spike sorting accuracy. Our results suggest that current neural acquisition systems are over-specified. These results may inform the design of next-generation neural acquisition systems, enabling higher channel-count systems.
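The degradation described (30 kHz/16-bit down to 12 kHz/12-bit) can be mimicked in a few lines. A hedged sketch, assuming linear-interpolation resampling and uniform requantization; the authors' actual pipeline, including any anti-aliasing filtering, may differ.

```python
import numpy as np

def degrade(signal, fs_in=30000, fs_out=12000, bits_in=16, bits_out=12):
    # Resample by linear interpolation (a real pipeline would apply an
    # anti-aliasing filter first), then requantize to a coarser bit depth.
    t_in = np.arange(len(signal)) / fs_in
    t_out = np.arange(int(len(signal) * fs_out / fs_in)) / fs_out
    resampled = np.interp(t_out, t_in, signal)
    step = 2 ** (bits_in - bits_out)   # 16 -> 12 bits: quantization step 16
    return np.round(resampled / step) * step

rng = np.random.default_rng(4)
x = rng.integers(-2**15, 2**15, 3000).astype(float)  # mock 16-bit samples
y = degrade(x)   # 100 ms at 30 kHz becomes 100 ms at 12 kHz, 12-bit steps
```

Running a spike sorter on `degrade`-processed versus raw recordings is the comparison the abstract describes; the degraded signal carries 2.5x fewer bits per second.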

Scientific Data
A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles was proposed in 2016 as prerequisites for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets at a high level, and over time the FAIR guiding principles have been re-interpreted or extended to include the software, tools, algorithms, and workflows that produce data. FAIR principles are now being adapted in the context of AI models and datasets. Here, we present the perspectives, vision, and experiences of researchers from different countries, disciplines, and backgrounds who are leading the definition and adoption of FAIR principles in their communities of practice, and we discuss outcomes that may result from pursuing and incentivizing FAIR AI research. The material for this report builds on the

eLife, Oct 4, 2022
The neurophysiology of cells and tissues is monitored electrophysiologically and optically in diverse experiments and species, ranging from flies to humans. Understanding the brain requires integration of data across this diversity, and thus these data must be findable, accessible, interoperable, and reusable (FAIR). This requires a standard language for data and metadata that can co-evolve with neuroscience. We describe design and implementation principles for a language for neurophysiology data. Our open-source software (Neurodata Without Borders, NWB) defines and modularizes the interdependent, yet separable, components of a data language. We demonstrate NWB's impact through unified description of neurophysiology data across diverse modalities and species. NWB exists in an ecosystem, which includes data management, analysis, visualization, and archive tools. Thus, the NWB data language enables reproduction, interchange, and reuse of diverse neurophysiology data. More broadly, the design principles of NWB are generally applicable to enhance discovery across biology through data FAIRness.

BigNeuron is an open community bench-testing platform combining the expertise of neuroscientists and computer scientists toward the goal of setting open standards for accurate and fast automatic neuron reconstruction. The project gathered a diverse set of image volumes across several species, representative of the data obtained in most neuroscience laboratories interested in neuron reconstruction. Here we report generated gold-standard manual annotations for a selected subset of the available imaging datasets and quantify reconstruction quality for 35 automatic reconstruction algorithms. Together with image quality features, the data were pooled in an interactive web application that allows users and developers to perform principal component analysis, t-distributed stochastic neighbor embedding, correlation and clustering, visualization of imaging and reconstruction data, and benchmarking of automatic reconstruction algorithms in user-defined data subsets. Our results show ...

arXiv (Cornell University), Oct 14, 2022
Self-driving labs (SDLs) combine fully automated experiments with artificial intelligence (AI) that decides the next set of experiments. Taken to their ultimate expression, SDLs could usher in a new paradigm of scientific research, in which the world is probed, interpreted, and explained by machines for human benefit. While there are functioning SDLs in the fields of chemistry and materials science, we contend that synthetic biology provides a unique opportunity, since the genome provides a single target for affecting the incredibly wide repertoire of biological cell behavior. However, the level of investment required for the creation of biological SDLs is only warranted if directed towards solving difficult and enabling biological questions. Here, we discuss challenges and opportunities in creating SDLs for synthetic biology.

arXiv (Cornell University), Mar 23, 2021
Sparse regression is frequently employed in diverse scientific settings as a feature selection method. A pervasive aspect of scientific data that hampers both feature selection and estimation is the presence of strong correlations between predictive features. These fundamental issues are often not appreciated by practitioners, and they jeopardize conclusions drawn from estimated models. On the other hand, theoretical results on sparsity-inducing regularized regression, such as the Lasso, have largely addressed conditions for selection consistency via asymptotics and have disregarded the problem of model selection, whereby regularization parameters are chosen. In this numerical study, we address these issues through exhaustive characterization of the performance of several regression estimators, coupled with a range of model selection strategies. These estimators and selection criteria were examined across correlated regression problems with varying degrees of signal-to-noise, distributions of the non-zero model coefficients, and model sparsity. Our results reveal a fundamental tradeoff between false-positive and false-negative control in all regression estimators and model selection criteria examined. Additionally, we numerically explore a transition point, modulated by the signal-to-noise ratio and the spectral properties of the design covariance matrix, at which the selection accuracy of all considered algorithms degrades. Overall, we find that SCAD coupled with BIC or empirical Bayes model selection performs best for feature selection across the regression problems considered.
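As a concrete instance of the "estimator plus selection criterion" pairing the study examines, here is a minimal sketch of the Lasso fit by cyclic coordinate descent, with the regularization parameter chosen by BIC. The study's best-performing estimator is SCAD, not the Lasso, and its criteria include empirical Bayes; the data and the lambda grid here are illustrative assumptions.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    # Lasso via cyclic coordinate descent with soft-thresholding:
    # minimizes 0.5 * ||y - X b||^2 + lam * ||b||_1
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]          # partial residual
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

def bic(X, y, b):
    # BIC = n * log(RSS / n) + k * log(n), k = number of selected features
    n = len(y)
    rss = ((y - X @ b) ** 2).sum()
    return n * np.log(rss / n) + np.count_nonzero(b) * np.log(n)

rng = np.random.default_rng(5)
n, p = 100, 8
X = rng.standard_normal((n, p))
beta = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0])       # 2 true features
y = X @ beta + 0.5 * rng.standard_normal(n)

# Model selection: sweep a regularization grid, keep the BIC minimizer
path = [lasso_cd(X, y, lam) for lam in (0.1, 1.0, 10.0, 50.0)]
best = min(path, key=lambda b: bic(X, y, b))
```

On this easy, uncorrelated design, BIC picks a sparse fit recovering the two true features; the study's point is that under strong feature correlations and low signal-to-noise this pairing starts trading false negatives for false positives.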

bioRxiv (Cold Spring Harbor Laboratory), Oct 21, 2019
Studying the biology of sleep requires the accurate assessment of the state of experimental subjects, and manual analysis of relevant data is a major bottleneck. Recently, deep learning applied to electroencephalogram and electromyogram data has shown great promise as a sleep scoring method, approaching the limits of inter-rater reliability. As with any machine learning algorithm, the inputs to a sleep scoring classifier are typically standardized in order to remove distributional shift caused by variability in the signal collection process. However, in scientific data, experimental manipulations introduce variability that should not be removed. For example, in sleep scoring, the fraction of time spent in each arousal state can vary between control and experimental subjects. We introduce a standardization method, mixture z-scoring, that preserves this crucial form of distributional shift. Using both a simulated experiment and mouse in vivo data, we demonstrate that a common standardization method used by state-of-the-art sleep scoring algorithms introduces systematic bias, but that mixture z-scoring does not. We present a free, open-source user interface that uses a compact neural network and mixture z-scoring to allow for rapid sleep scoring with accuracy that compares well to contemporary methods. This work provides a set of computational tools for the robust automation of sleep scoring.

bioRxiv (Cold Spring Harbor Laboratory), Jul 12, 2023
In the brain, all neurons are driven by the activity of other neurons, some of which may be simultaneously recorded, but most of which are not. As such, models of neuronal activity need to account for the simultaneously recorded neurons and the influences of unmeasured neurons. This can be done through the inclusion of model terms for observed external variables (e.g., tuning to stimuli) as well as terms for latent sources of variability. Determining the influence of groups of neurons on each other ...