Papers by Christopher Soelistyo
Learning the organizational principles of biological systems using AI

arXiv (Cornell University), Feb 5, 2024
How can we find interpretable, domain-appropriate models of natural phenomena given some complex, raw data such as images? Can we use such models to derive scientific insight from the data? In this paper, we propose some methods for achieving this. In particular, we implement disentangled representation learning, sparse deep neural network training and symbolic regression, and assess their usefulness in forming interpretable models of complex image data. We demonstrate their relevance to the field of bioimaging using a well-studied test problem of classifying cell states in microscopy data. We find that such methods can produce highly parsimonious models that achieve ∼98% of the accuracy of black-box benchmark models, with a tiny fraction of the complexity. We explore the utility of such interpretable models in producing scientific explanations of the underlying biological phenomenon. All models are wrong but some are useful.

bioRxiv (Cold Spring Harbor Laboratory), Jul 17, 2023
Explainable deep learning holds significant promise in extracting scientific insights from experimental observations. This is especially so in the field of bio-imaging, where the raw data is often voluminous, yet extremely variable and difficult to study. However, one persistent challenge in deep learning assisted scientific discovery is that the workings of artificial neural networks are often difficult to interpret. Here we present a simple technique for investigating the behaviour of trained neural networks: virtual perturbation. By making precise and systematic alterations to input data or internal representations thereof, we are able to discover causal relationships in the outputs of a deep learning model, and by extension, in the underlying phenomenon itself. As an exemplar, we use our recently described deep-learning based cell fate prediction model. We trained the network to predict the fate of less fit cells in an experimental model of mechanical cell competition. By applying virtual perturbation to the trained network, we discover causal relationships between a cell's environment and eventual fate. We compare these with known properties of the biological system under investigation to demonstrate that the model faithfully captures insights previously established by experimental research.
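
The idea of virtual perturbation described in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_model` is a hypothetical fixed logistic function standing in for a trained network, not the authors' cell fate prediction model, and `virtual_perturbation` only alters input features, not internal representations.

```python
import math


def toy_model(features):
    """Hypothetical stand-in for a trained network: a fixed logistic model."""
    weights = [2.0, -1.0, 0.0]  # assumed weights; the third feature is ignored
    score = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def virtual_perturbation(model, features, index, deltas):
    """Shift one input feature by each delta and record the change in output."""
    baseline = model(features)
    effects = []
    for delta in deltas:
        perturbed = list(features)  # copy so the original input is untouched
        perturbed[index] += delta
        effects.append(model(perturbed) - baseline)
    return effects


inputs = [0.5, 0.2, 1.0]
deltas = [-1.0, -0.5, 0.0, 0.5, 1.0]

# Sweep every feature systematically and compare the output shifts.
effects_by_feature = {
    index: virtual_perturbation(toy_model, inputs, index, deltas)
    for index in range(len(inputs))
}
for index, effects in effects_by_feature.items():
    print(index, [round(effect, 3) for effect in effects])
```

In this toy setting, raising the first feature raises the output, raising the second lowers it, and the third has no effect, which is the kind of input-to-output relationship the sweep is meant to expose.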

Frontiers in Bioinformatics
Quantifying cell biology in space and time requires computational methods to detect cells, measure their properties, and assemble these into meaningful trajectories. In this aspect, machine learning (ML) is having a transformational effect on bioimage analysis, now enabling robust cell detection in multidimensional image data. However, the task of cell tracking, or constructing accurate multi-generational lineages from imaging data, remains an open challenge. Most cell tracking algorithms are largely based on our prior knowledge of cell behaviors, and as such, are difficult to generalize to new and unseen cell types or datasets. Here, we propose that ML provides the framework to learn aspects of cell behavior using cell tracking as the task to be learned. We suggest that advances in representation learning, cell tracking datasets, metrics, and methods for constructing and evaluating tracking solutions can all form part of an end-to-end ML-enhanced pipeline. These developments will l...
Time-based comparisons of single-cell trajectories are challenging due to their intrinsic heterogeneity, autonomous decisions, dynamic transitions and unequal lengths. In this paper, we present a self-supervised framework combining an image autoencoder with dynamic time series analysis of latent feature space to represent, compare and annotate cell cycle phases across single-cell trajectories. In our fully data-driven approach, we map similarities between heterogeneous cell tracks and generate statistical representations of single-cell trajectory phase durations, onset and transitions. This work is a first effort to transform a sequence of learned image representations from cell cycle-specific reporters into an unsupervised sequence annotation.

International Journal of Environmental Research and Public Health
Noise annoyance has been often reported as one of the main adverse effects of noise exposure on human health, and there is consensus that it relates to several factors going beyond the mere energy content of the signal. Research has historically focused on a limited set of sound sources (e.g., transport and industrial noise); only more recently is attention being given to more holistic aspects of urban acoustic environments and the role they play in the noise annoyance perceptual construct. This is the main approach promoted in soundscape studies, looking at both wanted and unwanted sounds. In this study, three specific aspects were investigated, namely: (1) the effect of different sound sources combinations, (2) the number of sound sources present in the soundscape, and (3) the presence of individual sound source, on noise annoyance perception. For this purpose, a large-scale online experiment was carried out with 1.2k+ participants, using 2.8k+ audio recordings of complex urban ac...
Convolutional Neural Networks for Classifying Chromatin Morphology in Live-Cell Imaging
Methods in molecular biology, 2022

Explainable deep learning holds significant promise in extracting scientific insights from experimental observations. This is especially so in the field of bio-imaging, where the raw data is often voluminous, yet extremely variable and difficult to study. However, one persistent challenge in deep learning assisted scientific discovery is that the workings of artificial neural networks are often difficult to interpret. Here we present a simple technique for investigating the behaviour of trained neural networks: virtual perturbation. By making precise and systematic alterations to input data or internal representations thereof, we are able to discover causal relationships in the outputs of a deep learning model, and by extension, in the underlying phenomenon itself. As an exemplar, we use our recently described deep-learning based cell fate prediction model. We trained the network to predict the fate of less fit cells in an experimental model of mechanical cell competition. By applyi...

Deep learning techniques for noise annoyance detection: Results from an intensive workshop at the Alan Turing Institute
The Journal of the Acoustical Society of America
Advancements in AI and ML have enabled us to combine automated sound source recognition and deep learning models for predicting subjective soundscape perception. We held a multidisciplinary, cross-institutional Data Study Group (DSG) to investigate how sound source information could be incorporated into deep learning models for predicting urban noise annoyance. We used a large-scale dataset of 2980 15-s recordings paired with 12 210 annoyance ratings (from 1 to 10) and sound source labels. A total of 14 neural networks and 4 conventional ML models were built. The best model was trained to simultaneously predict sound source labels and annoyance rating. It achieved an RMSE = 1.07 for annoyance prediction and AUROC = 0.88 for label classification, while a similarly structured model trained to predict annoyance ratings only (i.e., no sound source information) achieved RMSE = 1.13. Results showed that including sound source labels as a simultaneous training output, rather than as an exp...
Learning biophysical determinants of cell fate with deep neural networks
Nature Machine Intelligence

Deep learning is now a powerful tool in microscopy data analysis, and is routinely used for image processing applications such as segmentation and denoising. However, it has rarely been used to directly learn mechanistic models of a biological system, owing to the complexity of the internal representations. Here, we develop an end-to-end machine learning model capable of learning the rules of a complex biological phenomenon, cell competition, directly from a large corpus of time-lapse microscopy data. Cell competition is a quality control mechanism that eliminates unfit cells from a tissue and during which cell fate is thought to be determined by the local cellular neighborhood over time. To investigate this, we developed a new approach (τ-VAE) by coupling a probabilistic encoder to a temporal convolution network to predict the fate of each cell in an epithelium. Using the τ-VAE’s latent representation of the local tissue organization and the flow of information in the network, we d...