Papers by Haim Sompolinsky
BMC Neuroscience, 2016
Neural circuits are notorious for the complexity of their organization. Part of this complexity is related to the number of different cell types that work together to encode stimuli. I will discuss theoretical results that point to functional advantages of splitting neural populations into subtypes, both in feedforward and recurrent networks. These results outline a framework for categorizing neuronal types based on their functional properties. Such a classification scheme could augment classification schemes based on molecular, anatomical, and electrophysiological properties.
Selectivity and Sparseness in Randomly Connected Balanced Networks
PLoS ONE, 2014

Physical Review E, 2021
Many sensory pathways in the brain rely on sparsely active populations of neurons downstream from the input stimuli. The biological reason for the occurrence of expanded structure in the brain is unclear, but it may be that expansion increases the expressive power of a neural network. In this work, we show that expanding a neural network can improve its generalization performance even in cases in which the expanded structure is pruned after the learning period. To study this setting we use a teacher-student framework in which a perceptron teacher network generates labels that are corrupted with small amounts of noise. We then train a student network that is structurally matched to the teacher and can achieve optimal accuracy if given the teacher's synaptic weights. We find that sparse expansion of the input of a student perceptron network both increases its capacity and improves its generalization performance when learning a noisy rule from a teacher perceptron, even when these expansions are pruned after learning. We find similar behavior when the expanded units are stochastic and uncorrelated with the input, and we analyze this network in the mean-field limit. By solving the mean-field equations we show that the generalization error of the stochastic expanded student network continues to drop as the size of the network increases. The improvement in generalization performance occurs despite the increased complexity of the student network relative to the teacher it is trying to learn. We show that this effect is closely related to the addition of slack variables in artificial neural networks and suggest possible implications for artificial and biological neural networks.
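
To make the setup concrete, here is a minimal sketch of a teacher-student experiment of this kind in plain NumPy. The teacher, the sparse random expansion, the perceptron update, and all sizes and noise levels are illustrative choices rather than the parameters of the paper, and whether the pruned expanded student beats the plain student in any single run depends on those choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, P, P_test = 50, 200, 300, 5000     # input dim, expansion size, train/test set sizes
label_noise = 0.1                        # probability that a teacher label is flipped

w_teacher = rng.standard_normal(N)

def noisy_labels(X):
    y = np.sign(X @ w_teacher)
    flip = rng.random(len(y)) < label_noise
    return np.where(flip, -y, y)

# Sparse random expansion: each of the M extra units reads a few random inputs.
A = (rng.random((N, M)) < 0.05) * rng.standard_normal((N, M))
expand = lambda X: np.sign(X @ A)

X_tr, X_te = rng.standard_normal((P, N)), rng.standard_normal((P_test, N))
y_tr, y_te = noisy_labels(X_tr), np.sign(X_te @ w_teacher)   # clean test labels

def train_perceptron(Z, y, epochs=200):
    w = np.zeros(Z.shape[1])
    for _ in range(epochs):
        for z, t in zip(Z, y):
            if t * (w @ z) <= 0:          # classic perceptron update on errors
                w += t * z
    return w

w_plain = train_perceptron(X_tr, y_tr)                            # student matched to the teacher
w_exp = train_perceptron(np.hstack([X_tr, expand(X_tr)]), y_tr)   # expanded student

err_plain = np.mean(np.sign(X_te @ w_plain) != y_te)
err_pruned = np.mean(np.sign(X_te @ w_exp[:N]) != y_te)           # expansion pruned at test time
print(f"plain student: {err_plain:.3f}   expanded-then-pruned: {err_pruned:.3f}")
```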

Neural Networks, 2020
We perform an average-case analysis of the generalization dynamics of large neural networks trained using gradient descent. We study the practically relevant "high-dimensional" regime where the number of free parameters in the network is on the order of, or even larger than, the number of examples in the dataset. Using random matrix theory and exact solutions in linear models, we derive the generalization error and training error dynamics of learning and analyze how they depend on the dimensionality of the data and the signal-to-noise ratio of the learning problem. We find that the dynamics of gradient descent learning naturally protect against overtraining and overfitting in large networks. Overtraining is worst at intermediate network sizes, when the effective number of free parameters equals the number of samples, and thus can be reduced by making a network smaller or larger. Additionally, in the high-dimensional regime, low generalization error requires starting with small initial weights. We then turn to non-linear neural networks and show that making networks very large does not harm their generalization performance. On the contrary, it can in fact reduce overtraining, even without early stopping or regularization of any sort. We identify two novel phenomena underlying this behavior in overcomplete models: first, there is a frozen subspace of the weights in which no learning occurs under gradient descent; and second, the statistical properties of the high-dimensional regime yield better-conditioned input correlations, which protect against overtraining. We demonstrate that naive application of worst-case theories such as Rademacher complexity is inaccurate in predicting the generalization performance of deep neural networks, and derive an alternative bound which incorporates the frozen subspace and conditioning effects and qualitatively matches the behavior observed in simulation.
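
A linear-regression toy in the same spirit can be simulated directly: full-batch gradient descent from small initial weights on a noisy teacher, with more parameters than examples, tracking training and generalization error over time. The dimensions, noise level, and learning rate below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
P, N, sigma = 100, 120, 0.5                      # examples, parameters (overparameterized), label noise
w_star = rng.standard_normal(N) / np.sqrt(N)     # teacher weights

X = rng.standard_normal((P, N))
y = X @ w_star + sigma * rng.standard_normal(P)  # noisy training labels
X_te = rng.standard_normal((2000, N))
y_te = X_te @ w_star                             # noise-free targets for the generalization error

w = 0.01 * rng.standard_normal(N)                # small initial weights
lr = 0.1
for step in range(20001):
    grad = X.T @ (X @ w - y) / P                 # gradient of 0.5 * mean squared training error
    w -= lr * grad
    if step % 2000 == 0:
        train_err = np.mean((X @ w - y) ** 2)
        test_err = np.mean((X_te @ w - y_te) ** 2)
        print(f"step {step:5d}   train {train_err:.4f}   test {test_err:.4f}")
```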

The Journal of Neuroscience, 2003
The capability of feedforward networks composed of multiple layers of integrate-and-fire neurons to transmit rate code was examined. Synaptic connections were made only from one layer to the next, and excitation was balanced by inhibition. When time is discrete and the synaptic potentials rise instantaneously, we show that, for random uncorrelated input to layer one, the mean rate of activity in deep layers is essentially independent of the input firing rate. This implies that the input rate cannot be transmitted reliably in such feedforward networks, because neurons in a given layer tend to synchronize partially with each other as a result of shared inputs. As a result of this synchronization, the average firing rate in deep layers will either decay to zero or reach a stable fixed point, depending on model parameters. When time is treated continuously and the synaptic potentials rise instantaneously, these effects develop slowly, and rate transmission over a limited number of layers is possible.
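
The flavor of the result can be seen in a much cruder caricature than the integrate-and-fire model of the paper: discrete-time threshold units with balanced random feedforward weights and no membrane memory. The mean activity of deep layers quickly becomes insensitive to the input rate; all parameters below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n_layers, n, T, theta = 10, 200, 1000, 0.2      # layers, neurons per layer, time bins, threshold

# Balanced random feedforward weights (zero mean, so excitation cancels inhibition on average).
W = [rng.standard_normal((n, n)) / np.sqrt(n) for _ in range(n_layers - 1)]

for p_in in (0.05, 0.30):                        # two very different input firing rates
    spikes = (rng.random((T, n)) < p_in).astype(float)
    rates = [spikes.mean()]
    for Wl in W:
        drive = spikes @ Wl.T                    # instantaneous synaptic drive, no membrane memory
        spikes = (drive > theta).astype(float)   # simple threshold units in discrete time
        rates.append(spikes.mean())
    print(f"input rate {p_in:.2f} -> mean rate per layer:", np.round(rates, 2))
```
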
Method, device and system for speech recognition

Physical Review E, 1994
One of the main experimental tools in probing the interactions between neurons has been the measurement of the correlations in their activity. In general, however, the interpretation of the observed correlations is difficult, since the correlation between a pair of neurons is influenced not only by the direct interaction between them but also by the dynamic state of the entire network to which they belong. Thus, a comparison between the observed correlations and the predictions from specific model networks is needed. In this paper we develop the theory of neuronal correlation functions in large networks comprising several highly connected subpopulations and obeying stochastic dynamic rules. When the networks are in asynchronous states, the cross-correlations are relatively weak, i.e., their amplitude relative to that of the auto-correlations is of order 1/N, N being the size of the interacting populations. Using the weakness of the cross-correlations, we present general equations that express the matrix of cross-correlations in terms of the mean neuronal activities and the effective interaction matrix. The effective interactions are the synaptic efficacies multiplied by the gain of the postsynaptic neurons. The time-delayed cross-correlation matrix can be expressed as a sum of exponentially decaying modes that correspond to the (non-orthogonal) eigenvectors of the effective interaction matrix. The theory is extended to networks with random connectivity, such as randomly diluted networks. This allows for a comparison between the contribution from the internal common input and that from the direct interactions to the correlations of monosynaptically coupled pairs. A closely related quantity is the linear response of the neurons to external time-dependent perturbations. We derive the form of the dynamic linear response function of neurons in the above architecture in terms of the eigenmodes of the effective interaction matrix. The behavior of the correlations and the linear response when the system is near a bifurcation point is analyzed. Near a saddle-node bifurcation the correlation matrix is dominated by a single slowly decaying critical mode. Near a Hopf bifurcation the correlations exhibit weakly damped sinusoidal oscillations. The general theory is applied to the case of a randomly diluted network consisting of excitatory and inhibitory subpopulations, using parameters that mimic the local circuit of 1 mm³ of rat neocortex. Both the effect of dilution and the influence of a nearby bifurcation to an oscillatory state are demonstrated.
The interpretation of the observed features of the CCs (cross-correlations) remains an open challenge, and is the topic of this paper. Another potentially important probe of the interactions in a neuronal network is the linear response function, namely the change in the average firing rates due to a sufficiently weak externally applied perturbation. Here again, the magnitude as well as the temporal evolution of the response depend on the state of the network. Hence, it is important to understand the properties of linear response functions in large networks. Most of the theoretical studies of neural network models consider only average firing rates, where the averaging is over time, over the stochastic noise, or over a population of neurons. In many highly connected networks these averages obey relatively simple mean-field equations.
However, to account for the fluctuations about these averages, one must go beyond the mean-field equations. In this work we develop the theory of the fluctuations in the neuronal activities and of their correlations in large stochastic networks. We focus on network architectures that allow a mean-field description of their average activities. Specifically, we assume that the network comprises several large, homogeneous subpopulations, each consisting of a significant fraction of the total number of neurons, N. Each neuron is coupled to order N neighbors; hence the individual synaptic efficacies are weak, i.e., of order 1/N. Other important restrictions of the present work concern the dynamics. We assume that the network obeys stochastic dynamic equations and that it is in an asynchronous state. Under the above conditions we derive equations for the dynamic linear response and the time-dependent correlation functions. These expressions reveal the relationship between the correlations and the linear response functions on the one hand, and the network connectivity and dynamical state on the other. These results are extended to the case of networks with randomness in the connections. We apply the general theory to a network composed of two subpopulations: excitatory neurons and inhibitory ones. We calculate the time-delayed autocorrelation (AC) and CC functions using parameters that represent the gross features of the local connectivity and the rest activity levels in the rat neocortex. The effect of the proximity of a bifurcation to a synchronized oscillatory state, as well as the effect of random dilution of the connections, are elucidated. The outline of the paper is as follows: In Section 2 we define asynchronous and synchronous states in large neural networks and discuss their implications. In Section 3 we define the stochastic dynamics of the networks, describe the neuronal correlation functions, and present some of their general properties. We then define the mean-field architecture which will be assumed in most of this work. The mean-field equations for the noise- or population-averaged activities are derived in …
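
A linearized caricature of the asynchronous state makes the structure of these results easy to play with: model the fluctuations as a multivariate Ornstein-Uhlenbeck process built from an effective interaction matrix, solve a Lyapunov equation for the equal-time covariance, and propagate it with the matrix exponential to get the lagged cross-correlations as a sum of decaying eigenmodes. The 4x4 effective interaction matrix below is an arbitrary stable example, not the cortical parameters used in the paper.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# Effective interactions (synaptic efficacies times postsynaptic gains) between two
# excitatory and two inhibitory units; the numbers are illustrative, not fitted values.
J = np.array([[0.8, 0.8, -1.2, -1.2],
              [0.8, 0.8, -1.2, -1.2],
              [0.6, 0.6, -0.9, -0.9],
              [0.6, 0.6, -0.9, -0.9]])
n = J.shape[0]
A = np.eye(n) - J                      # linearized fluctuation dynamics: dx/dt = -A x + noise
D = np.eye(n)                          # covariance of the white driving noise

# Stationary equal-time covariance C0 solves the Lyapunov equation A C0 + C0 A^T = D.
C0 = solve_continuous_lyapunov(A, D)

# Time-lagged cross-correlation matrix: C(tau) = exp(-A tau) C0 for tau >= 0,
# i.e., a sum of exponentially decaying modes set by the eigenvectors of A = I - J.
for tau in (0.0, 1.0, 2.0):
    C_tau = expm(-A * tau) @ C0
    print(f"tau = {tau}:\n{np.round(C_tau, 3)}")
```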

Nature neuroscience, 2003
The calculation and memory of position variables by temporal integration of velocity signals is essential for posture, the vestibulo-ocular reflex (VOR) and navigation. Integrator neurons exhibit persistent firing at multiple rates, which represent the values of memorized position variables. A widespread hypothesis is that temporal integration is the outcome of reverberating feedback loops within recurrent networks, but this hypothesis has not been proven experimentally. Here we present a single-cell model of a neural integrator. The nonlinear dynamics of calcium gives rise to propagating calcium wave-fronts along dendritic processes. The wave-front velocity is modulated by synaptic inputs such that the front location covaries with the temporal sum of its previous inputs. Calcium-dependent currents convert this information into concomitant persistent firing. Calcium dynamics in single neurons could thus be the physiological basis of graded persistent activity and temporal integration.
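
A deliberately stripped-down caricature of the proposed mechanism (not the calcium model itself): the wave-front position moves at a velocity proportional to the synaptic input, so it stores the running integral of the input, and the firing rate simply reads out that position. All constants are invented for illustration.

```python
import numpy as np

# Front position x integrates the input s(t); a calcium-dependent current turns x
# into a persistent firing rate that outlasts the transient inputs.
dt, T = 1e-3, 5.0
t = np.arange(0, T, dt)
s = np.zeros_like(t)
s[(t > 1.0) & (t < 1.2)] = +1.0        # brief excitatory input: the front advances
s[(t > 3.0) & (t < 3.1)] = -1.0        # brief inhibitory input: the front retreats

gain_v = 2.0                           # input-to-front-velocity gain (illustrative)
x = np.zeros_like(t)
for i in range(1, len(t)):
    x[i] = np.clip(x[i - 1] + gain_v * s[i - 1] * dt, 0.0, 1.0)

rate = 50.0 * x                        # persistent firing rate proportional to front position
print("rate before, between, after the inputs:",
      rate[t.searchsorted(0.9)], rate[t.searchsorted(2.0)], rate[t.searchsorted(4.0)])
```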

Neural Plasticity, 2007
The Israel Society for Neuroscience (ISFN) was founded in 1993 by a group of leading Israeli scientists conducting research in the area of neurobiology. The primary goal of the society was to promote and disseminate the knowledge and understanding acquired by its members, and to strengthen interactions between them. Since then, the society has held its annual meeting in Eilat, usually during December. At these annual meetings, senior Israeli neurobiologists, their teams, and their graduate students, as well as foreign scientists and students, present their recent research findings in platform and poster presentations, and the program of the meeting is based mainly on the 338 received abstracts, which are published in this volume. The meeting also offers the researchers the opportunity to exchange information with each other, often leading to the initiation of collaborative studies. Both the number of members of the society and the number of participants in the annual meeting are …

Benefits of Pathway Splitting in Sensory Coding
The Journal of Neuroscience, 2014
In many sensory systems, the neural signal splits into multiple parallel pathways. For example, in the mammalian retina, ∼20 types of retinal ganglion cells transmit information about the visual scene to the brain. The purpose of this profuse and early pathway splitting remains unknown. We examine a common instance of splitting into ON and OFF neurons excited by increments and decrements of light intensity in the visual scene, respectively. We test the hypothesis that pathway splitting enables more efficient encoding of sensory stimuli. Specifically, we compare a model system with an ON and an OFF neuron to one with two ON neurons. Surprisingly, the optimal ON–OFF system transmits the same information as the optimal ON–ON system, if one constrains the maximal firing rate of the neurons. However, the ON–OFF system uses fewer spikes on average to transmit this information. This superiority of the ON–OFF system is also observed when the two systems are optimized while constraining their …
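
The spike-cost argument can be illustrated with a noise-free, binarized toy that is much simpler than the Poisson neurons analyzed in the paper: with thresholds chosen so that the three response states are equally likely, an ON-OFF pair and an ON-ON pair carry the same response entropy about a Gaussian stimulus, but the ON-OFF pair does so with fewer spikes. The threshold value and the deterministic coding assumption are mine, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(4)
s = rng.standard_normal(1_000_000)          # Gaussian light-intensity fluctuations
theta = 0.4307                              # upper tertile of the standard normal

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

# Responses are deterministic functions of s here, so the information about the
# stimulus equals the entropy of the response distribution.

# ON-OFF pair: one neuron spikes for increments (s > theta), the other for decrements.
on, off = s > theta, s < -theta
states = on.astype(int) + 2 * off.astype(int)        # 0: silent, 1: ON spike, 2: OFF spike
info_onoff = entropy(np.bincount(states) / len(s))
spikes_onoff = on.mean() + off.mean()

# ON-ON pair: two increment-sensitive neurons with staggered thresholds covering the same tertiles.
on1, on2 = s > -theta, s > theta
states = on1.astype(int) + on2.astype(int)           # 0, 1 or 2 spikes per time bin
info_onon = entropy(np.bincount(states) / len(s))
spikes_onon = on1.mean() + on2.mean()

print(f"ON-OFF: {info_onoff:.3f} bits/bin, {spikes_onoff:.2f} spikes/bin")
print(f"ON-ON : {info_onon:.3f} bits/bin, {spikes_onon:.2f} spikes/bin")
```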

Physical Review E, 2002
In many biological systems, the electrical coupling of nonoscillating cells generates synchronized membrane potential oscillations. This work describes a dynamical mechanism in which the electrical coupling of identical nonoscillating cells destabilizes the homogeneous fixed point and leads to network oscillations via a Hopf bifurcation. Each cell is described by a passive membrane potential and additional internal variables. The dynamics of the internal variables, in isolation, is oscillatory, but their interaction with the membrane potential damps the oscillations and therefore constructs nonoscillatory cells. The electrical coupling reveals the oscillatory nature of the internal variables and generates network oscillations. This mechanism is analyzed near the bifurcation point, where the spatial structure of the membrane potential oscillations is determined by the network architecture, and in the limit of strong coupling, where the membrane potentials of all cells oscillate in-phase and multiple cluster states dominate the dynamics. In particular, we have derived an asymptotic behavior for the spatial fluctuations in the limit of strong coupling in fully connected networks and in a one-dimensional lattice architecture.
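
The basic bifurcation scenario can be checked numerically on a generic toy, not the specific cell model of the paper: each cell has a fast passive membrane potential V and an internal pair (w, u) whose isolated dynamics is a weakly growing oscillation that the feedback through V damps, so a single cell sits at a stable fixed point. Gap-junction coupling between two such cells lets a complex-conjugate eigenvalue pair cross the imaginary axis, i.e., a Hopf bifurcation of the rest state. All parameter values are invented for illustration.

```python
import numpy as np

# Toy linear cell: passive membrane V driven by an internal pair (w, u) whose
# isolated dynamics is a weakly growing oscillation; the V-feedback damps it,
# so the isolated cell does not oscillate.
tau, a, c, mu, omega = 0.1, 1.0, 1.0, 0.2, 2.0
J_cell = np.array([[-1 / tau, a / tau, 0.0],
                   [-c,        mu,     -omega],
                   [0.0,       omega,   mu]])

def coupled_jacobian(g):
    """Two identical cells with gap-junction coupling g*(V_other - V_self) on dV/dt."""
    m = 3
    J = np.zeros((2 * m, 2 * m))
    J[:m, :m] = J[m:, m:] = J_cell
    J[0, 0] -= g;  J[0, m] += g       # membrane equation of cell 1
    J[m, m] -= g;  J[m, 0] += g       # membrane equation of cell 2
    return J

for g in (0.0, 2.0, 6.0, 8.0, 12.0):
    eig = np.linalg.eigvals(coupled_jacobian(g))
    lead = eig[np.argmax(eig.real)]
    print(f"g = {g:5.1f}   leading eigenvalue = {lead.real:+.3f} {lead.imag:+.3f}i")
# The rest state is stable for weak coupling; past a critical g a complex pair
# crosses the imaginary axis (a Hopf bifurcation) and the pair of cells starts to oscillate.
```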

Proceedings of the fifth annual workshop on Computational learning theory, 1992
We propose an algorithm called query by committee, in which a committee of students is trained on the same data set. The next query is chosen according to the principle of maximal disagreement. The algorithm is studied for two toy models: the high-low game and perceptron learning of another perceptron. As the number of queries goes to infinity, the committee algorithm yields asymptotically finite information gain. This leads to generalization error that decreases exponentially with the number of examples. This is in marked contrast to learning from randomly chosen inputs, for which the information gain approaches zero and the generalization error decreases with a relatively slow inverse power law. We suggest that asymptotically finite information gain may be an important characteristic of good query algorithms.
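
A rough sketch of a two-member version of the algorithm for perceptron learning is given below. The committee members here are simply perceptrons retrained from different random starts, a crude stand-in for sampling the version space, and the pool size, dimensions, and query counts are arbitrary; at these small sizes the gap over random queries is typically modest and varies from run to run.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 20
w_target = rng.standard_normal(N)              # the "teacher" perceptron to be learned
w_target /= np.linalg.norm(w_target)

def train(X, y, seed):
    """Crude stand-in for sampling a consistent student: perceptron run from a random start."""
    r = np.random.default_rng(seed)
    w = r.standard_normal(N)
    for _ in range(100):
        for i in r.permutation(len(y)):
            if y[i] * (w @ X[i]) <= 0:
                w += y[i] * X[i]
    return w

def run(n_queries, use_committee):
    X = rng.standard_normal((1, N)); y = np.sign(X @ w_target)
    for q in range(n_queries):
        cand = rng.standard_normal((200, N))              # pool of candidate queries
        if use_committee:
            w1, w2 = train(X, y, 2 * q), train(X, y, 2 * q + 1)
            disagree = np.sign(cand @ w1) != np.sign(cand @ w2)
            # With two members, any point they disagree on realizes maximal disagreement.
            x_new = cand[np.argmax(disagree)] if disagree.any() else cand[0]
        else:
            x_new = cand[0]                               # randomly chosen input
        X = np.vstack([X, x_new]); y = np.append(y, np.sign(x_new @ w_target))
    w = train(X, y, 999)
    X_test = rng.standard_normal((5000, N))
    return np.mean(np.sign(X_test @ w) != np.sign(X_test @ w_target))

for name, committee in (("random queries", False), ("query by committee", True)):
    errs = [run(30, committee) for _ in range(5)]
    print(f"{name:18s} mean test error = {np.mean(errs):.3f}")
```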

Information Tuning of Populations of Neurons in Primary Visual Cortex
The Journal of Neuroscience, 2004
Neurons in macaque primary visual cortex (V1) show a diversity of orientation tuning properties, exhibiting a broad distribution of tuning width, baseline activity, peak response, and circular variance (CV). Here, we studied how the different tuning features affect the performance of these cells in discriminating between stimuli with different orientations. Previous studies of the orientation discrimination power of neurons in V1 focused on resolving two nearby orientations close to the psychophysical threshold of orientation discrimination. Here, we developed a theoretical framework, the information tuning curve, that measures the discrimination power of cells as a function of the orientation difference, δθ, of the two stimuli. This tuning curve also represents the mutual information between the neuronal responses and the stimulus orientation. We studied theoretically the dependence of the information tuning curve on the orientation tuning width, baseline, and peak responses. …
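
The quantity itself is easy to compute for a toy cell. The sketch below assumes (my choices, not the paper's) a circular-Gaussian orientation tuning curve and Poisson spike counts, and evaluates the mutual information between the response and which of two equally likely orientations was shown, the preferred one or one δθ away, as δθ grows toward 90°.

```python
import numpy as np
from scipy.stats import poisson

def mean_count(theta_deg, pref=0.0, baseline=2.0, peak=30.0, width_deg=25.0):
    """Orientation tuning curve (180-degree periodic): mean spike count per trial."""
    d = np.deg2rad(theta_deg - pref)
    w = np.deg2rad(width_deg)
    return baseline + (peak - baseline) * np.exp((np.cos(2.0 * d) - 1.0) / (2.0 * w ** 2))

def info_bits(theta1, theta2, r_max=200):
    """Mutual information between the Poisson spike count and which of two
    equally likely orientations was presented."""
    r = np.arange(r_max)
    p1, p2 = poisson.pmf(r, mean_count(theta1)), poisson.pmf(r, mean_count(theta2))
    p_mix = 0.5 * (p1 + p2)
    kl = lambda q: np.sum(q[q > 0] * np.log2(q[q > 0] / p_mix[q > 0]))
    return 0.5 * kl(p1) + 0.5 * kl(p2)

for dtheta in (2, 5, 10, 20, 45, 90):
    print(f"delta theta = {dtheta:2d} deg   information = {info_bits(0.0, float(dtheta)):.3f} bits")
```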

Adaptation without parameter change: Dynamic gain control in motion detection
Proceedings of the National Academy of Sciences, 2005
Many sensory systems adapt their input-output relationship to changes in the statistics of the ambient stimulus. Such adaptive behavior has been measured in a motion-sensitive neuron of the fly visual system, H1. The rapid adaptation of the velocity response gain has been interpreted as evidence of optimal matching of the H1 response to the dynamic range of the stimulus, thereby maximizing its information transmission. Here, we show that correlation-type motion detectors, which are commonly thought to underlie fly motion vision, intrinsically possess adaptive properties. Increasing the amplitude of the velocity fluctuations leads to a decrease of the effective gain and the time constant of the velocity response without any change in the parameters of these detectors. The seemingly complex property of this adaptation turns out to be a straightforward consequence of the multidimensionality of the stimulus and the nonlinear nature of the system.
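
This intrinsic gain change can be seen by simulating a correlation-type detector directly. The sketch below is my own minimal Hassenstein-Reichardt correlator (two point photoreceptors viewing a drifting sinusoidal pattern, one arm low-pass filtered) whose output is regressed against the velocity signal for different fluctuation amplitudes; the spatial frequency, filter constants, and velocity statistics are arbitrary, and the exact gain values depend on them.

```python
import numpy as np

rng = np.random.default_rng(6)
dt, T = 1e-3, 100.0
t = np.arange(0, T, dt)
k, dx, tau_hr, tau_v = 50.0, 0.01, 0.05, 0.5   # spatial freq, detector spacing, filter and velocity time constants

def lowpass(x, tau):
    y, out = 0.0, np.empty_like(x)
    a = dt / tau
    for i, xi in enumerate(x):
        y += a * (xi - y)                      # first-order low-pass filter
        out[i] = y
    return out

def effective_gain(v_std):
    # Ornstein-Uhlenbeck velocity signal with standard deviation v_std.
    v = np.empty_like(t); v[0] = 0.0
    for i in range(1, len(t)):
        v[i] = v[i - 1] - dt * v[i - 1] / tau_v + v_std * np.sqrt(2 * dt / tau_v) * rng.standard_normal()
    x = np.cumsum(v) * dt                                    # pattern position
    s1, s2 = np.sin(k * (0.0 - x)), np.sin(k * (dx - x))     # luminance at the two photoreceptors
    r = lowpass(s1, tau_hr) * s2 - lowpass(s2, tau_hr) * s1  # Hassenstein-Reichardt correlator output
    return np.cov(r, v)[0, 1] / np.var(v)                    # linear (effective) gain of the velocity response

for v_std in (0.1, 0.3, 1.0, 3.0):
    print(f"velocity fluctuation std = {v_std:4.1f}   effective gain = {effective_gain(v_std):.3f}")
```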

Bayesian model of dynamic image stabilization in the visual system
Proceedings of the National Academy of Sciences, 2010
Humans can resolve the fine details of visual stimuli although the image projected on the retina is constantly drifting relative to the photoreceptor array. Here we demonstrate that the brain must take this drift into account when performing high-acuity visual tasks. Further, we propose a decoding strategy for interpreting the spikes emitted by the retina, which takes into account the ambiguity caused by retinal noise and the unknown trajectory of the projected image on the retina. A main difficulty, addressed in our proposal, is the exponentially large number of possible stimuli, which renders the ideal Bayesian solution to the problem computationally intractable. In contrast, the strategy that we propose suggests a realistic implementation in the visual cortex. The implementation involves two populations of cells, one that tracks the position of the image and another that represents a stabilized estimate of the image itself. Spikes from the retina are dynamically routed to the two populations.
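
In generic Bayesian-filtering terms (a sketch of the data flow only, in my own notation and with one simple choice of routed update, not the paper's exact equations), the two populations can be written as follows, with p_t(x) the posterior over the image position, Î_t the stabilized image estimate, r_t the retinal spikes in time bin t, K a random-walk transition kernel for the eye drift, and η a learning rate:

```latex
\begin{align}
  p_t(x) &\;\propto\; \Big[\sum_{x'} K(x \mid x')\, p_{t-1}(x')\Big]\,
            P\!\left(r_t \,\middle|\, x,\ \hat{I}_{t-1}\right)
            && \text{(position-tracking population)} \\
  \hat{I}_t(j) &\;=\; (1-\eta)\,\hat{I}_{t-1}(j)
            \;+\; \eta \sum_{x} p_t(x)\, r_t(j + x)
            && \text{(spikes routed into image coordinates)}
\end{align}
```

The first line is a standard prediction-correction update over the image position; the second accumulates each retinal spike at the image location it would correspond to under each candidate position, weighted by the current position posterior.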

PLoS Biology, 2007
Humans can distinguish visual stimuli that differ by features the size of only a few photoreceptors. This is possible despite the incessant image motion due to fixational eye movements, which can be many times larger than the features to be distinguished. To perform well, the brain must identify the retinal firing patterns induced by the stimulus while discounting similar patterns caused by spontaneous retinal activity. This is a challenge since the trajectory of the eye movements, and consequently the stimulus position, are unknown. We derive a decision rule for using retinal spike trains to discriminate between two stimuli, given that their retinal image moves with an unknown random walk trajectory. This algorithm dynamically estimates the probability of the stimulus at different retinal locations, and uses this to modulate the influence of retinal spikes acquired later. Applied to a simple orientation-discrimination task, the algorithm's performance is consistent with human acuity, whereas naive strategies that neglect eye movements perform much worse. We then show how a simple, biologically plausible neural network could implement this algorithm using a local, activity-dependent gain and lateral interactions approximately matched to the statistics of eye movements. Finally, we discuss evidence that such a network could be operating in the primary visual cortex.
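
A toy one-dimensional version of such a decision rule can be written in a few lines: two candidate binary patterns drift over a model retina with an unknown random-walk trajectory, retinal cells spike with higher probability over bright pixels, and the decoder recursively updates a joint posterior over (stimulus identity, stimulus position). Pattern shapes, spike probabilities, and drift statistics are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
L, W, T = 40, 10, 300                # retina size, stimulus width, number of time bins
p_on, p_off = 0.20, 0.02             # spike probability per bin over bright / dark pixels
jump = 0.1                           # probability of a one-pixel drift step in each direction

# Two candidate binary stimuli (stand-ins for, e.g., two orientations).
stim = {0: (np.arange(W) % 2 == 0).astype(float),
        1: (np.arange(W) < W // 2).astype(float)}

def retinal_image(s, x):
    img = np.zeros(L)
    img[(x + np.arange(W)) % L] = stim[s]
    return img

# Generate spikes from the true stimulus drifting with an unknown random walk.
s_true, x = 0, L // 2
spikes = []
for _ in range(T):
    x = int((x + rng.choice([-1, 0, 1], p=[jump, 1 - 2 * jump, jump])) % L)
    rate = np.where(retinal_image(s_true, x) > 0, p_on, p_off)
    spikes.append((rng.random(L) < rate).astype(float))

# Recursive Bayesian decoder over (stimulus identity, stimulus position).
templates = np.array([[retinal_image(s, xx) for xx in range(L)] for s in (0, 1)])
log_on = np.log(np.where(templates > 0, p_on, p_off))
log_off = np.log(np.where(templates > 0, 1 - p_on, 1 - p_off))
post = np.full((2, L), 1.0 / (2 * L))            # uniform prior

for t, r in enumerate(spikes):
    # Prediction step: diffuse the position estimate according to the drift statistics.
    post = (1 - 2 * jump) * post + jump * (np.roll(post, 1, axis=1) + np.roll(post, -1, axis=1))
    # Correction step: weight each (stimulus, position) hypothesis by the spike likelihood.
    loglik = (log_on * r + log_off * (1 - r)).sum(axis=2)
    post *= np.exp(loglik - loglik.max())
    post /= post.sum()
    if (t + 1) % 100 == 0:
        print(f"after {t + 1:3d} bins: P(correct stimulus) = {post[s_true].sum():.3f}")
```
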
Physical Review Letters, 2010
We study the computational capacity of a model neuron, the Tempotron, which classifies sequences of spikes by linear-threshold operations. We use statistical mechanics and extreme value theory to derive the capacity of the system in random classification tasks. In contrast to its static analog, the Perceptron, the Tempotron's solution space consists of a large number of small clusters of weight vectors. The capacity of the system per synapse is finite in the large-size limit and weakly diverges with the stimulus duration relative to the membrane and synaptic time constants.
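
For readers unfamiliar with the model, here is a minimal Tempotron sketch (the classifier itself, not the capacity calculation of the paper): each afferent contributes a weighted postsynaptic-potential kernel, a pattern is classified by whether the voltage crosses threshold, and errors are corrected by a gradient-like update evaluated at the time of the voltage maximum. Pattern statistics, kernel time constants, and the learning rate below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
N, T, dt = 50, 0.5, 1e-3                     # afferents, pattern duration (s), time step
tau_m, tau_s, threshold, lam = 0.02, 0.005, 1.0, 1e-2
t = np.arange(0, T, dt)

# Double-exponential postsynaptic potential, normalized so its peak equals 1.
t_peak = tau_m * tau_s / (tau_m - tau_s) * np.log(tau_m / tau_s)
V0 = 1.0 / (np.exp(-t_peak / tau_m) - np.exp(-t_peak / tau_s))

def psp(dt_):
    dt_ = np.maximum(dt_, 0.0)               # causal kernel: zero before the spike
    return V0 * (np.exp(-dt_ / tau_m) - np.exp(-dt_ / tau_s))

patterns = [rng.uniform(0, T, N) for _ in range(30)]   # one random spike time per afferent
labels = rng.choice([-1, 1], size=30)

def voltage(w, spike_times):
    return psp(t[:, None] - spike_times[None, :]) @ w  # V(t): weighted sum of PSPs

w = 0.01 * rng.standard_normal(N)
for epoch in range(200):
    errors = 0
    for spike_times, y in zip(patterns, labels):
        V = voltage(w, spike_times)
        fired = V.max() > threshold
        if fired == (y > 0):
            continue
        errors += 1
        t_max = t[np.argmax(V)]                         # time of the voltage maximum
        w += lam * y * psp(t_max - spike_times)         # tempotron gradient-like update
    if errors == 0:
        break
print(f"training errors after epoch {epoch + 1}: {errors}")
```
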
Short-Term Memory in Orthogonal Neural Networks
Physical Review Letters, 2004
Physical Review Letters, 1986
We present SQUID measurements of the nonlinear susceptibility χ_nl of "pure" CuMn and of CuMn doped with gold impurities at concentrations c_Au of 1, 2, and 3 at.%. Gold impurities suppress the magnitude of χ_nl. Ordinary one-parameter scaling with respect to field and temperature is obeyed by χ_nl of the pure CuMn. However, strong deviations from this scaling behavior are observed in the gold-doped samples in the vicinity of the freezing temperature and in low fields, H < H*, where H* increases with c_Au. This is interpreted as the onset of an anisotropy-induced crossover from Heisenberg to Ising spin-glass critical behavior.