The concept of free energy has its origins in 19th-century thermodynamics, but has recently found its way into the behavioral and neural sciences, where it has been promoted for its wide applicability and has even been suggested as a fundamental principle for understanding intelligent behavior and brain function. We argue that there are essentially two different notions of free energy in current models of intelligent agency, both of which can be considered applications of Bayesian inference to the problem of action selection: one that appears when trading off accuracy and uncertainty based on a general maximum entropy principle, and one that formulates action selection in terms of minimizing an error measure that quantifies deviations of beliefs and policies from given reference models. The first approach provides a normative rule for action selection in the face of model uncertainty or when information-processing capabilities are limited. The second approach directly aims to formulate the action selection problem as an inference problem in the context of Bayesian brain theories, also known as Active Inference in the literature. We elucidate the main ideas and discuss critical technical and conceptual issues revolving around these two notions of free energy, both of which claim to apply at all levels of decision-making, from the high-level deliberation of reasoning down to the low-level information processing of perception.
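To make the distinction concrete, the two functionals can be sketched as they commonly appear in the literature; the notation below (utility U, inverse temperature β, prior policy p0, approximate posterior q) is ours and not taken from the paper.

    % Free energy from constraints (maximum-entropy / bounded-rational form):
    % trade off expected utility against deviation from a prior policy p_0.
    F_{\mathrm{constr}}[p] \;=\; \mathbb{E}_{p(a)}\!\left[U(a)\right]
        \;-\; \frac{1}{\beta}\, D_{\mathrm{KL}}\!\left(p(a)\,\|\,p_0(a)\right),
    \qquad
    p^*(a) \;\propto\; p_0(a)\, e^{\beta U(a)}.

    % Variational free energy (Active Inference form): how far an approximate
    % posterior q over hidden states x deviates from the generative model p(o,x)
    % given observations o.
    F_{\mathrm{var}}[q] \;=\; \mathbb{E}_{q(x)}\!\left[\ln q(x) - \ln p(o,x)\right]
    \;=\; D_{\mathrm{KL}}\!\left(q(x)\,\|\,p(x \mid o)\right) - \ln p(o).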
Information-theoretic bounded rationality describes utility-optimizing decision-makers whose limited information-processing capabilities are formalized by information constraints. One of the consequences of bounded rationality is that resource-limited decision-makers can join together to solve decision-making problems that are beyond the capabilities of each individual. Here, we study an information-theoretic principle that drives division of labor and specialization when decision-makers with information constraints are joined together. We devise an on-line learning rule for this principle that learns a partitioning of the problem space such that it can be solved by specialized linear policies. We demonstrate the approach on decision-making problems whose complexity exceeds the capabilities of individual decision-makers, but which can be solved by combining the decision-makers optimally. The strength of the model is that it is abstract and principled, yet has direct applications in classification, regression, reinforcement learning and adaptive control.
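The flavor of such a specialization principle can be illustrated with a toy sketch. This is not the paper's on-line rule with linear policies, but a batch fixed-point iteration (Blahut-Arimoto style) for a hierarchical free-energy objective of the kind described above; the utility table, state space and resource parameters below are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n_w, n_x, n_a = 8, 2, 4                     # world states, experts, actions
    U = rng.standard_normal((n_w, n_a))         # illustrative utility table U(w, a)
    p_w = np.full(n_w, 1.0 / n_w)               # state distribution
    beta1, beta2 = 1.0, 10.0                    # resources of selector / experts

    # Initialize selector p(x|w) and expert policies p(a|w,x).
    p_x_given_w = rng.dirichlet(np.ones(n_x), size=n_w)            # (n_w, n_x)
    p_a_given_wx = rng.dirichlet(np.ones(n_a), size=(n_w, n_x))    # (n_w, n_x, n_a)

    def normalize(m, axis):
        return m / m.sum(axis=axis, keepdims=True)

    for _ in range(200):
        # Marginals (priors) induced by the current policies.
        p_x = p_w @ p_x_given_w                                        # (n_x,)
        p_w_given_x = normalize(p_x_given_w * p_w[:, None], axis=0).T  # (n_x, n_w)
        p_a_given_x = np.einsum('xw,wxa->xa', p_w_given_x, p_a_given_wx)

        # Expert update: soft-maximize utility against the expert's own prior.
        p_a_given_wx = normalize(
            p_a_given_x[None, :, :] * np.exp(beta2 * U[:, None, :]), axis=2)

        # Free energy that each expert x achieves in state w.
        EU = np.einsum('wxa,wa->wx', p_a_given_wx, U)
        KL = np.einsum('wxa->wx',
                       p_a_given_wx * np.log(p_a_given_wx / p_a_given_x[None, :, :]))
        F = EU - KL / beta2                                            # (n_w, n_x)

        # Selector update: soft-maximize expert free energy against p(x).
        p_x_given_w = normalize(p_x[None, :] * np.exp(beta1 * F), axis=1)

    print("expert specialization p(x|w):")
    print(np.round(p_x_given_w, 2))

With limited expert resources (finite beta2), the selector typically learns a soft partition of the state space in which each expert specializes on a subset of states, which is the division-of-labor effect the abstract refers to.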
Importantly, while the first two models are reward-based, the last two approaches are based on the assumption that decision-makers have internal models. All models are able to solve the NeuronGame, but exhibit different learning dynamics along the way. Although the parameters of all models were fitted such that model performance is as close as possible to human performance, there is a significant difference in fitting quality when trying to predict…
Bounded rationality investigates utility-optimizing decision-makers with limited information-processing power. In particular, information-theoretic bounded rationality models formalize resource constraints abstractly in terms of relative Shannon information, namely the Kullback-Leibler divergence between the agent's prior and posterior policy. Between prior and posterior lies an anytime deliberation process that can be instantiated by sample-based evaluations of the utility function through Markov Chain Monte Carlo (MCMC) optimization. The simplest model assumes a fixed prior and can relate abstract information-theoretic processing costs to the number of sample evaluations. However, more advanced models would also address the question of learning, that is, how the prior is adapted over time such that generated prior proposals become more efficient. In this work we investigate generative neural networks as priors that are optimized concurrently with anytime sample-based decision-making processes such as MCMC. We evaluate this approach on toy examples.
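A minimal sketch of the anytime, sample-based deliberation step described above: a Metropolis-Hastings chain targeting p*(a) ∝ p0(a) exp(β U(a)), where the sample budget stands in for the information-processing cost. The one-dimensional utility landscape and the fixed Gaussian prior are assumptions for illustration; in the paper the prior is a learned generative network rather than a fixed proposal.

    import numpy as np

    rng = np.random.default_rng(1)

    def U(a):
        # Illustrative one-dimensional utility landscape (assumption).
        return -0.5 * (a - 2.0) ** 2 + np.sin(3.0 * a)

    def anytime_mcmc_decision(n_samples, beta=2.0, prior_mean=0.0, prior_std=1.0):
        """Metropolis-Hastings chain targeting p*(a) ~ p0(a) * exp(beta * U(a)).

        n_samples plays the role of the deliberation budget: more utility
        evaluations correspond to a lower effective information cost."""
        a = rng.normal(prior_mean, prior_std)          # start from the prior
        log_p = lambda x: beta * U(x) - 0.5 * ((x - prior_mean) / prior_std) ** 2
        samples = []
        for _ in range(n_samples):
            proposal = a + rng.normal(0.0, 0.5)        # random-walk proposal
            if np.log(rng.random()) < log_p(proposal) - log_p(a):
                a = proposal                           # accept
            samples.append(a)
        return samples[-1], samples                    # anytime: return the current guess

    for budget in (1, 10, 100, 1000):
        decision, _ = anytime_mcmc_decision(budget)
        print(f"budget={budget:5d}  action={decision:+.3f}  utility={U(decision):+.3f}")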
One weakness of machine learning algorithms is the poor ability of models to solve new problems without forgetting previously acquired knowledge. The Continual Learning (CL) paradigm has emerged as a protocol to systematically investigate settings where the model sequentially observes samples generated by a series of tasks. In this work, we take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle that facilitates a trade-off between learning and forgetting. We discuss this principle from a Bayesian perspective and show its connections to previous approaches to CL. Based on this principle, we propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information-processing paths through the network, governed by a gating policy. Due to the general formulation based on generic utility functions, we can apply this optimality principle to a large variety of learning problems, including supervised learning, reinforcement learning, and generative modeling. We demonstrate the competitive performance of our method in continual supervised learning and in continual reinforcement learning.
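The routing idea behind such a layer can be sketched schematically. The following is not the paper's implementation (which is variational and trained under the information-theoretic principle above), but a plain numpy forward pass showing how a gating policy selects among several expert sub-layers; all sizes and initializations are placeholders.

    import numpy as np

    rng = np.random.default_rng(0)

    class MixtureLayer:
        """Schematic mixture-of-experts layer: a gating policy routes inputs
        through a set of expert paths, so different tasks can occupy different
        parts of the network (illustrative sketch only)."""

        def __init__(self, d_in, d_out, n_experts):
            self.experts_W = rng.standard_normal((n_experts, d_in, d_out)) * 0.1
            self.gate_W = rng.standard_normal((d_in, n_experts)) * 0.1

        def forward(self, x):
            gate_logits = x @ self.gate_W                       # (batch, n_experts)
            gate = np.exp(gate_logits - gate_logits.max(axis=1, keepdims=True))
            gate /= gate.sum(axis=1, keepdims=True)             # softmax gating policy
            expert_out = np.einsum('bi,eio->beo', x, self.experts_W)  # all expert paths
            return np.einsum('be,beo->bo', gate, expert_out)    # gate-weighted mixture

    layer = MixtureLayer(d_in=8, d_out=4, n_experts=3)
    y = layer.forward(rng.standard_normal((5, 8)))
    print(y.shape)   # (5, 4)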
The well-known notion of dimension for partial orders by Dushnik and Miller makes it possible to quantify the degree of incomparability and, thus, is regarded as a measure of complexity for partial orders. However, despite its usefulness, its definition is somewhat disconnected from the geometrical idea of dimension, where, essentially, the number of dimensions indicates how many real lines are required to represent the underlying partially ordered set. Here, we introduce a variation of the Dushnik-Miller notion of dimension that is closer to geometry, the Debreu dimension, and show the following main results: (i) how to construct its building blocks under some countability restrictions, (ii) its relation to other notions of dimension in the literature, and (iii), as an application of the above, an improvement on the classification of preordered spaces through real-valued monotones.
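For context, the classical definition reads as follows; the characterization of the Debreu variant given in the comment is our reading of the abstract (each factor should correspond to one real line), not the paper's exact definition.

    % Dushnik-Miller dimension of a partial order (P, \preceq): the least number
    % of linear extensions whose intersection recovers \preceq.
    \dim_{DM}(P, \preceq) \;=\;
        \min\Big\{ |\mathcal{L}| : \mathcal{L} \text{ a family of linear extensions of } \preceq,\;
        \preceq \;=\; \textstyle\bigcap_{L \in \mathcal{L}} L \Big\}.

    % The Debreu dimension, as described above, restricts attention to linear
    % extensions that can themselves be represented on the real line, so that
    % the resulting number matches the geometrical count of real lines needed.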
Although there is a somewhat standard formalization of computability on countable sets given by Turing machines, the same cannot be said about uncountable sets. Among the approaches to define computability on these sets, order-theoretic structures have proven to be useful. Here, we discuss the mathematical structure needed to define computability using order-theoretic concepts. In particular, we introduce a more general framework and discuss its limitations compared to the previous one in domain theory. We expose four features in which the stronger requirements in the domain-theoretic structure allow improvements over the more general framework: computable elements, computable functions, model dependence of computability, and complexity theory. Crucially, we show that computability of elements in uncountable spaces can be defined in this new setup, and argue why this is not the case for computable functions. Moreover, we show that the stronger setup diminishes the dependence of computability on the chosen order-theoretic structure and that, although a suitable complexity theory can be defined in the stronger framework and the more general one possesses a notion of computable elements, there appears to be no proper notion of element complexity in the latter.
Majorization is a fundamental model of uncertainty that has several applications in areas like thermodynamics and entanglement theory, and constitutes one of the pillars of the modern resource-theoretic approach to physics. Here, we improve on its relation to measurement apparatuses. In particular, after discussing what the proper notion of a second law is in this scenario, we show that, for a sufficiently large state space, any family of entropy-like functions constituting a second law is necessarily countably infinite. Moreover, we provide an analogous result for a variation of majorization known as thermo-majorization which, in fact, does not require any constraint on the state space provided the equilibrium distribution is not uniform. Lastly, we conclude by discussing the applicability of our results to molecular diffusion.
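For readers unfamiliar with the preorder in question, the standard definition of majorization is the following; the comment paraphrases what a family of entropy-like monotones ("second law") characterizing this preorder roughly amounts to, as we read the abstract.

    % Majorization preorder on probability vectors p, q in the simplex \Delta_n
    % (p^{\downarrow} denotes p sorted in non-increasing order):
    q \succ_{\mathrm{maj}} p
    \;\iff\;
    \sum_{i=1}^{k} q^{\downarrow}_i \;\ge\; \sum_{i=1}^{k} p^{\downarrow}_i
    \quad \text{for all } k = 1, \dots, n
    \qquad \big(\text{with equality at } k = n\big).

    % Roughly, a family (f_j)_{j \in J} of entropy-like (Schur-concave) functions
    % constitutes a second law if  q \succ_{\mathrm{maj}} p  \iff  f_j(q) \le f_j(p)
    % for all j; per the abstract, any such family must be countably infinite for
    % a sufficiently large state space.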
The goal of meta-learning is to train a model on a variety of learning tasks such that it can adapt to new problems within only a few iterations. Here we propose a principled information-theoretic model that optimally partitions the underlying problem space such that specialized expert decision-makers solve the resulting subproblems. To drive this specialization we impose the same kind of information-processing constraints both on the partitioning and on the expert decision-makers. We argue that this specialization leads to efficient adaptation to new tasks. To demonstrate the generality of our approach, we evaluate it on three meta-learning domains: image classification, regression, and reinforcement learning.
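The kind of nested objective this refers to can be written compactly as below; the notation is ours (selector p(x|w) assigning tasks or states w to experts x, expert policies p(a|w,x), resources β1 and β2), and it has the same structure as the fixed-point sketch given earlier in this list.

    % Hierarchical bounded-rational objective (sketch): the same type of
    % information cost is imposed on the partitioning stage and on the experts.
    \max_{p(x|w),\, p(a|w,x)} \;
    \mathbb{E}\!\left[U(w, a)\right]
    \;-\; \frac{1}{\beta_1}\, I(W; X)
    \;-\; \frac{1}{\beta_2}\, I(W; A \mid X).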
The Nash equilibrium concept has previously been shown to be an important tool for understanding human sensorimotor interactions, where different actors vie to minimize their respective effort while engaging in a multi-agent motor task. However, it is not clear how such equilibria are reached. Here, we compare different reinforcement learning models to human behavior engaged in sensorimotor interactions with haptic feedback based on three classic games, including the prisoner's dilemma and the symmetric and asymmetric matching pennies games. We find that a discrete analysis that reduces the continuous sensorimotor interaction to binary choices, as in classical matrix games, does not allow us to distinguish between the different learning algorithms, but that a more detailed continuous analysis with continuous formulations of the learning algorithms and the game-theoretic solutions affords different predictions. In particular, we find that Q-learning with intrinsic costs that disfavor deviations from average behavior explains the observed data best, even though all learning algorithms equally converge to admissible Nash equilibrium solutions. We therefore conclude that it is important to study different learning algorithms for understanding sensorimotor interactions, as such behavior cannot be inferred from a game-theoretic analysis alone that simply focuses on the Nash equilibrium concept, since different learning algorithms impose preferences on the set of possible equilibrium solutions due to the inherent learning dynamics.

Nash equilibria are the central solution concept for understanding strategic interactions between different agents [1]. Crucially, unlike other maximum expected utility decision-making models [2-4], the Nash equilibrium concept cannot assume a static environment that can be exploited to find the optimal action in a single sweep; rather, it defines a fixed point representing a combination of strategies that can be found by iteration, so that finally no agent has anything to gain by deviating from their equilibrium behavior. Here, a strategy is conceived as a probability distribution over actions, so that Nash equilibria are in general determined by combinations of probability distributions over actions (mixed Nash equilibria), and only in special cases by combinations of single actions (pure equilibria). An example of the first kind is the popular rock-paper-scissors game, which can be simplified to the matching pennies game with two action choices, where in either case the mixed Nash equilibrium requires players to randomize their choices uniformly. An example of the second kind is the prisoner's dilemma [6], where both players choose between cooperating and defecting without communication. The pure Nash equilibrium in this game requires both players to defect, because the payoffs are designed in a way that allows for a dominant strategy from the perspective of a single player, where it is always better to defect, no matter what the other player is doing. The Nash equilibrium concept has not only been broadly applied in economic modeling of interacting rational agents, companies and markets, but also to explain the dynamics of animal conflict, population dynamics including microbial growth, foraging behavior, the emergence of theory of mind, and even monkeys playing rock-paper-scissors.
Recently, Nash equilibria have also been proposed as a concept for understanding human sensorimotor interactions. In these studies, human dyads are typically coupled haptically [30] and experience physical forces that can be regarded as payoffs in a sensorimotor game. By designing the force payoffs appropriately in dependence on subjects' actions, these sensorimotor games can be made to correspond to classic pen-and-paper games like the prisoner's dilemma with a single pure equilibrium, coordination games with…
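The abstract above mentions Q-learning with intrinsic costs that disfavor deviations from average behavior. The following is a toy, bandit-style sketch of that idea only; the discretized action space, the placeholder payoff, the random stand-in for the partner, and all parameter values are assumptions and do not correspond to the continuous formulation or the haptic games used in the study.

    import numpy as np

    rng = np.random.default_rng(0)

    n_actions = 11                       # discretized 1-D sensorimotor action space
    Q = np.zeros(n_actions)
    avg_action = n_actions // 2          # running average of the agent's own actions
    alpha, beta, gamma_avg, c = 0.1, 3.0, 0.05, 0.2

    def payoff(a_self, a_other):
        # Placeholder payoff (assumption): stands in for the haptic force payoff.
        return -abs(a_self - a_other) / n_actions

    def softmax_choice(q, beta):
        p = np.exp(beta * (q - q.max()))
        p /= p.sum()
        return rng.choice(len(q), p=p)

    for trial in range(5000):
        a_self = softmax_choice(Q, beta)
        a_other = rng.integers(n_actions)                 # stand-in for the partner
        intrinsic_cost = c * abs(a_self - avg_action) / n_actions
        r = payoff(a_self, a_other) - intrinsic_cost      # extrinsic payoff minus intrinsic cost
        Q[a_self] += alpha * (r - Q[a_self])              # bandit-style Q-learning update
        avg_action = (1 - gamma_avg) * avg_action + gamma_avg * a_self

    print(np.round(Q, 3))

The intrinsic cost term biases the learner toward actions close to its own average behavior, which is the mechanism the study found to best explain the human data among otherwise equivalent equilibrium learners.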
We define common thermodynamic concepts purely within the framework of general Markov chains and derive Jarzynski's equality and Crooks' fluctuation theorem in this setup. In particular, we consider the discrete-time case, which leads to an asymmetry in the definition of work that appears in the usual formulation of Crooks' fluctuation theorem. We show how this asymmetry can be avoided with an additional condition regarding the energy protocol. The general formulation in terms of Markov chains allows the results to be transferred to other application areas outside of physics. Here, we discuss how this framework can be applied in the context of decision-making. This involves the definition of the relevant quantities, the assumptions that need to be made for the different fluctuation theorems to hold, as well as the consideration of discrete trajectories instead of the continuous trajectories that are relevant in physics.
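For reference, the two results in their standard thermodynamic form (β the inverse temperature, W the work along a trajectory, ΔF the free-energy difference between the initial and final equilibrium states); the paper derives Markov-chain analogues of these.

    \left\langle e^{-\beta W} \right\rangle \;=\; e^{-\beta \Delta F}
    \qquad \text{(Jarzynski's equality)}

    \frac{P_{F}(W)}{P_{R}(-W)} \;=\; e^{\beta (W - \Delta F)}
    \qquad \text{(Crooks' fluctuation theorem)}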
Information-theoretic principles for learning and acting have been proposed to solve particular classes of Markov Decision Problems. Mathematically, such approaches are governed by a variational free energy principle and allow solving MDP planning problems with information-processing constraints expressed in terms of a Kullback-Leibler divergence with respect to a reference distribution. Here we consider a generalization of such MDP planners by taking model uncertainty into account. As model uncertainty can also be formalized as an information-processing constraint, we can derive a unified solution from a single generalized variational principle. We provide a generalized value iteration scheme together with a convergence proof. As limit cases, this generalized scheme includes standard value iteration with a known model, Bayesian MDP planning, and robust planning. We demonstrate the benefits of this approach in a grid world simulation.
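A minimal sketch of the non-generalized starting point, standard KL-constrained ("soft") value iteration with a known model and a reference policy p0; it is not the paper's generalized scheme with model uncertainty, and the random MDP below is a placeholder.

    import numpy as np

    rng = np.random.default_rng(0)
    n_s, n_a = 5, 3
    P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # transition model P[s, a, s']
    R = rng.standard_normal((n_s, n_a))                # reward table
    p0 = np.full((n_s, n_a), 1.0 / n_a)                # reference (prior) policy
    beta, gamma = 5.0, 0.9                             # information resource, discount

    V = np.zeros(n_s)
    for _ in range(500):
        # KL-regularized Bellman backup: soft-max over actions against the prior p0.
        Q = R + gamma * P @ V                          # (n_s, n_a)
        V_new = (1.0 / beta) * np.log(np.sum(p0 * np.exp(beta * Q), axis=1))
        if np.max(np.abs(V_new - V)) < 1e-10:
            break
        V = V_new

    policy = p0 * np.exp(beta * (R + gamma * P @ V))
    policy /= policy.sum(axis=1, keepdims=True)
    print(np.round(V, 3))
    print(np.round(policy, 3))

In the limit beta → ∞ this backup recovers standard value iteration, and for beta → 0 the policy collapses onto the reference distribution; the generalized scheme in the paper adds further limits such as Bayesian and robust planning.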
We introduce a new class of real-valued monotones in preordered spaces, injective monotones. We show that the class of preorders for which they exist lies in between the class of preorders with strict monotones and that of preorders with countable multi-utilities, improving upon the known classification of preordered spaces through real-valued monotones. We extend several well-known results for strict monotones (Richter-Peleg functions) to injective monotones, provide a construction of injective monotones from countable multi-utilities, and relate injective monotones to classic results concerning Debreu denseness and order separability. Along the way, we connect our results to Shannon entropy and the uncertainty preorder, obtaining new insights into how they are related. In particular, we show how injective monotones can be used to generalize some appealing properties of Jaynes' maximum entropy principle, which is considered a basis for statistical inference and serves as a justification for many regularization techniques that appear throughout machine learning and decision theory. ¹ Here, x ≺ y means x ≼ y and ¬(y ≼ x).
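As we read the abstract, the notions involved compare as follows (≼ a preorder, ∼ its induced equivalence, ≺ the strict part defined in the footnote); this is a summary sketch rather than the paper's formal statements.

    % Monotone:                         x \preceq y \;\Rightarrow\; f(x) \le f(y)
    % Strict monotone (Richter-Peleg):  additionally  x \prec y \;\Rightarrow\; f(x) < f(y)
    % Injective monotone:               a monotone with  f(x) = f(y) \;\Rightarrow\; x \sim y
    %                                   (i.e. f is injective on the quotient space)
    % Multi-utility (f_j)_{j \in J}:    x \preceq y \;\iff\; f_j(x) \le f_j(y) \text{ for all } j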
The application of thermodynamic reasoning in the study of learning systems has a long tradition. Recently, new tools relating perfect thermodynamic adaptation to the adaptation process have been developed. These results, known as fluctuation theorems, have been tested experimentally in several physical scenarios and, moreover, have been shown to be valid under broad mathematical conditions. Hence, although not experimentally challenged yet, they are presumed to apply to learning systems as well. Here we address this challenge by testing the applicability of fluctuation theorems in learning systems, more specifically, in human sensorimotor learning. In particular, we relate adaptive movement trajectories in a changing visuomotor rotation task to the fully adapted steady-state behavior of individual participants. We find that human adaptive behavior in our task is generally consistent with fluctuation theorem predictions and discuss the merits and limitations of the approach.

The study of learning systems with concepts borrowed from statistical mechanics and thermodynamics has a long history, reaching back to Maxwell's demon and the ensuing debate on the relation between physics and information. Over the last 20 years, the informational view of thermodynamics has experienced great developments, which have made it possible to broaden its scope from equilibrium to non-equilibrium phenomena. Of particular importance are the so-called fluctuation theorems, which relate equilibrium quantities to non-equilibrium trajectories, thus allowing equilibrium quantities to be approximated via experimental realizations of non-equilibrium processes. Among the fluctuation theorems, two results stand out, Jarzynski's equality and Crooks' fluctuation theorem, as they aim to bridge the apparent chasm between reversible microscopic laws and irreversible macroscopic phenomena. The advances in non-equilibrium thermodynamics have recently also led to new theoretical insights into simple learning systems. Abstractly, thermodynamic quantities like energy, entropy or free energy can be thought of as defining order relations between states, which makes them applicable to a wide range of problems. In the economic sciences, for example, such order relations are typically used to define a decision-maker's preferences over states [30]. Accordingly, a decision-maker or a learning system can be thought to maximize a utility function, analogous to a physical system that aims to minimize an energy function. Moreover, in the presence of uncertainty in stochastic choice, such decision-makers can be thought to operate under entropy constraints reflecting the decision-maker's precision, resulting in soft-maximizing the corresponding utility function instead of perfectly maximizing it. This is formally equivalent to following a Boltzmann distribution with energy given by the utility. Therefore, in this picture, the physical concept of work corresponds to utility changes caused by the environment, whereas the physical concept of heat corresponds to utility gains due to internal adaptation [46]. Just as a thermodynamic system is driven by work, such learning systems are driven by changes in the utility landscape (e.g. changes in an error signal).
It has been hypothesized that, by exposing learning systems to varying environmental conditions, their adaptive behavior can be studied in terms of fluctuation theorems, which are not necessarily tied to physical processes but are broadly applicable to stochastic processes satisfying certain constraints. Fluctuation theorems are usually deployed in statistical mechanics, particularly in the study of non-equilibrium steady states in thermodynamics. In this setting, one normally assumes a probabilistic description of an ensemble of many particles, i.e., the kinds of systems usually considered in statistical thermodynamics. However, as has been described in the literature, exactly the same principles and fluctuation theorems also apply to the path of a single particle, leading to stochastic thermodynamics. This suggests that fluctuation theorems may not only be applicable to the statistics of ensembles of many learners, but also when describing the trajectory of a single participant during a learning process.
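The correspondence sketched in the passage above can be written out as follows; the notation is ours, and sign conventions for work and heat differ across the literature, so this is only an orienting sketch.

    % Boltzmann-type choice distribution with utility U in place of (negative) energy:
    p(x) \;\propto\; e^{\beta U(x)}.

    % First-law-style decomposition of a utility change along a learning trajectory:
    \Delta U \;=\; \underbrace{W}_{\text{change of the utility landscape (environment)}}
              \;+\; \underbrace{Q}_{\text{utility gain due to internal adaptation}}.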
The study of complexity and optimization in decision theory involves both partial and complete characterizations of preferences over decision spaces in terms of real-valued monotones. With this motivation, and following the recent introduction of new classes of monotones, like injective monotones or strict monotone multi-utilities, we present the classification of preordered spaces in terms of both the existence and cardinality of real-valued monotones and the cardinality of the quotient space. In particular, we take advantage of a characterization of real-valued monotones in terms of separating families of increasing sets in order to obtain a more complete classification consisting of classes that are strictly different from each other. As a result, we gain new insight into both complexity and optimization, and clarify their interplay in preordered spaces.
Computability on uncountable sets has no standard formalization, unlike that on countable sets, which is given by Turing machines. Some of the approaches to define computability on these sets rely on order-theoretic structures to translate such notions from Turing machines to uncountable spaces. Since these machines are used as a baseline for computability in these approaches, countability restrictions on the ordered structures are fundamental. Here, we aim to combine the theory of computability with order theory in order to study how the usual countability restrictions in these approaches are related to order density properties and functional characterizations of the order structure in terms of multi-utilities.
Bayes-optimal and heuristic decision-making schemes are often considered fundamentally opposed to each other as frameworks for studying human choice behavior, although recently it has been proposed that bounded rationality may provide a natural bridge between the two when varying information-processing resources. Here, we investigate a two-alternative forced choice task with varying time constraints, where subjects have to assign multi-component symbolic patterns to one of two stimulus classes. As expected, we find that subjects' response behavior becomes more imprecise with more time pressure. However, we also see that their response behavior changes qualitatively. By regressing subjects' decision weights, we find that decisions allowing for plenty of decision time rely on weighing multiple stimulus features, whereas decisions under high time pressure are made mostly based on a single feature. While the first response pattern is in line with a Bayes-optimal decision strategy…
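The decision-weight regression mentioned above can be illustrated with a minimal sketch, assuming scikit-learn is available; the simulated trials, feature count and "true" weights below are placeholders and not the study's data.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Placeholder data: each trial shows a multi-feature symbolic pattern (here 4
    # binary features) and the subject assigns it to class 0 or 1.
    n_trials, n_features = 400, 4
    X = rng.integers(0, 2, size=(n_trials, n_features))
    true_w = np.array([1.5, 1.0, 0.5, 0.25])               # illustrative feature weights
    p_choice = 1.0 / (1.0 + np.exp(-(X - 0.5) @ true_w))
    y = (rng.random(n_trials) < p_choice).astype(int)      # simulated responses

    # Decision weights = logistic-regression coefficients of choices on features.
    model = LogisticRegression().fit(X, y)
    print("estimated decision weights:", np.round(model.coef_[0], 2))
    # Comparing such weight profiles across time-pressure conditions would reveal
    # whether responses rely on many features or are dominated by a single one.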