The dynamic behavior of mutation and crossover is investigated with the Breeder Genetic Algorithm. The main emphasis is on binary functions. The genetic operators are compared near their optimal performance. It is shown that mutation is most efficient in small populations. Crossover critically depends on the size of the population. Mutation is the more robust search operator. But the BGA combines the two operators in such a way that the performance is better than that of a single operator. For the DECEPTION function it is shown that increasing the size of the population above a certain number decreases the quality of the solutions obtained.
In this paper we study random genetic drift in a finite genetic population. Exact formulae for calculating the mean convergence time of the population are analytically derived and some results of numerical calculations are given. The calculations are compared to the results obtained in population genetics. A new proposition is derived for binary alleles and uniform crossover. Here the mean convergence time is almost proportional to the size of the population and to the logarithm of the number of loci. The results of Monte Carlo type numerical simulations are in agreement with the results from the calculation.
First we show that all genetic algorithms can be approximated by an algorithm which keeps the population in linkage equilibrium. Here the genetic population is given as a product of univariate marginal distributions. We describe a simple algorithm which keeps the population in linkage equilibrium. It is called the univariate marginal distribution algorithm (UMDA). Our main result is that UMDA transforms the discrete optimization problem into a continuous one defined by the average fitness $W(p_1, \ldots, p_n)$ as a function of the univariate marginal distributions $p_i$. For proportionate selection UMDA performs gradient ascent in the landscape defined by $W(p)$. We derive a difference equation for $p_i$ which has already been proposed by Wright in population genetics. We show that UMDA solves difficult multimodal optimization problems. For functions with highly correlated variables UMDA has to be extended. The factorized distribution algorithm (FDA) uses a factorization into marginal and conditional distributions. For decomposable functions the optimal factorization can be explicitly computed. In general it has to be computed from the data. This is done by LFDA. It uses a Bayesian network to represent the distribution. Computing the network structure from the data is called learning in Bayesian network theory. The problem of finding a minimal structure which explains the data is discussed in detail. It is shown that the Bayesian information criterion is a good score for this problem.
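As a rough illustration of the idea in the abstract above, the following minimal sketch runs a UMDA-style loop with proportionate selection: it samples a population from a product of univariate marginals and re-estimates each marginal as a fitness-weighted frequency. The test function `onemax`, the function name `umda`, and all parameter values (population size, number of generations, seed) are assumptions chosen for illustration and are not taken from the paper.

```python
import random


def onemax(x):
    """Hypothetical test function: number of ones in the bit string."""
    return sum(x)


def umda(fitness, n, pop_size=500, generations=30, seed=0):
    """UMDA-style sketch with proportionate selection (illustrative only).

    Assumes fitness values are non-negative. Returns the final vector of
    univariate marginal probabilities p_i = P(x_i = 1).
    """
    rng = random.Random(seed)
    p = [0.5] * n  # start from the uniform product distribution
    for _ in range(generations):
        # Sample a population in linkage equilibrium (independent loci).
        pop = [[1 if rng.random() < pi else 0 for pi in p]
               for _ in range(pop_size)]
        weights = [fitness(x) for x in pop]
        total = sum(weights)
        if total == 0:  # no selection signal; stop rather than divide by zero
            break
        # Proportionate selection: fitness-weighted marginal frequencies.
        p = [sum(w * x[i] for w, x in zip(weights, pop)) / total
             for i in range(n)]
    return p


p_final = umda(onemax, n=10)
print([round(q, 2) for q in p_final])
```

For this linear test function the expected update of each marginal is $p_i + p_i(1 - p_i)/W(p)$, so the marginals drift toward 1. Sampling noise in a finite population means individual runs deviate from that expectation.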
Proceedings of IEEE International Conference on Evolutionary Computation
We present a competition scheme which dynamically allocates the number of trials given to different search strategies. The competition scheme changes the sizes of the subgroups, but also the size of the whole population. The competition scheme is able to combine the strengths of individual search strategies in a synergetic way. This claim is demonstrated by numerical experiments with two difficult functions to be optimized.
Parallel genetic algorithms (PGA) use two major modifications compared to the genetic algorithm. Firstly, selection for mating is distributed. Individuals live in a 2-D world. Selection of a mate is done by each individual independently in its neighborhood. Secondly, each individual may improve its fitness during its lifetime by e.g. local hill-climbing. The PGA is totally asynchronous, running with maximal efficiency on MIMD parallel computers. The search strategy of the PGA is based on a small number of intelligent and active individuals, whereas a GA uses a large population of passive individuals. We will show the power of the PGA with two combinatorial problems: the traveling salesman problem and the m graph partitioning problem. In these examples, the PGA has found solutions of very large problems which are comparable to or even better than any solution found by other heuristics. A comparison between the PGA search strategy and iterated local hill-climbing is made.
The Breeder Genetic Algorithm (BGA) is based on the equation for the response to selection. In order to use this equation for prediction, the variance of the fitness of the population has to be estimated. For the usual sexual recombination the computation can be difficult. In this paper we briefly state the problem and investigate several modifications of sexual recombination. The first method is gene pool recombination, which leads to marginal distribution algorithms. In the last part of the paper we discuss more sophisticated methods, based on estimating the distribution of promising points.
Proceedings of the First IEEE Conference on Evolutionary Computation. IEEE World Congress on Computational Intelligence
Genetic programming has been successfully applied to evolve computer programs for solving a variety of interesting problems. In previous work we introduced the breeder genetic programming (BGP) method, which has Occam's razor in its fitness measure to evolve minimal-size multilayer perceptrons. In this paper we apply the method to the synthesis of sigma-pi neural networks. Unlike perceptron architectures, sigma-pi networks use product units as well as summation units to build higher-order terms. The effectiveness of the method is demonstrated on benchmark problems. Simulation results on noisy data suggest that BGP not only improves the generalization performance but can also accelerate convergence.
The breeder genetic algorithm (BGA) depends on a set of control parameters and genetic operators. In this paper it is shown that strategy adaptation by competing subpopulations makes the BGA more robust and more efficient. Each subpopulation uses a different strategy which competes with other subpopulations. Numerical results are presented for a number of test functions.
The success of evolutionary algorithms, in particular Factorized Distribution Algorithms (FDA), for many pattern recognition tasks heavily depends on our ability to reduce the number of function evaluations. This paper introduces a method to reduce the population size overhead. We use low order marginals during the learning step and then compute the maximum entropy joint distributions for the cliques of the graph. The maximum entropy distribution is computed by an Iterative Proportional Fitting embedded in a junction tree message passing scheme to ensure consistency. We show for the class of single connected FDA that our method outperforms the commonly-used PLS sampling.
Estimation of Distribution Algorithms (EDA) have been proposed as an extension of genetic algorithms. In this paper the major design issues of EDAs are discussed using an interdisciplinary framework, the minimum relative entropy (MinRel) approximation. We assume that the function to be optimized is additively decomposed (ADF). The interaction graph $G_{ADF}$ of the ADF is used to create exact or approximate factorizations of the Boltzmann distribution. The relation between the Factorized Distribution Algorithm (FDA) and the MinRel approximation is shown. We present a new algorithm, derived from the Bethe-Kikuchi approach developed in statistical physics. It minimizes the relative entropy $\mathrm{KLD}(q|p_\beta)$ to the Boltzmann distribution $p_\beta$ by solving a difficult constrained optimization problem. We present in detail the concave-convex minimization algorithm CCCP to solve the optimization problem. The two algorithms are compared using popular benchmark problems (2-d grid problems, 2-d Ising spin glasses, Kauffman's NK function). We use instances up to 900 variables.
An efficient and systematic LL(1) error recovery method is presented that has been implemented for an LL(1) parser generator. Error messages which provide good diagnostic information are generated automatically. Error correction is done by discarding some input symbols and popping up some symbols from the parsing‐stack in order to restore the parser to a valid configuration. Thus, symbol deletions and insertions are simulated. The choice between different possible corrections is made by comparing the cost of the inserted (popped) symbols with the reliability value of the recovery symbol (the first input symbol that is not discarded). Our concept of reliability is based on the observation that input symbols differ from each other in their ability to serve as recovery points. A high reliability value of a symbol asserts that it was probably not placed in the input by accident. So it is reasonable not to discard that symbol but to resume parsing. This is done even if a string with high...
International Journal of Approximate Reasoning, 2002
We present a theory of population based optimization methods using approximations of search distributions. We prove convergence of the search distribution to the global optima for the factorized distribution algorithm (FDA) if the search distribution is a Boltzmann distribution and the size of the population is large enough. Convergence is defined in a strong sense: the global optima are attractors of a dynamical system that describes the algorithm mathematically. We investigate an adaptive annealing schedule and show its similarity to truncation selection. The inverse temperature b is changed inversely proportionally to the standard deviation of the population. We extend FDA by using a Bayesian hyper-parameter. The hyper-parameter is related to mutation in evolutionary algorithms. We derive an upper bound on the hyper-parameter to ensure that FDA still generates the optima with high probability. We discuss the relation of the FDA approach to methods used in statistical physics to approximate a Boltzmann distribution and to belief propagation in probabilistic reasoning. In the last part are sparsely connected. Our empirical results are as good as or even better than any other method used for this problem.
Estimation of Distribution Algorithms (EDA) have been proposed as an extension of genetic algorithms. In this paper we explain the relationship of EDA to algorithms developed in statistics, artificial intelligence, and statistical physics. The major design issues are discussed within a general interdisciplinary framework. It is shown that maximum entropy approximations play a crucial role. All proposed algorithms try to minimize the Kullback-Leibler divergence KLD between the unknown distribution p(x) and a class q(x) of approximations. However, the Kullback-Leibler divergence is not symmetric. Approximations which suppose that the function to be optimized is additively decomposed (ADF) minimize KLD(q||p), the methods which learn the approximate model from data minimize KLD(p||q). This minimization is identical to maximizing the log-likelihood. In the paper three classes of algorithms are discussed. FDA uses the ADF to compute an approximate factorization of the unknown distribution....
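The asymmetry of the Kullback-Leibler divergence mentioned in the abstract above can be checked numerically. The sketch below is only an illustration: the helper name `kld` and the two small example distributions are assumptions, not taken from the paper.

```python
import math


def kld(p, q):
    """Kullback-Leibler divergence KLD(p||q) in nats for discrete distributions.

    Assumes p and q have the same length and q[i] > 0 wherever p[i] > 0.
    Terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Two hypothetical distributions over a single binary variable.
p = [0.5, 0.5]
q = [0.9, 0.1]

kl_pq = kld(p, q)  # divergence of q from p
kl_qp = kld(q, p)  # divergence of p from q
print(round(kl_pq, 4), round(kl_qp, 4))  # prints 0.5108 0.3681
```

The two directions give different values, so an algorithm that minimizes KLD(q||p) is in general solving a different problem from one that minimizes KLD(p||q).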
Papers by H. Mühlenbein