Theory of estimation-of-distribution algorithms

Carsten Witt is a professor at the Technical University of Denmark. He received his M.Sc. in 2000 and his Ph.D. in 2004, both in Computer Science from the Technical University of Dortmund, Germany. His main expertise is in the algorithmic analysis of metaheuristics, including evolutionary algorithms, ant colony optimization and estimation-of-distribution algorithms. Carsten has over 90 peer-reviewed publications and has given tutorials about bio-inspired computation in combinatorial optimization at previous GECCO and PPSN conferences. He is a member of the editorial boards of the Evolutionary Computation Journal and Theoretical Computer Science.


Some Benchmark Functions
Theoretical results often consider simple problems, which we have to understand first.
They illustrate simple but fundamental properties that re-appear in more complex scenarios.
Theory of EDAs › Preliminaries 12 of 63

Univariate Algorithms
Common concept: the frequency vector (a.k.a. marginal probabilities): probabilities (p_1, …, p_n) for setting the individual bits to 1; bits are sampled independently. Usually initialized as p_i = 1/2 for all i.
Frequency vector adjusted over time.
Compact GA (cGA) (Harik et al., 1999)
Simulates the behavior of a GA with population size K in a compact way.
The parameter K determines the preciseness of the model: big K = fine model = small update strength.
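As a concrete reference, here is a minimal cGA sketch in Python. This is a hedged illustration, not the algorithm as stated on the slides: the OneMax fitness, the parameter choices and the convergence check are illustrative assumptions.

```python
import random

def cga(f, n, K, max_iters=100_000, seed=1):
    """Minimal compact GA: per step, sample two offspring from the
    frequency vector, compare them under f, and shift every frequency
    in which they differ by 1/K toward the better offspring."""
    rng = random.Random(seed)
    p = [0.5] * n  # frequency vector, all marginals start at 1/2
    for _ in range(max_iters):
        x = [rng.random() < p[i] for i in range(n)]
        y = [rng.random() < p[i] for i in range(n)]
        if f(x) < f(y):
            x, y = y, x  # make x the better offspring
        for i in range(n):
            if x[i] != y[i]:
                step = 1 / K if x[i] else -1 / K
                # keep frequencies inside the usual borders [1/n, 1 - 1/n]
                p[i] = min(1 - 1 / n, max(1 / n, p[i] + step))
        if all(pi >= 1 - 1 / n for pi in p):
            break  # model has (essentially) converged to all ones
    return p

p = cga(sum, n=20, K=40)  # OneMax; K chosen large enough to limit drift
```

With this K, almost all frequencies end up near the upper border; a much smaller K exhibits the genetic-drift effects discussed below.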

Genetic Drift
If the fitness function is constant/flat (does not give a signal), frequencies move randomly to a border (DEMO).
Frequencies that move to the wrong border are problematic, even disastrous if the border is not there. This happens even though the expected value of a frequency converges to the optimal value (Höhfeld and Rudolph, 1997).

Early Results (< 2000)
Models of cGA, UMDA and others allow estimations of the dynamical behavior (e.g., Thierens et al., 1998; Mühlenbein and Mahnig, 1999). Two effects:
1. overall progress of the probabilistic model (roughly: ∑ p_i) and the time to convergence to a good distribution (∑ p_i ≈ n),
2. the time for single frequencies to drift to the wrong border by genetic drift → should be bigger than the convergence time.
Avoiding genetic drift requires a precise enough model → lower bound on the runtime. Models of EDAs estimated the progress of frequencies (Mühlenbein and Mahnig, 1999), which was made rigorous only recently.
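The random walk to a border on a flat function can be reproduced in a few lines. This is a sketch of a single cGA frequency; the parameters and the absence of borders are illustrative assumptions.

```python
import random

def flat_drift(K, steps=200_000, seed=0):
    """One cGA frequency on a constant fitness function: whenever the two
    offspring differ in this bit, the 'winner' is a pure coin flip, so the
    frequency performs an unbiased random walk with step size 1/K.
    Without borders, 0 and 1 are absorbing: genetic drift fixates the bit."""
    rng = random.Random(seed)
    k = K // 2  # frequency is k/K, starting at 1/2
    for _ in range(steps):
        p = k / K
        x = rng.random() < p
        y = rng.random() < p
        if x != y:  # rw-step: the flat fitness gives no signal
            k += 1 if rng.random() < 0.5 else -1
            k = min(K, max(0, k))
    return k / K

final = flat_drift(K=20)  # fixates at 0.0 or 1.0 with overwhelming probability
```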

Upper bound on time to convergence
Time for the EDA to converge to the optimal solution: ≈ K·√n.

Demo and Landscape
Expected runtime of cGA depending on n and K.

Proof Idea in the "Large K" Regime
Show limits on genetic drift: with high probability, all frequencies stay above 1/3 in a phase of Θ(K√n) steps.
To analyze the speed: consider Φ_t = ∑_{i=1}^n p_{t,i} and analyze its drift. Important: how does a single frequency evolve? Consider the two offspring x and y and look into bit i.

Dynamics on Bit i
Red area: bit i is irrelevant in this step ⇒ genetic drift moves p_{t,i} in a random direction by ±1/K (rw-step).
Blue area: bit i decides the outcome of f(x) vs. f(y) ⇒ increase p_{t,i} (b-step, learning that 1s are better than 0s).

What Happens for Small K
Now look into K < √n log n: the lower bound was Ω(n) until 2015, improved to Ω(n log n) (Sudholt and W., 2016). Heavy genetic drift occurs.
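How often bit i actually gets a b-step on OneMax can be estimated by Monte Carlo. This sketch uses the simplifying convention (my assumption, not from the slides) that a b-step occurs exactly when the other n − 1 bits tie in OneMax value, so that bit i alone decides the comparison.

```python
import random

def b_step_fraction(n, trials=20_000, seed=2):
    """Sample offspring pairs from the all-1/2 frequency vector and, for
    bit 0, classify each step in which the offspring differ in that bit:
    b-step if the remaining n-1 bits tie in OneMax value (bit 0 decides),
    rw-step otherwise (bit 0 is swamped by the rest)."""
    rng = random.Random(seed)
    b_steps = rw_steps = 0
    for _ in range(trials):
        x = [rng.random() < 0.5 for _ in range(n)]
        y = [rng.random() < 0.5 for _ in range(n)]
        if x[0] == y[0]:
            continue  # frequency of bit 0 is unchanged in this step
        rest = (sum(x) - x[0]) - (sum(y) - y[0])
        if rest == 0:
            b_steps += 1
        else:
            rw_steps += 1
    return b_steps / (b_steps + rw_steps)

frac = b_step_fraction(n=25)  # b-step fraction is of order 1/sqrt(n)
```

Most steps affecting a bit are rw-steps; the learning signal arrives only in a Θ(1/√n) fraction of them, which is why a small K (strong updates driven mostly by noise) causes heavy genetic drift.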

DEMO.
Whether the optimum can be found at all depends very much on the borders on the frequencies.
Without borders, with high probability some frequencies lock to 0 ⇒ infinite runtime.
⇒ Borders may make the algorithm efficient despite genetic drift.
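A quick experiment illustrating the role of the borders (a hedged sketch; n, K, the seed and the iteration budget are arbitrary choices of mine):

```python
import random

def cga_onemax(n, K, borders, iters=50_000, seed=3):
    """cGA on OneMax. With borders=False the frequencies may be absorbed
    at 0, after which the all-ones string can never be sampled again."""
    rng = random.Random(seed)
    lo, hi = (1 / n, 1 - 1 / n) if borders else (0.0, 1.0)
    p = [0.5] * n
    for _ in range(iters):
        x = [rng.random() < p[i] for i in range(n)]
        y = [rng.random() < p[i] for i in range(n)]
        if sum(x) < sum(y):
            x, y = y, x
        for i in range(n):
            if x[i] != y[i]:
                p[i] = min(hi, max(lo, p[i] + (1 / K if x[i] else -1 / K)))
        if all(x):
            return True, p  # optimum sampled
    return False, p

found, p = cga_onemax(n=30, K=5, borders=False)  # K far too small: heavy drift
```

Without borders, some frequencies reach 0 and the optimum becomes unsamplable; re-running with `borders=True` keeps every bit recoverable.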

Theory of EDAs › OneMax
Idea for the Lower Bound

Coupon collector
You have to collect n different coupons. In each round, you are given one coupon chosen uniformly at random with replacement. In expectation, it takes Ω(n log n) rounds to collect all of them.
Here the coupons are the frequencies at the lower border. Each of them has probability 1/n of being raised.
If many frequencies move to the lower border before the optimum is found, we cannot be faster than order n log n.
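The coupon-collector bound is easy to check empirically (a sketch; n and the number of trials are arbitrary choices):

```python
import random

def coupon_collector(n, rng):
    """Draw coupons uniformly with replacement until all n distinct
    coupons have been seen; return the number of draws."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

rng = random.Random(42)
n, trials = 100, 200
avg = sum(coupon_collector(n, rng) for _ in range(trials)) / trials
# expectation is n * H_n = 100 * (1 + 1/2 + ... + 1/100) ≈ 519
```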

Medium and Large K
Phase transition between smooth behavior and strong genetic drift at K ∼ √n log n.
Without borders, the algorithm fails for K = o(√n log n); with borders, efficient behavior as long as K = ω(log n). The upper bound O(Kn) is only conjectured here.
(Neumann et al., 2010), similar to cGA: landslides of frequencies occur.

DEMO.
Frequencies that have attained their maximum 1 − 1/n are nevertheless likely to drop down to the minimum. Very unstable behavior, exponential optimization time.
Runtime of cGA on OneMax: Complete Picture

Analysis of UMDA
How does UMDA perform on OneMax? Surprisingly, in terms of runtime (no. of iterations · λ), not very differently from cGA.
Obstacles in the analysis: frequencies can change drastically, even from minimum to maximum in one generation, and there are two parameters: µ and λ.
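For concreteness, a minimal UMDA sketch on OneMax (the parameter values are my illustrative choices; the borders 1/n and 1 − 1/n follow the usual convention):

```python
import random

def umda_onemax(n, mu, lam, max_gens=5_000, seed=7):
    """Minimal UMDA on OneMax: sample lam individuals from the frequency
    vector, select the mu best, and set each frequency to the fraction of
    1s among the selected, capped at the borders 1/n and 1 - 1/n."""
    rng = random.Random(seed)
    p = [0.5] * n
    for gen in range(max_gens):
        pop = [[rng.random() < p[i] for i in range(n)] for _ in range(lam)]
        pop.sort(key=sum, reverse=True)
        if all(pop[0]):
            return gen  # generation in which the optimum was sampled
        best = pop[:mu]
        for i in range(n):
            freq = sum(ind[i] for ind in best) / mu
            p[i] = min(1 - 1 / n, max(1 / n, freq))
    return None

gens = umda_onemax(n=50, mu=10, lam=50)
```

Unlike cGA, a frequency can jump anywhere in [1/n, 1 − 1/n] in one generation, which is exactly the first obstacle mentioned above.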
The first runtime analysis of UMDA on OneMax stems from 2015!
Theorem (Expected runtime of UMDA on OneMax)

Results for UMDA: Lower Bounds
We also obtain similar bounds as with cGA: Theorem (Krejca and W., 2017).
Proof idea, again at a very high level: we estimate
1. the progress (drift) of the sum of frequencies per iteration,
2. the time for genetic drift to move frequencies to the wrong border,
both for upper and lower bounds. Especially 2. is challenging.

What Happens in the Medium Regime

Overview of Runtime Bound for UMDA on OneMax
Where is the truth? Experiments show in fact a multimodal behavior: is the O(λn) bound also Ω(λn) in the medium regime?

New Result: Runtime of cGA is Multimodal
The multimodal behavior has been made rigorous for the simpler cGA, not yet for UMDA.
Theorem (Lengler et al., 2018)
The expected runtime of cGA on OneMax is a multimodal function of K. ⇒ Setting K = Θ(log n) gives us runtime Θ(n log n), and so does K = Θ(√n log n), but values in between make the runtime worse.

Balancedness, Stability and Genetic Drift
We have seen: without a fitness signal, the stochastic model of EDAs stays the same in expectation. Term: balanced (Friedrich et al., 2016a). However, this holds only for the expected value: genetic drift plays a major role in classical EDAs.
Idea (Doerr and Krejca, 2018): move a frequency away from its initial value only when there is evidence that 0 or 1 is the better bit value.

(Figure: left: average runtime depending on λ; right: number of times a frequency hits the minimum.)
LeadingOnes has not been considered much in the theory of EDAs (except for UMDA). Possible reason: the behavior is more obvious than on OneMax and not very different from classical EAs. Typical: frequencies are optimized from left to right. If the best-so-far solution has i leading ones, then the last n − i − 1 frequencies are drifting randomly.
An EDA is stable if a frequency, in the absence of a fitness signal, stays close to its initial value. cGA, UMDA, … are not stable: frequencies quickly converge to either the maximum or the minimum (each with probability 1/2) due to genetic drift.

A New Way to Overcome Genetic Drift: Significance-Based EDAs

Framework
like cGA.For each bit, history H i ∈ {0, 1} * of values in better individual Investigate last m history bits.If a value significantly dominates, move frequency to corresponding border (1/n if 0 dominates, 1 − 1/n if 1).Otherwise, leave frequency at 1/2.Example of significance: | H i 1 − m 2 | ≥ C √ m ln n Different values for m are tested by the algorithm.
A bound below log n is strongly conjectured.