

    On the Application of Elementary Probability Theory to Mean Field Predictions of Totalistic Cellular Automata

      Source

      This set of notes is based on some email from July 2010 sent to Keith Schubert, Jane Curnutt, Ernesto Gomez, Matt Strader, and Blake Schulte. It incorporates some comments and corrections from Jane Curnutt and Ernesto Gomez.

      Status

      First draft, April 26th 2011. Corrected errors in definition of P and q below, December 4th, 2011.

      Introduction

      This set of notes describes some well known tools for probabilistic analysis as they apply to cellular automata. I will show how a random independent distribution of live and dead cells evolves in one step. These are all well known applications of probability theory -- the theory can be found on Wikipedia. I'll first work out the case when each cell has a different probability of being alive and then assume equal probabilities -- the so-called "Mean Field Model". These notes move on to the use of De Moivre's Theorem to simplify the mean field calculations for large neighborhoods. They finish by describing why these formulas cannot predict the long term evolution of cellular systems.

      Basis

      A cellular automaton (CA) is an array of cells. Each cell has a state. Each cell has the same rule (program) that calculates the new state of the cell from its own state and the states of the cells in a given neighborhood. In this set of notes the state is either alive or dead and the rule is based on the number of live cells in the neighborhood -- it ignores which particular neighbors are alive or dead. This is a widely studied case that forms a primitive lifelike system.

      A cell can be alive or dead. Let p:[0..1] be the probability of a cell being alive and q = 1-p the probability of it being dead. Then p', the probability of it being alive in the next cycle, is

    1. p' = p * (1-Death) + q * Birth, where Birth and Death depend on the cells in the immediate neighborhood.
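
      To make the update rule concrete, here is a minimal sketch in Python of one step of a totalistic CA on a grid. The 8-cell Moore neighborhood and the wrap-around edges are my assumptions for this example, not something fixed by these notes.

        # One step of a two-state totalistic CA.  birth / survive are the sets of
        # live-neighbor counts that make a dead cell become alive / keep a live
        # cell alive.  Edges wrap around (a torus).
        def step(grid, birth, survive):
            rows, cols = len(grid), len(grid[0])
            new = [[0] * cols for _ in range(rows)]
            for i in range(rows):
                for j in range(cols):
                    live = sum(grid[(i + di) % rows][(j + dj) % cols]
                               for di in (-1, 0, 1) for dj in (-1, 0, 1)
                               if (di, dj) != (0, 0))
                    if grid[i][j]:
                        new[i][j] = 1 if live in survive else 0
                    else:
                        new[i][j] = 1 if live in birth else 0
            return new

      For Conway's Life, birth = {3} and survive = {2, 3}. For the rule discussed next (birth at exactly one live neighbor, no death), birth = {1} and survive = set(range(9)).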

      In a case studied by Jane Curnutt, birth happens when there is exactly one live cell in the neighborhood and no death can happen:

    2. p' = p + q * P(number of live cells in neighborhood is 1).

      For a moment I'll assume that each cell c has its own probability p(c) of being alive, independently of the other cells.

    3. q(c) = 1-p(c).

      So if a cell c is alive, p(c) = 1 and q(c) = 0. If c is dead, p(c) = 0 and q(c) = 1. The function

    4. p : Cells -> [0..1], describes a field we can call the density.

      Typically the probability of Birth or Death depends on the states of the cells in the neighborhood N(c) around the cell c. Let n = |N(c)|, the number of cells in the neighborhood N(c). In a totalistic cellular automaton we are interested in how many cells in N(c) are alive, so we need to calculate the probability of r cells being alive in the neighborhood N(c). Symbolize this as P(r), where r varies from 0 to n and indicates the number of live cells in the neighborhood N(c) of cell c:

    5. P(r)::[0..1] = the probability that r cells are alive.

      Probability Generating Functions

      There is an elegant way to calculate all the P(r)'s by using a probability generating function (pgf) g:
    6. g(t)::= P(0)*t^0 + P(1)*t^1 + ... + P(n)*t^n = Σ [r in 0..n](P(r)*t^r).

      The pgf for a single cell c in the example CA is

    7. q(c)+p(c)*t.

      A second example pgf: if c is alive the pgf is t; if dead, it is 1; and if the cell has a 50-50 chance of being alive, it is (1+t)/2.

      The pgf of the sum of independent random variables is the product of the pgf's of the variables. So

    8. g(t) = Π [c in 1..n] (q(c) + p(c) t).

      So with 8 cells in a neighborhood, suppose we have 1 alive, 5 dead, and 2 cells with a 50-50 chance of being alive; then

    9. g(t) = t * 1^5 * ((1+t)/2)^2
    10. = t/4 + t^2/2 + t^3/4.

      So the distribution is ( 0, 1/4, 1/2, 1/4, 0, 0,...). In particular, the chance of birth in Jane's case is 1/4.
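
      The pgf products above (formula 8) are easy to compute numerically: represent each pgf as its list of coefficients and convolve. Here is a minimal sketch in Python that reproduces the distribution of the example:

        # Each pgf is a coefficient list [P(0), P(1), ...].  A single cell with
        # probability p of being alive has pgf [1-p, p].  Multiplying pgfs is
        # convolving their coefficient lists.
        def multiply(a, b):
            out = [0.0] * (len(a) + len(b) - 1)
            for i, x in enumerate(a):
                for j, y in enumerate(b):
                    out[i + j] += x * y
            return out

        def neighborhood_pgf(ps):
            """ps: probability of being alive for each cell in the neighborhood."""
            g = [1.0]
            for p in ps:
                g = multiply(g, [1.0 - p, p])
            return g

        # 1 cell alive, 5 dead, 2 with a 50-50 chance of being alive:
        print(neighborhood_pgf([1.0] + [0.0] * 5 + [0.5, 0.5]))
        # -> [0.0, 0.25, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0], as above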

      Now let us assume that we create the cells with an equal and independent chance p of being alive (and let q=1-p). Then the pgf is

    11. g(t) = Π [c in 1..n] (q + p t)
    12. = (q + p t)^n, the binomial distribution.

      Now consider the normal neighborhood of 8 cells; expanding with Pascal's triangle we get:

    13. g(t) = q^8*t^0 + 8*p*q^7*t^1 + 28*p^2*q^6*t^2 + 56*p^3*q^5*t^3 + 70*p^4*q^4*t^4 + 56*p^5*q^3*t^5 + ... + p^8*t^8.

      So for birth at exactly one live neighbor we have 8*p*q^7. Substituting in the second equation above:

    14. p' = p + q*(8*p*q^7) = p*(1 + 8*q^8).

      The curve rises rapidly to a peak of roughly 0.48 at a density of roughly 0.15, then enters a trough, then tends to p'=p. At all times p' >= p.
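
      A quick numerical scan of formula 14 (a sketch, using a grid search over densities below 0.3 to pick out the first, local, peak) backs up the numbers above:

        # Mean-field map p' = p * (1 + 8*q^8) from formula 14.
        def next_density(p):
            q = 1.0 - p
            return p * (1.0 + 8.0 * q ** 8)

        # Search densities 0.000 .. 0.300 for the local peak.
        peak = max((next_density(i / 1000.0), i / 1000.0) for i in range(301))
        print(peak)   # about (0.477, 0.157): a peak near 0.48 at a density of roughly 0.15-0.16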

      You can work out the curve for Conway's Life (birth on exactly 3 live neighbors, survival on 2 or 3, so p' = p*(P(2)+P(3)) + q*P(3)) using the same g(t) and the first equation. If I've done the algebra right,

    15. p' = p^3q^5( 84q + 56p ) -- (Conway)
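
      As a check of the algebra (a sketch, assuming the SymPy library is available), formula 1 with Conway's rule of birth on exactly 3 live neighbors and survival on 2 or 3 does reduce to formula 15:

        from sympy import symbols, binomial, expand

        p = symbols('p')
        q = 1 - p
        P = lambda r: binomial(8, r) * p**r * q**(8 - r)   # probability of exactly r live cells out of 8

        direct = p * (P(2) + P(3)) + q * P(3)   # formula 1 with Conway's birth/survival rule
        closed = p**3 * q**5 * (84*q + 56*p)    # formula 15
        print(expand(direct - closed))          # prints 0, so the two expressions agree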

      Binomial Distribution

      So when Birth and Death depend on the number of live cells in the neighborhood, and each cell in the neighborhood has the same probability p of being alive (a mean field model), and n = the number of cells in the neighborhood, and q = 1-p, then the probability of r live cells in the neighborhood is
    16. P(r) = C(n,r) p^r q^(n-r), the famous binomial distribution, where
    17. C(n,r) = n! / (r! (n-r)!).

      This lets you graph the probability, p', that a cell is alive in the next step, which in turn tells you the best density for a randomized start.

      This formula is easy to use on small neighborhoods like Conway's 8-cell neighborhood, the 2 cells next to a cell on a line, or the 4 cells that share a boundary with a cell. It is not good for large n, like Ernesto Gomez's n=48 cell square.
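
      As a sketch, here is formula 16 turned into a general mean-field update for any totalistic rule on a small neighborhood; scanning it over p is how you would graph p' and pick a good starting density:

        from math import comb

        def P(r, n, p):
            """Formula 16: probability of exactly r live cells among n at density p."""
            return comb(n, r) * p ** r * (1 - p) ** (n - r)

        def mean_field_step(p, n, birth, survive):
            """Formula 1 with binomial Birth and (1 - Death) probabilities."""
            q = 1 - p
            return p * sum(P(r, n, p) for r in survive) + q * sum(P(r, n, p) for r in birth)

        # Conway's Life: n = 8, birth on 3 live neighbors, survival on 2 or 3.
        print(mean_field_step(0.3, 8, birth={3}, survive={2, 3}))   # about 0.343, matching formula 15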

      De Moivre's Theorem

      However, De Moivre discovered many years ago that there is a very good approximation to the binomial distribution using the Gaussian normal curve. You can read about it on Wikipedia, which has a nice animation.

      I usually use two functions commonly called φ and Φ. φ is the normalized probability density function (pdf):

    18. φ(x) = exp(-x^2/2) / √(2*π).

      Φ(x) is the integral of φ from -oo to x, the distribution function (df) of the normalized Gaussian curve. In most languages it can be expressed in terms of the function erf (the error function) in the math library:

    19. Φ(x) = (1 + erf(x/√2))/2.
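
      Formulas 18 and 19 translate directly into Python; the standard math module provides erf:

        from math import exp, pi, sqrt, erf

        def phi(x):
            """Formula 18: the standard normal pdf."""
            return exp(-x * x / 2.0) / sqrt(2.0 * pi)

        def Phi(x):
            """Formula 19: the standard normal df, via the error function."""
            return (1.0 + erf(x / sqrt(2.0))) / 2.0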

      φ(x)/√(n p q) gives a close approximation to P(r), and Φ(x) approximates the probability of r or fewer live cells, when

    20. x = (r - n p)/√(n p q).

      Note: the mean of the binomial distribution is m = n p and the standard deviation is s = √(n p q).

      The Φ function is used for ranges of r. You can improve its approximation by adding 0.5 to the count r (a continuity correction) before standardizing, that is, by using x = (r + 0.5 - m)/s as the argument of Φ.
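
      As a sketch of how good the approximation is, compare the exact binomial probability of "at most r live cells" with Φ and the continuity correction for a 48-cell neighborhood (the density 0.25 and the cut-off r = 10 are arbitrary choices for the illustration):

        from math import comb, sqrt, erf

        Phi = lambda x: (1 + erf(x / sqrt(2))) / 2

        n, p = 48, 0.25
        q = 1 - p
        m, s = n * p, sqrt(n * p * q)

        r = 10
        exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(r + 1))
        approx = Phi((r + 0.5 - m) / s)          # formula 20 with the 0.5 correction
        print(exact, approx)                     # the two values agree to within about 0.01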

      For example, take Ernesto's rule (n=48): birth when exactly b cells in the neighborhood are alive and death when more than d are alive. Then (x is the current density and x' the next density)...

    21. s:= √(48 x (1-x)),
    22. m:= 48 x.
    23. x' = (1-x) * φ((b-m)/s)/s + x * Φ((d + 0.5 - m)/s).
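
      Here is a sketch of formulas 21 to 23 as a Python function. The thresholds b and d are made up for the illustration; the notes do not give Ernesto's actual values.

        from math import exp, pi, sqrt, erf

        phi = lambda x: exp(-x * x / 2) / sqrt(2 * pi)
        Phi = lambda x: (1 + erf(x / sqrt(2))) / 2

        def next_density_48(x, b, d):
            """Formula 23: birth at exactly b live neighbors, death above d, n = 48."""
            m = 48 * x                        # formula 22
            s = sqrt(48 * x * (1 - x))        # formula 21
            return (1 - x) * phi((b - m) / s) / s + x * Phi((d + 0.5 - m) / s)

        print(next_density_48(0.2, b=12, d=30))   # b and d are hypothetical values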

      Here is a more complex example with a range of birth values from b-15 to b and death above that....

    24. x' = (1-x) * ( Φ((b + 0.5 - m)/s) - Φ((b - 15.5 - m)/s) ) + x * Φ((b + 0.5 - m)/s).
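
      And a sketch of formula 24 under the same assumptions (the threshold b is again a made-up value):

        from math import sqrt, erf

        Phi = lambda x: (1 + erf(x / sqrt(2))) / 2

        def next_density_band(x, b):
            """Formula 24: birth for counts in b-15..b, death above b, n = 48."""
            m, s = 48 * x, sqrt(48 * x * (1 - x))
            survive = Phi((b + 0.5 - m) / s)               # P(count <= b)
            born = survive - Phi((b - 15.5 - m) / s)       # P(b-15 <= count <= b)
            return (1 - x) * born + x * survive

        print(next_density_band(0.4, b=25))   # b = 25 is hypothetical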

      Failure of the Mean Field Model

      These models do a good job of predicting what happens if the live cells are distributed at random, with a constant density, and there is no dependence between them. However, Dr. Gomez's experiments demonstrate that after one step the adjacent cells are usually highly correlated. The plane looks lumpy, with areas of high density and areas of low density. In many cases there is no return to randomness once this chaos has emerged.

      References

      TBA


      (JC): Thesis CSUSB CSE 2010
      (EG): Private Communication
      (DeMoivre): Wikipedia
      (Gosper): Paper or Thesis

End