. Some Stochastic Models of Software Evolution Dr. Richard J. Botting Computer Science CSUSB San Bernardino, CA 92407, USA http://www.csci.csusb.edu/dick . Outline of Paper a. Previous stochastic models b. General Form c. Special Cases d. Conclusions . CMM Level 1 Processes CMM levels: 1:Chaotic, 2:Repeatable, 3:Defined, 4:Managed, 5:Optimizing Level 1 processes are random. We can use stochastic processes to model them. One stochastic process fits the "Hacker Ethic". It is a well known stochastic process. We can predict the consequences. They are not good. Other processes ameliorate the problems. . Earlier Stochastic Models of Software Development (Reliability Theory) Poisson processes: Like ants in my kitchen, bugs pop up and disappear at random. No model of software structure. No model of diagnosing the defect from the bug. . Recent Stochastic Model Jayant Rajgopal & Mainak Mazumbar Given modules, transition probabilities between modules, and the probabilities of a module failing they calculate reliability (prob. of failure). They model internal behavior of software. I will be modeling software evolution. . Debugging as a Process Definition of Debugging The programmer works alone. No documentation. Many cycles of: Run a test on the whole program. Change one piece of the code. Different stopping rules. . An Immature Modular Software Process * Software has many parts. Some parts are defective, some are good. It is not easy to tell good from defective. * Programmers are human and so error prone. Immaturity is modeled by randomness * A string of lights will light only if you can find and replace the broken one. It may be easier to replace the whole string! . Parameters * Software n is the number of pieces or parts. R is the number of defective pieces out of n a.R is a random variable with values 0,1,2,..., n. a.When R=0, the software will pass its test. D = R/n, the defect density. * Process a : [0..1] is the probability that a piece of code is defect free after a change. b= 1-a , is the probability that a piece of code has a defect after a change. t: 0,1,2,..., is the number of change+test cycles. . Markov Chain p[r](t) = Pr[ R=r after t cycles of debugging ] * p(t): row vector of n+1 probabilities p(t) = ( p0(t), p1(t), p2(t), ..., pn(t) ) * p(t+1) = p(t) P where P is a particular (n+1)><(n+1) matrix * p(t) = p(0) P^t . The Matrix P (Hacking) . The Limit t Text book result: tends to the Binomial Distribution . Consequences * Number defective parts R Mean = nb , variance = nab * Defect Density D Mean = nb/n = b . Standard dev'n= aqrt(ab/n) -> 0 ( as n->oo) * Defect density comes to depend only on fixing skill. . The Perfect Hacker A Special Case: b = 0 * If there are R defective pieces initially, * it takes n H[R] cycles on average to fix all R parts. H[R] is the Harmonic number 1/1+1/2+1/3+....1/R * Cycles be fore fixing = O( n log(n) ) . Debug & Release Fixed point R = 0 . Evolving Software & Requirements Creep When R=0, then R'=1 . Time to Re-Release MTRmean time to rerelease n = 10 * b=0.0, MTR= 10 * b=0.01, MTR= 11 * b=0.1, MTR= 19 * * * b=0.7, MTR= 241,928 . Five Other Processes * Classic Software Engineering * Clean Room * Extreme Programming * Open Source * Biological Evolution . Classic Software Engineering * All Requirements known at the start. * Requirements mapped into design. * Design mapped into code * Documentation! n 1, Birth-Death models * Co$t of documentation . Clean Room No debugging * Inspections remove defects before coding. * Tests estimate mean time to failure in use. * Testing debugs the process, not the code. * The Process evolves slowly. * The Code has few if any bugs. . Extreme Programming (XP) Not Level 1 ! * XP reduces n by using unit tests and merciless refactoring. * XP improves by pair programming. * XP is limited to projects where face to face meetings are possible. . Open Source The Bazaar * Parallel debugging. * No bug is deep if enough eyeballs look for it. * Closer to biological evolution. * Users must also be programmers . Biological Evolution * Genes effect many offspring. * All offspring tested in parallel. * Death terminates testing. * Mixes best-of-breed genes. * Performs much better than one blind hacker. * But Genetic Engineering fits the hacker model. . Conclusions * One Size Does Not Fit All. * Keep your n small and your a high. * A million monkeys will beat a blind watch maker!