

    3 Reality Directed Methods

      We start by looking at a class of projects that are well documented and comparatively successful to see what methods they used. This leads to a collection of methodologies that delay choosing the technical structure of the solution until a model of the "real world" has been abstracted. Reality Directed methods are not the way most people like to work but they promise to resolve the problems of the technical approaches in section 2 above.

      3.1 Programming Language Development

        To identify effective design techniques we need to compare published data from similar projects. Languages and tools for programming are a suitable domain, in particular compilers and interpreters: Algol, COBOL, FORTRAN, Ada, LISP, Edison, Turing and many others. Much of the material in this section comes from well known language reports [Naur64] [McCarthyetal.62] [ANSI83ADA], . . . Per Brinch Hansen's description of his 'Edison' language is a rich mine of empirical data on a software project [BrinchHansen81] [BrinchHansen82]. Language development shows successful and unsuccessful software engineering in action. Similar triumphs and mistakes happen in other projects. Activities that help or hinder a language design will have a similar effect on other projects.

        Some projects started by writing a compiler or interpreter. The developers experimented, changed their minds, and failed to record what the changes were. Semi-functional prototypes got loose. Incompatibilities developed between existing usage and improved versions. Different people observed what the language looked like and wrote their own versions. There was no record of what the language "really" was. Before long there were many incompatible versions (eg. BASIC, MS BASIC, True BASIC, GW BASIC, ... Quick BASIC, Visual BASIC, ...). Apparently FORTRAN I..IV developed this way. C was saved from a similar fate because two high quality, low cost, and machine independent versions became available. C++ may or may not avoid death by a thousand versions [Allman90]. Moreover, similar processes are found for non-language software:

          "[...]facilitated workgroups,[...] incremental development, [...] CASE technology, [and] small teams of highly skilled and motivated people"

        It is possible to specify the language first and produce the software next. Early projects used a natural language to specify the language. COBOL's specifications had a formal syntax but informal semantics. Since ALGOL 60 a formal meta-language has been used to define syntax. Algol 60's "BNF" grammar exposed problems before code was written. The informal parts had problems. Even Peter Naur's clear English was not good enough to stop Algol 60's "own" arrays from having a changeable but fixed length [Naur64]. Over-formal English contributed to the death of ALGOL 68{8}. Commentators asked for formal semantics to clarify imprecise wording in the Ada LRM (page 5, "Ada 9X Requirements Rationale" (May 91) [Ada9X]). Formal semantics would have exposed a problem in POOL (Vaandrager, pp172-236 of [Baetan90]).

        LISP [McCarthyetal.62] had both its syntax and semantics formally defined from the beginning. Pure LISP is highly portable as a result. However LISP's semantics are "operational semantics." Operational semantics map a program into a specified behavior but do not expose misfeatures like LISP's FUNARG problem or dynamic scoping. Later versions of LISP fix these problems, but are not compatible with each other.
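        The contrast between dynamic scoping (which caused LISP's FUNARG problem) and the lexical scoping of later Lisps can be sketched in Python. This is an illustrative toy, not actual LISP: the `env_stack` interpreter is an assumed simplification of dynamic binding.

```python
# Lexical scoping (modern Lisps, Python): a function value carries the
# environment where it was *defined*.
def make_adder_lexical(n):
    return lambda x: x + n

add5 = make_adder_lexical(5)
assert add5(2) == 7  # n is found where the lambda was written

# A toy dynamic-scoping lookup: bindings live on one global stack and are
# searched most-recent-first at *call* time, as in early LISP interpreters.
env_stack = [{"n": 100}]

def lookup(name):
    for frame in reversed(env_stack):
        if name in frame:
            return frame[name]
    raise NameError(name)

def dynamic_adder(x):
    return x + lookup("n")  # "n" is resolved in whoever called us

def caller_with_own_n(x):
    env_stack.append({"n": 1})   # dynamically shadow n
    try:
        return dynamic_adder(x)
    finally:
        env_stack.pop()

assert dynamic_adder(2) == 102    # sees the global n = 100
assert caller_with_own_n(2) == 3  # same function, different answer!
```

The same function value gives different results depending on its caller - the kind of misfeature that an operational semantics specifies faithfully but does not expose as a problem.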

        The University of Toronto project that developed Turing showed that software can be produced on time [HoltCordy88]: "Software project schedules are a private joke with me because I've drawn them for many projects but I've never seen one followed. With Turing it is different. Each piece falls into place at its prescribed time." [Holt84] The Turing project did something right but it is hard to be sure what. They gave a formal specification of the syntax and defined axiomatic semantics for the language. Yet they also used "the sharpest tools" (yacc [UNIX], and S/SL [HoltCordyWortman82]). They were also a small highly motivated team who claim they didn't have any "management protocols" [Holt84]. Perhaps

      1. success = formal_definition + sharp_tools + motivation - management?

        Here is a similar recipe from outside of programming languages:

      2. "group of people, carefully organized,.. Precise design documents, in a computerized project database,..." [BerztissLuqi91].

      3.2 Syntax Directed Programming

        The Turing project is an example of how software is derived almost automatically from EBNF [Dasgupta91] pp 324-331. Syntax determines compiler and interpreter designs [Examples in Chapter 2, Theory in chapter 3, Bibliography in chapter 9 ( [ DDD in subjects ] )].




        Leveson points out that the development of formal theories of grammars, parser generators etc eliminates the need to invent a new parser for each new compiler [Leveson94] p70. I will later argue that there are similar theories for other problem domains with associated generic cliches/solutions/patterns.

        This process has been modified for other text processing tasks (formatting and translation) - and has proved highly effective [AhoUllman7273], ... , [MamrakBarnesO'Connell93].

        From syntax directed programming we can harvest:

        1. the possibility of "sharper" tools,
        2. the important concept that data can be seen in three ways: lexical, syntactical, and semantic,
        3. the usefulness of having independent modules for each view of the data (lexer, parser, generator), and
        4. the idea (and the risks) of a product evolving into more usable and powerful forms.
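        The idea that a program's procedures can mirror the EBNF productions one-for-one can be sketched as a tiny recursive descent parser. The grammar here is assumed for illustration:

```python
# Grammar (assumed for illustration):
#   expr ::= term { ("+" | "-") term }
#   term ::= digit { digit }
# Each procedure below corresponds to one production; each EBNF
# repetition { ... } becomes a loop.

def parse_expr(s, i=0):
    value, i = parse_term(s, i)
    while i < len(s) and s[i] in "+-":   # expr's { ("+"|"-") term }
        op, i = s[i], i + 1
        rhs, i = parse_term(s, i)
        value = value + rhs if op == "+" else value - rhs
    return value, i

def parse_term(s, i):
    start = i
    while i < len(s) and s[i].isdigit():  # term's { digit }
        i += 1
    return int(s[start:i]), i

assert parse_expr("12+3-4")[0] == 11
```

The program structure is derived almost mechanically from the syntax, which is why parser generators like yacc can do the derivation automatically.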

      3.3 Data Directed Design

        Warnier & Orr (LCP), Jackson („JSP), and others use diagrammatic forms of syntax for all problems:




        These methods first define the data in some notation equivalent to EBNF and then use the definition as the structure of the program. I call this class of methods Data Directed Design (DDD). They help the designer to document information that determines a structure of the program that fits the specification closely. They are mandatory in some European countries [references in Chapter 9 under „LCP, „JSP, and „DDD]. These methods stress the difference between the physical data structure and the logical structure, but each method handles multiple structures differently - details in chapter 6.
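        As a sketch of the DDD idea (the data definition and field names are invented for illustration), suppose the input is defined by the grammar `file ::= { group }` and `group ::= key { amount }`. The program's loop structure is copied directly from that definition: one loop per repetition.

```python
from itertools import groupby

# Input sorted by key, as the data definition requires (invented data).
records = [("A", 10), ("A", 5), ("B", 7)]

def group_totals(recs):
    totals = []
    for key, grp in groupby(recs, key=lambda r: r[0]):  # file ::= { group }
        total = 0
        for _, amount in grp:                           # group ::= key { amount }
            total += amount
        totals.append((key, total))
    return totals

assert group_totals(records) == [("A", 15), ("B", 7)]
```

The nesting of loops reproduces the nesting of the data definition, so the code structure documents the data structure it processes.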

        Data Directed Design (DDD) has many examples (See chapter 9 for bibliography: „DDD) but is not recognized as a general method. DDD methods are not generally accepted because

        1. they are not understood,
        2. they require parallel non-deterministic structures,
        3. efficiency comes last(but not least) in DDD, and
        4. technical methods (section 2 above) work well on small problems.

          Parallelism comes from clashes - Jackson, Floyd, Knuth and others separate clashing structures into separate processes.

          Non-determinism enters because DDD maps from regular expressions to program structures. Knuth noted the mapping but dismissed it because a program was thought to be deterministic in those days [Knuth74]. Jackson's backtracking technique derives the structured GOTOs that Knuth discusses from the non-deterministic regular expression of the program's structure [see appendix 2 and Jackson 75]. Later versions of SP are non-deterministic [Dijkstra76].

          Efficiency is another straw man: in practice DDD usually produces code that is fast enough [JalicsHeines83]. Only in rare cases - typically NP-complete problems like the traveling-salesperson problem - does it fail to give a satisfactory solution; there it prompts the designer to use a cliche (typically a sort routine, data structure, or data base) [see section 3.4 next].

        However, if a software developer gives DDD methods a chance, then after some practice the engineer learns that
        1. Testing uncovers typos, not exciting bugs,
        2. Changes in design are in proportion to the client's perception of the change,
        3. Understanding and modeling the problem is just as important and interesting as coding the solution, and
        4. A true DDD method must fail when the data is not pre-defined.

        The next step, therefore, is to look at processes that design suitable data structures to enable effective processing.

      3.4 Data Engineering

        In 1966 Strachey noted the absence of methods to choose a data structure [Strachey66]. Later "Data Base Design" and "Data Engineering" developed [Bachman69] [Bachman73] [Bachman92] [Chen80] [CODASYL71] [Codd70] [Hawryszkiewycz84] [Hawryszkiewycz91] [Wiederhold77] [Martin85] [Navathe92] [ScheerHars]. IBM and James Martin have been promulgating similar Information Models for a decade [Martin85] [Hazzah90], as Information Systems Architecture: [Zachman87] [Mercurioetal90] [MathewsMcGee90] [Katz90] [SowaZachman92]. Several steps in SSADM are concerned with logical data structures (LDS) or physical data structures [See sources for „SSADM in Chapter 9]. Recently the same techniques are being used for: Software Tools [LejterMeyersReiss92], AI [Premerlanietal90], Data Structures [CohenCampbell93] [BatorySighalThomasetal93], Object Oriented Design [ShlaerMellor92], Re-engineering legacy code [BrayHess95], Hypertext & Hypermedia Design [Isakowitzetal95] [SchwabeRossi95] [Balasubramanianetal95] and other areas of software engineering.




          "Relational theory and semantic models are useful for analyzing and structuring data. The structures produced are then converted to a logical model based on some data model, which in turn is implemented by commercial data base software[...] Finally, the implementation may be adjusted, taking into account access requirements and physical structures available to the designer." -- Introduction to [Hawryszkiewycz91]

        The essential parts of data engineering are:
        1. Define the conceptual entities, relationships and attributes (ERA) [AlaviWetherbe91], for example.
        2. Select prototype and/or existing data to be rationalized and normalized [Codd70].
        3. Assemble a "conceptual model" -- a structured collection of ADTs [by eg. the LDS of „SSADM or chapters 10 thru 13 of Martin 85].
        4. Transform the model into a "logical design" ["First Cut Design" in „SSADM].
        5. Adjust the physical format to perform satisfactorily ["Physical Design Control" in „SSADM, SPE in [SmithCUWilliamsLG93], cf pp57-58 of [HursonPakzadCheng93]].
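        Step 2 above (rationalize and normalize) can be sketched as follows. The records and field names are invented for illustration: a repeating group inside one record violates first normal form, and splitting it out gives two relations sharing the order number as key.

```python
# Un-normalized: each order record contains a repeating group of items.
unnormalized = [
    {"order": 1, "customer": "Ann", "items": [("widget", 2), ("gear", 1)]},
    {"order": 2, "customer": "Bob", "items": [("widget", 5)]},
]

def normalize(orders):
    """Split the repeating group into its own relation (first normal form)."""
    order_rel, item_rel = [], []
    for o in orders:
        order_rel.append({"order": o["order"], "customer": o["customer"]})
        for part, qty in o["items"]:   # flatten the repeating group
            item_rel.append({"order": o["order"], "part": part, "qty": qty})
    return order_rel, item_rel

orders, items = normalize(unnormalized)
assert len(orders) == 2 and len(items) == 3
```

Each resulting relation now has fixed-length records, which is what makes the later "logical design" and "physical design" steps mechanical.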

        The DoD undertook a very large scale variation of this process when they chose to re-engineer 1.4 billion lines of code in dozens of languages and thousands of sites as a single conceptual data model [AikenMuntzRichards94].

        In projects like this DoD project the current systems were the source of the information about the environment that is fed into the above process. Re-engineering an existing data base should be easy - if there is a record kept of the data and processes in the above diagram. However, if these are not available then the processes above must be reversed so that the data structure yields a specification and the specification yields a model. The reverse engineering of data is full of surprises and cannot be automated [PremerlaniBlaha94] [BrayHess95].

        Research has formulated a hybrid of Data Engineering with the Syntax Directed Compilation of sections 3.1 and 3.2 above [MarkCochrane92]. The result is close to a DDD methodology. Similarly there are now Object-Oriented Data Base Management systems that combine the advantages of the objects and data bases [HursonPakzadCheng93].

        Data Base theory and practice points out a way to improve the attractive Linda model and Gelernter's "Mirror Worlds" [Gelernter91]: He presumes that data is encoded like this: (birth, "Baby Doe", 12, 30, (12,01,1994)). Databases use labeled records: (Event=>birth, name=>"Baby Doe", weight=>12, mother_s_age=>30, date=>(day=>12, month=>1, year=>1994)). This is longer. But it is less error prone - a script is less likely to misread the date as the 1st of December or the mother's age as 12! This is the kind of data already in use.
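        The contrast in the paragraph above can be written out directly. The field names follow the example in the text; the dictionary form is a sketch of a labeled record, not Gelernter's actual notation.

```python
# Positional tuple, as in the Linda example: which 12 is which?
positional = ("birth", "Baby Doe", 12, 30, (12, 1, 1994))

# Labeled record, as data bases store it: longer but unambiguous.
labeled = {
    "event": "birth",
    "name": "Baby Doe",
    "weight": 12,
    "mothers_age": 30,
    "date": {"day": 12, "month": 1, "year": 1994},
}

# A script cannot confuse the mother's age with the weight or the day
# with the month:
assert labeled["mothers_age"] == 30
assert labeled["date"]["month"] == 1   # unambiguously January
```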

        We can extract the following processes and ideas from data engineering:

        1. Three Processes: Normalization, Conceptual Modeling, Performance Engineering. These start to fill in the gaps in SWR, FD, FP, SA and SD discussed earlier.
        2. The ideas of conceptual model vs logical design vs physical implementation are parallel to the ideas of Semantics vs Syntax vs Lexical structures in DDD and compilers.
        3. A different kind of diagram from Data Engineering: the Entity-Relation Diagram (ERD). A simplified form will be introduced when needed (chapter 2, section 4.3 Formal Analysis).
        4. DBMSs let users ask a large class of ad hoc queries, without involving programmers or extra design work.

      3.5 Knowledge Engineering

        Brooks noted that Artificial Intelligence tends to produce solutions to special classes of problems [Brooks87]. Expert Systems, Knowledge-Based Systems [RothJacobtsein94], Case Based Reasoning [Allen94] and similar rule based models are some of the successes that can be described by the following DFD:




        The "Inference Engine" does not vary much from project to project. The data base has elementary facts and is kept up-to-date by traditional data processing methods. The "Knowledge Base" has rules that connect facts. The rules are developed iteratively with the help of one or more experts in the area being automated. In some systems (eg Prolog) the data base and the knowledge base are integrated by making the elementary facts propositions that have no presuppositions. When this kind of software drives a distributed object we get a "Software Agent"; in the limit, the environment includes the Internet and any private data accessible to the user.

        In this kind of software engineering the engineer is focused on modeling the real world (Data+Knowledge) not on coding the inference engine or helping the user directly [Keller87] [PayneMcArthur90] [Deville90]. Notice that

        1. The user is able to create queries in minutes that in traditional MIS/DP systems would become part of the applications backlog.
        2. The knowledge base is non-sequential - the sequencing is determined by the inference engine.
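        The separation described above can be sketched with a toy forward-chaining inference engine. The engine is generic; only the facts (data base) and rules (knowledge base) vary from project to project. The facts and rules here are invented for illustration.

```python
# Data base: elementary facts.
facts = {"has_feathers", "lays_eggs"}

# Knowledge base: rules connecting facts (premises -> conclusion).
rules = [
    ({"has_feathers"}, "is_bird"),
    ({"is_bird", "lays_eggs"}, "is_typical_bird"),
]

def infer(facts, rules):
    """Generic engine: fire rules until no new fact is derived.
    The rule order does not matter - the engine determines sequencing."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

assert "is_typical_bird" in infer(facts, rules)
```

Note that the knowledge base is non-sequential, exactly as point 2 above says: reordering the rules cannot change the set of conclusions.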

      3.6 Dynamic Analysis and Design

        We saw above (section 2.4) that the Yourdon method has evolved to include State Transition Diagrams (STDs). Other methodologists also use dynamic models [For example: Robinson, Jackson, Hoare, Zave, Harel, Erdogmus & Johnstone, Ward, Berzins & Luqi 91, Hull et al 91, Shlaer & Mellor 92, ...]. These show how to base the design of persistent dynamic objects on the behavior patterns of real world objects by using STDs, Regular Expressions, EBNF, or recursive definitions [cf „CSP, „CCS, Erdogmus & Johnstone, Baetan 90]. These are called Entity-Life-Histories (ELHs) in JSD and SSADM [„JSD, „SSADM in chapter 9]. The code for a module/object/process comes from something in the real world - without hand waving - even when there is no data defined [Cameron, Erdogmus & Johnstone]. In JSD, database attributes are also deduced from the state spaces of dynamic objects.
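        An entity life history can be sketched as a finite state machine. The entity, events, and states below are invented for illustration: the life of a library book follows the regular expression `acquire (borrow return)* discard`, and illegal event orders are rejected, just as an ELH prescribes.

```python
# Transition table derived from the ELH regular expression:
#   acquire (borrow return)* discard
TRANSITIONS = {
    ("new", "acquire"): "on_shelf",
    ("on_shelf", "borrow"): "on_loan",
    ("on_loan", "return"): "on_shelf",
    ("on_shelf", "discard"): "gone",
}

def run_life(events):
    """Replay a sequence of real-world events against the life history."""
    state = "new"
    for e in events:
        key = (state, e)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal event {e!r} in state {state!r}")
        state = TRANSITIONS[key]
    return state

assert run_life(["acquire", "borrow", "return", "discard"]) == "gone"
```

The state space of the entity ("new", "on_shelf", "on_loan", "gone") is exactly the kind of attribute that JSD deduces for the database.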




        Dynamic Analysis and Design (DAD) has many published examples (See chapter 9 for bibliography under „DAD) but is not recognized as a general method. DAD complements Structured Analysis and Design [„SSADM], Entity-Relationship-Attribute models (ERA) [Botting86a], DFDs [Hulletal91], and expert systems (Section 3.5 above). Jackson notes that JSD is best used for problems where there is an environment to be monitored and reported on [Jackson94]. Policies and business rules may be better expressed using a rule-based design [Poo91]. All DADs document the patterns of change in the software's environment before planning the software.

      3.7 Object-Oriented Methodologies

        Object based methods evolved from programming to analysis (p61 of [SharbleCohen93], p39 of [Kramer93], [Embleyetal94]). Fichman & Kemerer discuss whether the resulting methods imply a radical or incremental change [FichmanKemerer92]. Others question whether the resulting analysis methods are actually contaminated by design and implementation considerations [Embleyetal95]. The methods are still changing [MonarchiPuhr92] [Kramer93].

        First Wave Methods

        Early methods started with an informal strategy to solve the problem and used this as the source of objects etc. [Abbott83] [Booch83] [Booch86] etc, hence BOOD; compare Chapter 3 of [GoldsteinAlger92]. Thus, instead of refining the steps in algorithms, the early methods refine the objects in an algorithm. Experience teaching this method led to the following conclusion:
          "The weakest step of the method is the informal strategy. As it is the means of finding objects, classes and operations, it is indeed the cornerstone of the method. Therefore, a bad informal strategy has a major impact at least on the ease of performing the further steps of the method and sometimes even on the quality of the solution. Giving strong technical advice for writing a good informal strategy is hard, due to its informal nature. We think therefore that better ways of discovering objects, classes and operations have to be found."

        The Booch method has been improved since this time [AppelbeAbowd95]. Song & Osterweil note that a reality directed method like JSD provides actions that help get this "BOOD" started [SongOsterweil94].

        Meanwhile others extended Structured Design (SD) methods to incorporate objects [Coad88] [Bailin 89, ...] [McGregorKorson90] [ShlaerMellor92] [AndleighGretzinger92]. Instead of coupling and cohesion, the idea of inter-object dependency has been defined and can be measured automatically [WildeHuitt92]. Peter Coad calls this "conascence" [CoadP92]. Simple dynamic objects or finite state systems are the heart of the latest version of "Cleanroom Software Engineering" [Hevneretal92] and other object-based methodologies [ShlaerMellor92] [Fayadetal93] [Lang93] [BaclawskiIndurkhya94]. These typically use state transition diagrams or Harel State Charts to document dynamics. They don't take advantage of the insight that objects are easier to design if conceptually parallel structures are designed as concurrent processes [„JSP, Robinson 79, ELH's in „SSADM, „JSD, Smith CU & Williams LG 93, Embley et al 94 p27]. Other "objectivists" [GoldsteinAlger92] incorporate parts of Data Engineering (see section 3.4 above) - such as some form of entity-relationship conceptual model [Embleyetal94]. Inheritance, generics and other object-oriented techniques tend to appear late in these methods.

        Second Wave Methods

        Other methodologists recommend using scenarios [Gladden82] or use cases [Jacobson] that are analyzed into the services needed to fulfill them. Depending on the methodologist, a scenario may be (1) a DFD-like picture showing a number of objects collaborating to fulfill a responsibility, (2) a script listing user actions, (3) a time-line diagram showing messages being passed between objects [Jacobson] [GoldsteinAlger92], (4) a tree [Hsiaetal94], (5) a grammar [Hsiaetal94] or even (6) a video film [Gladden82]. A narrative scenario or script is a natural language paragraph, conversation, or piece of text that states the services that are required by a user, as in Object Behavior Analysis (OBA) [RubinGoldberg92]. Scripts are like the Logical Function Descriptions and `Logical Data Access Paths` used in SSADM [See sources for „SSADM in Chapter 9, also cf Lustman 94]. Even without objects, scenarios have proved an effective way of provoking discussion and development of requirements [PottsTakahashiAnton94] and have been an informal but vital part of design documentation for many years [Carroll94]. Propagation patterns serve a similar purpose [LieberherrXiao9394]. In this book I will often write one or more narrative scenarios underneath a DFD to help explain some of the possible ways it could be scheduled.

        The services that appear in the scenarios or requirements are classified into responsibilities. A responsibility is a piece of intelligence that will be encapsulated in an object. An object's "intelligence" is (1) the knowledge it maintains and (2) the actions it can perform [Wirfs-BrockWilkersonWiener90] p61. An object has responsibilities to provide services to its clients, by using its servers. Together the clients and servers are known as collaborators. A collaboration in some methods is developed into a mutually beneficial and constraining contract between client and server [Wirfs-BrockWilkersonWiener90] [Meyer92]. Responsibilities and collaborations are often allocated to CRC (Class-Responsibility-Collaboration) cards that describe possible classes (samples in figure 2, page 65 of [McGregorKorson94]). These are modified until a stable structure emerges [Wirfs-BrockWilkersonWiener90] [Jacobsonetal92] [GoldsteinAlger92] [McGregorKorson94], and so on.
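        A CRC card is just a structured record, and can be sketched as one. The class name, responsibilities, and collaborator below are invented for illustration; real CRC cards are index cards, not code.

```python
from dataclasses import dataclass, field

@dataclass
class CRCCard:
    """One Class-Responsibility-Collaboration card."""
    name: str
    responsibilities: list = field(default_factory=list)
    collaborators: list = field(default_factory=list)

# A hypothetical card drafted during a design session:
account = CRCCard(
    name="Account",
    responsibilities=["know balance", "accept deposit", "accept withdrawal"],
    collaborators=["TransactionLog"],   # a server the Account relies on
)

assert "know balance" in account.responsibilities
assert account.collaborators == ["TransactionLog"]
```

In practice the cards are shuffled, merged, and split during walkthroughs of the scenarios until a stable allocation of responsibilities emerges.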

        Architectural Patterns?

        Slowly it has been admitted that a higher level of structure than that of classes and objects is needed as the unit of reuse. These tend to be families of collaborating classes (specific or generic). Some call these "object-oriented frameworks" [Taligent]. Others call them "design patterns" [Coad92] [„PATTERNS in bibliographies]. Gamma's work has led to a significant paradigm shift [Gammaetal94]. Most of the patterns improve reuse by using more code. Like the functional methodologists, the Pattern Community have turned to architects (Christopher Alexander, in this case) to guide their thinking and so are developing "A Pattern Language" for objects [Booch94]. There is therefore a new research field in software architecture [see „ARCHITECTURE in the bibliography] that hopes to improve productivity by aiding the reuse of all the materials prepared in a software process. Meanwhile, quietly, with and without objects, computer aided code reuse is starting (1993-95) to pay off for some early adopters [Adhikari95b].


        Several comparisons of the leading traditional and object oriented ideas and methods have been published [Bucken93] [FichmanKemerer92] [MonarchiPuhr92]. Song and Osterweil include non-traditional JSD in their comparisons [SongOsterweil9294]. Smith & Williams compare a functional design, a DAD design [Sanden89b], and their own object-oriented design (based on ERDs, DFDs, STDs, and performance engineering) - they conclude that: (1) All three designs are free of performance problems, (2) the DAD approach gives results like Smith and Williams's Domain Analysis, and (3) Smith & Williams's design has more generic components [SmithCUWilliamsLG93]. Another experiment showed that a simpler design, as measured by half-a-dozen metrics, was produced by using responsibilities rather than using DFDs and ERDs to guide design [SharbleCohen93]. None of the seven leading methods provide full support for all the ideas used to describe objects [Bucken93]. The methods have introduced contradictory and unintuitive notations (introduction to chapter 6 of [GoldsteinAlger92]) [Embleyetal95] p20, examples: [Nerson92] and [Coad92]. The notions represented are not clearly defined or standardized [BaclawskiIndurkhya94]. Apparently most object oriented methods rely on intuition and back-tracking to design large systems of objects [MonarchiPuhr92] pp40-41. According to Dedene & Snoek none of the eight methods they report on check for deadlocks or other problematic behaviors [DedeneSnoek94].

        The proponents of object-oriented methods are aware of the contradiction between the view of a problem given by structured methods and the views as seen by objects [FichmanKemerer92] [Kramer93]. Some see object-based methods as an alternative to earlier methods [Henderson-SellersEdwards90] (p145) and some reject all parts of older methods [Berard89] and Bowles's comments on p42 of [Kramer93]. Practitioners with long memories notice that Object Oriented Methods were in use before the invention of "Structured Programming".

          "Of course I like it; it's what we did 30 years ago.[...] It isn't new, its just better implemented" [Horch95] page 118.

          "Remarkably enough, the abstract data type has reappeared 25 years after its invention under the heading 'object oriented'." [Wirth95] p66.

        Some report that DFDs complicate the OO process [MonarchiPuhr92 (p45)] [Fayadetal93] and others that a major problem is mapping DFDs and dynamics onto particular objects [MonarchiPuhr92] p45. Similarly, an enterprise-level ERD may not map precisely into a collection of classes for an object-centered user-interface (pages 147-148 of [Capperetal94]). However, Fichman & Kemerer argue that objects do not give a system wide or end-to-end view of a design and Dedene and Snoeck note the lack of a definition of overall behavior in most methods (p29 of [FichmanKemerer92], Figure 2 page 52 of [DedeneSnoek94]). Perhaps the clashing views (Objects vs Flows) need to be documented separately and connected by some linking mechanism. Gary Simmons & John Thomson have extended Smalltalk so that the objects encapsulate their own documentation of the problem domain and the user's needs, for example [Rettig93].

        Object-oriented methods and tools are still (1995) evolving (p38 of [FichmanKemerer92]) [Kramer93] and even breeding (see the unification of Booch, Rumbaugh, Jacobson et al in 1994-1996, and HP's FUSION method). Intuition, experience, and/or backtracking are needed in current versions. Catalogs of standard patterns are being developed [Gammaetal94] [Nerson9??]. Finally there seem to be several concepts used in object-oriented thinking that have no agreed meaning - leading to practical confusions and philosophical debates [BaclawskiIndurkhya94]. One point is clear - all recent object-oriented methods start by modeling ideas surrounding the software and this model is mapped into the program design:




        According to Goldstein & Alger this approach works best when

        1. The "Objects" have a physical and visual reality that is understood by everyone,
        2. The model is simple enough that there are few alternatives to it,
        3. Each background assumption is either irrelevant or obvious.


      3.8 Formal Modeling

        Formal and/or mathematical models have long been used to design software. Indeed the first languages did no more than FORmula TRANslation. At first, computing seemed to be nothing but the trivia of coding a solution or algorithm found by algebra [Jackson94]:




          Typical Scenario. In "Imperial Chemical Industries" in the 1960's the Sodium Production group asked Computer Services to find the optimal rate of flow of mercury thru their electrolysis bath. A young co-op student (me) was tasked to: work with them, develop a model, solve it, use the mainframe to implement the solution, and provide results that let them choose the best flow. I talked with the people and developed a special kind of model (a Partial Differential Equation), worked out how to solve it (the answer was in a text book), translated the solution into code, tested it (it worked the second time), modified it to give graphic output (I was bored), and showed it to the user, who supplied the correct parameters for the model and ignored the four decimals of accuracy because the graphics were more help. This version of the program helped them answer their question.

          Note 1:

            The scenario shows a process that was both mathematical and rapidly prototyped. I use the term "formal model" for what in academe is called "formal methods". I do not want this process confused with what DeGrace (and some other recent authors) call a "formal method", namely: SA+SD+rigid sequential process [DeGraceStahl90]. This is better named a "formal process" - the bug-bear of artistic programmers in the '90s. A formal model is one expressed using formulae (plus diagrams, text, etc) that can be manipulated by using documented rules. Producing this model is a creative process. Working with the model demands both rules and creativity.

            The idea for this process dates back to Descartes [Grabiner95]. The more precisely the formulae and rules are stated and applied, the more rigorous the process becomes. However this may not increase the odds that the model fits reality! Rigor (in this sense) reduces the chance of errors in solving or simulating the model, and so reduces the error proneness of the whole process. It can be combined with other techniques that reduce the chances of unreal models of the problem or situation.

            Imre Lakatos's little book "Proofs and Refutations" makes it clear that rigorous formal models of a problem area do not come first. His case study of the development of ideas and models concerned with a single formula about polyhedra over a 100 year period indicates that proposing a mathematical model is an invitation to search for counter examples as well as proofs. Further each counter-example can be used (along with a proof) to further improve the model. Ultimately one gets a more certain statement that is expressed in more rarefied terms. I believe that useful formal models will also follow much creative and critical thinking. Once formulated they can be taught.

          Note 2: the chart above avoids the "fantastic claim" [Gelernter91] p41 that "programming is mathematics." Programming is only the right hand part of the above DFD - "real programmers" disdain the left hand half of the process.

        An alternative (but similar) DFD is "The modeling life cycle" by Krishnan et al [KrishnanLiSteler92], Figure 2. Newer tools have started to automate parts of the modeling and solving process in the above DFD: example [Kant93]. Non-numerical problems can be done the same way. Early examples occurred in the nineteen-sixties [BraffortHirschberg63] [Fox66].

        Several well known techniques are all applied mathematics: BNF (section 3.3 above) was formal. Codd's n-tuples (section 3.4 above) are from set theory. Logic (section 3.5), regular expressions (section 3.6), and finite state machines (section 3.7) are all formal models. Modern mathematics has many non-numerical systems ready to be used [Mills75] [WoodcockLoomes88] [Ince88a] [Spivey89] [Millsetall89], . . ..

        It is difficult for a non-mathematician to find ready-made, useful mathematical systems. However the formal systems needed are simple: Symbolic logic and "discrete mathematics" are taught to Computer Scientists and others interested in programming [MannaWaldinger85] [Millsetall89] [Denningetal.89a], . . . [Zelkowitz90] [Gries91] [Fekete93a]. Set theory is even appearing in books on Analysis and Design [Yourdon93]. The SSADM Logical Data Structure (LDS) is a picture of a collection of sets and mappings [„SSADM]. It can be used to model paradigmatic problems such as the "lift" problem [Botting85b]. Automata theory has been proposed as a way to improve Structured Analysis and Design for transaction based systems [Lustman94] and scenario analysis [Hsiaetal94]. Ideas from formal linguistics turn up in „DDD, „DAD, and OOD (see previous sections). Even Lotfi Zadeh's fuzzy logic is itself formal, not fuzzy [Zadeh94] [MunakataJani94]. Wand and Weber have proposed the mathematical modeling of information systems{10} [WandWeber90]. Some call data-directed and structured methods "semi-formal" [Wing90a] [Fraseretal94] (Table 2 p79). MERODE integrates regular expressions, data engineering and objects into a formal method [DedeneSnoek94]. There is clear evidence that only sophomore level mathematics and logic is needed [Jordahl91b] p100 [Saiedian93] [Hayes93]. Computer science is developing formulae to model systems, requirements, specifications and programs better [Hoare93] [GriesScheidner93] [Hehner93] [Parnas93].

        Using formulae plus text and diagrams leads to some important advantages:

        1. Using formulae means that one can calculate the properties of the design [DelisleGarlan90].
        2. Formality promises better quality control [Martin85] [Meyer85] [SarkarSarkar89] [Nicholl90] [Parnasetal.90] [Spivey90] and pp207-211 of [Hayes93].
        3. Prototype code can be automatically synthesized from formal specifications [Harel].
        4. Prototypes can be safely optimized (pages 88-90 of [Hoare87]) [Hoareetal.87] [ErdogmusJohnstone90] [Harel92].
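        A minimal sketch of advantage 1 (and of the quality control in advantage 2): once an operation's pre- and postconditions are stated formally, its properties can be checked mechanically. Plain Python assertions stand in here for a real specification language such as Z or VDM; the square-root spec, `newton_sqrt`, and the tolerance are invented for the example:

```python
# Hypothetical example: an executable pre/postcondition specification.
# The assertions are the "formulae"; checking them is a crude but real
# calculation of a property of the design.

def newton_sqrt(x, iterations=30):
    """A prototype that could be derived from the spec: Newton's method."""
    r = max(x, 1.0)
    for _ in range(iterations):
        r = (r + x / r) / 2
    return r

def sqrt_spec(x, eps):
    """Spec: given x >= 0 and eps > 0, the result r satisfies |r*r - x| <= eps."""
    assert x >= 0 and eps > 0          # precondition
    r = newton_sqrt(x)                 # candidate implementation
    assert abs(r * r - x) <= eps       # postcondition, checked mechanically
    return r

# Running the postcondition over sample inputs is quality control derived
# directly from the formal statement, not from the code's structure.
for sample in [0, 1, 2, 10, 144]:
    sqrt_spec(sample, eps=1e-6)
```

The same pre/postcondition pair could later justify replacing `newton_sqrt` by an optimized implementation (advantage 4): any replacement that still passes the postcondition is acceptable.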

        There are many notations including (alphabetically): COLD, FRORL, Larch, LOTOS, PAISLey, Spec, Swarm, UNITY, VDM/VDL, Z, etc.

        Formal methods have been used in practice for nearly 30 years: in the 1960's IBM used VDM/VDL to define Algol 60 and PL/I. In the 80's IBM used Z to re-engineer CICS (part IV of [Hayes93]). UNITY has been used to formulate the I/O subsystem of the GCOS operating system [Staskauskas93]. Combining DFDs+Data Dictionaries+VDM is effective [Fraseretal91]. Another effective mix is DFDs+Data Dictionaries+Prolog [Keller87]. A frame-based logic (FRORL) has been shown to be a sound and complete way to formalize object-oriented ideas [TsaiWiegertJang92].

        Indeed some formal methods have undergone the precise process that Robert Glass promotes: "A researcher working alongside a practitioner, being open to adjusting and improving ideas" [Glass94b] p46. One example was a joint university (Oxford University) and industry (IBM Hursley) project sponsored and rewarded by a government. Anthony Hall and later Bowen and Hinchey listed the following facts gathered from such joint projects:

        1. Formal models are very helpful at finding defects early and nearly eliminate certain classes.
        2. They work by making you think very hard about the system you propose to build{11} .
        3. They are useful for almost any application and for both hardware and software.
        4. The notation is based on mathematics and so at a higher level than program languages.
        5. They can decrease the cost and time of development. However estimating project time is difficult.
        6. They can help clients understand what they are getting.
        7. They have tools and are supported.
        8. They do not replace traditional methods.
        9. They are not always the methods of choice for all parts of a project.
        10. They are being used successfully on practical projects.

        [Hall90] p 19 and [BowenHinchey95b]

        Another survey [GerhartCraigenRalston94] suggests that formal models can be used successfully in Domain Analysis and in Re-engineering to provide better assurance, better communication, and evidence of best practice. The methods also support reuse (chapters 4 and 5 of [Schafer94]).

        Later experiences on real projects (successful and unsuccessful) show that formal methods are best used along with traditional methods (including quality control, testing, estimation, documentation, reuse, human-computer interface design, ...) [BowenHinchey95a] and [BowenHinchey95b], but compare with [Kemmerer90]. SAZ, for example, is the result of combining SSADM (see later and chapter 9) with Z. Successful projects have little dogmatism, have access to expertise, and avoid over-formalization [BowenHinchey95], but compare with [Zelkowitz95].

        One question that must be answered is why it might be worth moving from concrete models to abstract ones. Concrete models are either computer oriented (like "programs", "logical data structures" or "object hierarchies") or natural language "surface structure". It might be thought that the use of abstract models is an academic affectation. However, many methodologies that have been and are used in practice explicitly ask their users to abstract the essence of a situation from any accidental details. Data engineering, SSADM, SADT, OMT, ... and so on all stress the importance of moving away from the current implementation and thinking before looking for new solutions.

        Recent papers take this abstraction further and suspend the traditional distinction between program and data. One shows that this leads to a significant reduction in complexity and in the cost of maintenance [LongDenning95]. In another case a complex and evolving game was implemented this way [Sanders95]. Evidence for the value of abstraction as a cost-saving device will be given in most of the later chapters of this monograph.

        Formal models have not yet affected most programmers - one study turned up only one programmer who used them [Lammers86]. Indeed, when a formal model is developed for existing software it raises questions that can only be answered by testing the software - the answers have never been documented (pp 191 & 218 of [Hayes93]).

        It is time consuming to prepare mathematical documents (see the introductions to [MannaWaldinger85] [WoodcockLoomes88] [Ince88a] [Spivey89]; also the papers [Rous90] [RushbyHenke91] [FieldsElvang-Goranson92]). The notations need to be chosen with care [BowenHinchey95a] [CreveuilRoman94]. Formalists who try to apply their ideas in practice are usually forced to develop graphical and/or tabular formats [see GRAPHIC and TABLE in the Bibliography in Chapter 9]. This book describes a way to make such mathematics less expensive, more accessible, "look and feel" more like programming, and graphical/tabular for clients.

      3.9 Conclusions

        The methods of sections 3.1 through 3.8 above provide the rules that were missing or implicit in the methods of Section 2 (SWR, FP, FD, SAD, Information Hiding,...Objects). They share a common form:
          "[Mathematical] and semantic models are useful for analyzing and structuring [problems]. The structures produced are then converted to a logical model based on some [implementation system], which in turn is implemented by commercial [...] software [...] Finally, the implementation may be adjusted, taking into account [...] requirements and physical structures available to the designer." (Paraphrase of a paragraph from [Hawryskiewycz91]; see section 3.4 above.)

          These are therefore reality directed methods. Formal models, EBNF, Data Directed Design, Dynamic Analysis and Design, Expert Systems/Knowledge Engineering, Data Engineering/Information Engineering, and the later Object-Oriented methods all force the practitioner to think about the problem domain in ways that are close to the user and client. For example Berzins et al show a context DFD of a user checking the spelling of a document(Fig 6) and instantly interpret it as a software design: the user interface is in one module and the spell checking logic is in the other module [BerzinsLuqiYehudai93] p446. Many think this desirable [Paulson89] [Lawson90] [Sharon91] [Tayntor90] [Nielsen92] [Aranow92] [Bachman92] [SlaterCarness92] [Zave93] [Rettig93](p23). Alan Davis notes that this is what Dijkstra calls "minimizing the Intellectual Distance" [DavisA94c].

          Importantly, a focus on the real world that contains the problem answers a key problem: it is unlikely that a single method will fit all problem domains [Jackson94] [GlassVessey95]. The above picture shows that one always has to look at the domain before choosing suitable design methods. Any of the specialized reality directed methods could be used. Indeed there is nothing to stop the designer from recognizing a simple top-down cliché and using it.

          Focusing on reality generates systems where

          1. ad hoc queries can be programmed by the users using a DBMS,
          2. update processes (via JSD/SSADM ELHs) can be developed simply,
          3. programs that process data can be designed by semi-formal data directed methods (eg JSP), and
          4. nonsequential and sequential structures are used, with the structure traceable to something other than undocumented preconceptions.

          Thus "Reality" complements the technology of section 2. Sections 2 and 3 together let us have a structured, modular, conceptual model of the problem that guides logical designs and physical implementations of solutions [Lutz92]. The structure of the solution (the design or specification) is related to and yet decoupled from a documented model of the problem:
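          The data directed idea in point 3 can be sketched with a small, hypothetical JSP-style example (the customer records and totals are invented): the input is a sorted sequence of groups of records, and the program structure is read directly off that data structure rather than invented from implementation preconceptions:

```python
# Hypothetical JSP-style sketch. Input structure:
#   file  = group* ; group = record+ (records sorted by customer)
# The program structure below mirrors it: a loop over groups containing
# a loop over records.

from itertools import groupby

records = [                      # input: (customer, amount), sorted by customer
    ("ann", 10), ("ann", 5),
    ("bob", 7),
    ("cat", 1), ("cat", 2), ("cat", 3),
]

def summarize(records):
    totals = {}
    for customer, group in groupby(records, key=lambda r: r[0]):
        total = 0                      # process one 'group' component
        for _, amount in group:        # process each 'record' component
            total += amount
        totals[customer] = total
    return totals

print(summarize(records))   # {'ann': 15, 'bob': 7, 'cat': 6}
```

Because the loops correspond one-for-one to components of the input data structure, the design is traceable to the documented data rather than to the designer's preconceptions.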




            A change in the system and/or the user leads to a change in the model. The design process uses the model to solve new problems or simplify the system by using available technology giving an updated specification. Implementation refines the specification changes into changes in the system. A good software engineer can restart and/or redo any of the processes as demanded by the situation. The parts of the DFD can be distributed to different people/teams, but then effort is needed to avoid deadlock and/or inconsistencies between the model, specification, or system(configuration management, documentation, SCCS/RCS,... [Haque95]). As in all DFDs in this section, the parts can all be active and all can encapsulate some intelligence(artificial or real).

              Scenario 1. When the problem area is well understood and changing more slowly than the time to complete a cycle, the processes can be scheduled to run in sequence (Waterfall) [HannaM95] [Bond95].

              Scenario 2. Incremental development means that the DFD can have several threads of execution running at one time as long as the system, model, and specification are modularized so that changes made by one thread do not disrupt the rest.

              Scenario 3. "The MS Process". When developing a mass-marketed product, a set of partial models, designs and systems (each set forming a single release) is rapidly developed to get fast feedback from a sample of users (often the developers themselves). Features are added and subtracted from the model, design, and implementation within a day. Bugs are acceptable in new features. Slowly, as the product evolves, the pool of users expands: the developers, the beta testers, the early adopters, the mass market, the die-hards, ... [Bond95] [Keuffel95b] [Sanders95a].

              Scenario 4. The system has been running for 10 years. Every now and then the user requests a change. The request is analyzed and compared to the model. If the request leads to a structural change then parts of the model may be modified; normally, however, the request fits the model's structure and is rapidly mapped (by restarting the design process) into a change in a part of the specification, which in turn triggers the implementation of the change in the system itself. Notice that this proceeds rapidly because the "overhead" of documenting a model and specifications guides the process.

              Scenario 5. The MIS department in a large corporation that manufactures widgets is asked to produce software to help the acceptance of orders from customers, the preparation of invoices, picking lists, deliveries, and the extraction of payments using a new client/server LAN system. The initial analysis ("feasibility study") constructs a model that has many unknowns in it - the user interface and the platform are not clearly defined. This triggers the need for risk management: the model documents the risks and the development team proposes three strategies to handle them. (1) Part of the model will include a running "mock up" of the user interface, and the client managers and users will review and help develop it. (2) The clients will spend some time with the development team brainstorming scenarios of how the system will be used. (3) The logic of the business will be in the model, and the design of this part of the system will be specified in a platform-independent (abstract) form that the client can understand and review against the scenarios. Thus analysis and design will proceed in parallel, but the implementation can be delayed until the platform is chosen - this may even be outsourced.

              Scenario 6. I was working with a friend and colleague who did DP consulting. He was working with a small manufacturer of floor coverings. Their needs were simple: the sales clerk would be taking in orders, printing them out, etc., while in a different room the CEO kept an eye on inventories and the like. The model was simple and the design obvious since a standard order/billing/inventory program was available for reuse. Implementing the simple program to let the CEO get his data was a matter of hours. However, this was in the 1970's, when timesharing was a new technology. It took three weeks to get past the computer company's management and sales force to the engineer who could give us the right version of the operating system. The computer company has since been absorbed by other companies.

              Scenario 7. The customer hot line gets 200 calls asking how to change the "hour glass" cursor to some other form in your new GUI-based OS. Analysis shows that the model has no mention of the cursor at all - it appears that it was never discussed in the focus groups and never appeared in the prototype screens. The model is changed. In this organization design is ad hoc and out of control. The specifications mention a "WFSTH cursor" - this apparently means "Waiting For Something To Happen". Tracing this into the implementation takes time, but it appears as a hard-coded void function called "FunHG()". The implementation is changed, and crashes the system... because one part of the OS does not use the function. Ultimately the system is debugged and released, but nobody bothers to update the specifications.

            Scenario 7 shows some problems with this DFD! It will be revised after detailed study of (yet) more methodologies.

            The wise engineer usually constructs a model of the problem separately from a specification of the solution. The model is about "what is" and so includes "Requirements" because these are a part of the client's world [LindlandSundreSolvberg94]. By calling it a model we can distinguish it from specifications and designs [ayadetal94] [Embleyetal95]. A model of the problem (and its domain) is clearly more than a set of abstract functions [Siddiqi94] or an initial design of a solution. Will Tracz calls this the problem space and distinguishes it from the solution space [Tracz95]. The solution (design and implementation) is technology driven rather than problem or domain driven [Feijs93] [RomanWilcox94], ....

            Some believe that the application of non-sequential construction methods and the idea of reflecting the "real world" will lead to a way to use computers that will completely change the way software is developed. Michael Jackson's methods (JSP, JSD) have always included implicit or explicit concurrency and lately he has made this even clearer: "My conclusion is that we should work towards the kinds of implementation infrastructure that would support multiple, superimposed, architectures" [Jackson95a]. Gelernter's vision of "Mirror Worlds" is a popular account of what becomes possible with concurrent systems [Gelernter91].

            My next step is to apply the above model to software engineering itself. The IEEE has a task force developing a new discipline and a common conceptual model for computer-based systems engineering [JacksonKLavi93].

            In section 4.1 (next) [ 01_4.html ] I search for models for a "Software Life Cycle". Then I look for the data that is used in modeling software (section 4.2). This leads to a detailed DFD of the processes and data in an idealized software engineering system (section 4.3 ...). This lets us (1) revise the figure above, and so (2) propose improvements to software engineering. At a future time the revised system should itself be re-applied - bootstrap fashion - to current practice, and so on...

      . . . . . . . . . ( end of section 3) <<Contents | Index>>

    Formulae and Definitions in Alphabetical Order