This set of notes focusses on the Rules and Procedures that make a system work...
The rules we put into are systems can kill people. Here is a classic example
[ Woman-left-to-die-after-999-ambulance-blunder.html ]
from the London Telegraph newspaper.
An algorithm is a detailed procedure -- stated in enough detail that it
can be executed on a computer. Here is a rare event -- an algorithm in the
news
[ story.php?storyId=114259241 ]
(How Do You Find A Job? Ask The Algorithm).
However you can expect more news items on algorithms.
- Alison George interviews Kevin Slavin in The New Scientist 20 Aug 2011 pp28-29.
- Algorithms are an invisible part of the 21st century and carry a spurious authority.
- Gives examples of algorithms doing ridiculous things.
. . . . . . . . . ( end of section Algorithms in the News) <<Contents | End>>
The BBC
[ technology-14306146 ]
also picked up on this TED Talk by
[ kevin_slavin.html ]
in July 2011.
All enterprises have there own way of doing things. These are
often very well defined. In many cases they are complex legal requirements.
Your new software etc. must fit with the
rules and implement the procedures.
One problem that a software developer or development team must solve is
being able to capture rules and procedures. Luckily there are many tools you
can use. The best of these are covered in these notes.
The same tools/techniques can be used to plan what parts of a system will
have to do -- specify the requirements for hardware and software.
Notice that a detailed model can include assumptions about the technology
used to implement the model: this is a
Physical Model.
It can also be independent of the technology and so have a longer
useful life -- these are called
Logical Models.
In these notes mentally classify each technique or tool as best for
either logical or physical models.
Here is a brief article describing some complex manual procedures
that solve an important problem. Notice how the physical systems
evolve but still meet the logical requirements:
- Martin Campbell-Kelly
- Historical Reflections: Victorian Data Processing
- Commun ACM V53n10(Oct 2010)pp19-21
[ 1831407.1831417 ]
- =HISTORY DATA PROCESSING LOGICAL vs PHYSICAL BABBAGE CLEARING CHECKS USA CHEQUES UK
- Shows an efficient solution to a problem in the form of a multi-agent procedure.
- Traces the evolution of Bank Clearing Houses in the UK (as documented by Babbage) and then in the USA.
- Argues that similar algorithms are still used in modern information technology.
There are four skills that you need what ever techniques you use. (1)
Discovering the variables (if any).
(2). Defining the possible scope for the rules and writing rules/procedures
that fit these possibilities. (3) covering every possibility --
this is called completeness. (4) giving only one response for any possibility
-- this is know as consistency. (5) Getting the rules right -- this
is correctness.
You will find that writing correct, complete and consistent rules/procedures
to be be harder than you might think.
What do you make of this description:
[ honors.html ]
, is it consistent and complete?
Here is a complex manual algorithm
[ f1040sa.pdf ]
how do you translate it into a more manageable form ready to program?
How do you know that the program is correct?
A certain department store had a complicated scheme for calculating discounts.
Here are the rules:
- RULES::=following
Net
- (R1): If a person has a club card and today is a Tuesday then the discount is 10%.
- (R2): If a person has a club card and today is not a Tuesday then the discount is 5%.
- (R3): If a person is a senior and has a club card then the discount is 10%.
- (R4): If a person is a senior and has not got a club card then the discount is 5%.
- (R5): If a person is not a senior and has not got a club card then the discount is 0%.
(End of Net
RULES)
I'm glad that I don't have to program these discounts into the Point Of Sale
terminals! The RULES above have a problem -- they are inconsistent: sometimes they
give two discounts to a person (exercise: when?).
It is also difficult to find out if the rules are complete -- is every
possible case covered?
To express complex rules and conditions we need precise notations: decision tables,
and/or tables, logic trees, Karnaugh maps,
activity diagrams/flow charts, mathematical logic, ...
plus other techniques discussed in
[ ../cs556/ ]
Formal Methods. Some of the best known are discussed below.
Formalizing and checking procedures and
rules is time consuming but essential.
Once found, there are many ways of resolving inconsistencies and
completing the rules. However these should not be invented by a programmer
or an analyst -- you need to talk to the people involved in the
system -- the stakeholders -- to resolve inconsistent and incomplete rules
and procedures.
All rules refer to things that vary. Discovering them is important.
Defining their possible values is vitals. When they are Boolean
(either true or false) they are called conditions.
For example, in the RULES above we have
six obvious Boolean variables.
- the person has a club card.
- today is Tuesday.
- the person is a senior.
- the discount is 10%.
- the discount is 5%.
- the discount is 0%.
Another formulation has three Boolean variables and a numerical discount
- the person has a club card.
- today is Tuesday.
- the person is a senior.
- the discount is either 0%, 5% or 10%.
Giving them short names is a helpful shorthand.
It also helps to document that part the variables play in
the rule you are working on: some variables are given and others are goals.
The givens are also known as inputs and the goals as outputs.
For example
- GIVENS::=following
- the person has a club card.
- today is Tuesday.
- the person is a senior.
- GOAL::=following
- the discount is either 0%, 5% or 10%.
The rule will either show you how to work out the goals from the givens or it, at least
tells how the goal variable is constrained by the given variables.
Some notations (below) force you to express every
variable as a Boolean variable (values True or False). The better notations
allow a variable to have other values. Numbers are very useful for many rules.
The can represent money, discounts, heights, counts, widths, velocities, and so
on.
Rules that appear in systems -- also known as constraints can be -- can
be classified as being natural because they are true automatically.
Other rules are required -- they must be enforced by the system we are designing.
These are abbreviated NAT and REQ. The system itself will enforce certain rules.
Call these SYS, and then we must have
- if NAT and SYS then REQ,
for the system to meet the requirements.
Notice that when to systems are connected and work together then what is
REQ in one is NAT in the other! In other words each system is REQuired
to provide the NATural given assumptions for the other. For example
suppose that process INPUT is outputting numbers to a process called
SQRT, then INPUT is REQuired to provedie numbers that are not
negative,
but SQRT can then assume that it is given nonnegative numbers. Something
like this happens on every data flow in a DFD -- the source is required
to meet the natural assumptions of the destination.
This distinction comes from the Software Cost Reduction project of the
late 1980's onward. This
[Heitmeyer02]
is a link to a citation to Connie Heitmeyer history and summary of the
techniques.
Choose the notation and media that fits your needs now and in the future. This
means being prepared with several techniques/tools. Here is quick list
of the techniques covered below.
- Text
- User Stories
- Scenarios
- Diagrams
- Plans
- Activity Diagrams
- Earlier Flowchart-like notations
- Pseudocode and Structured English
- Symbolic Logic and Mathematical Formula
- Tables
- Truth Tables
- Karnaugh Maps
- Function Tables
- Decision Tables
- And/Or Tables
- Extended Entry Tables
- Tree Diagrams
- Prototypes
Text is natural. But it is hard to read and difficult to write clearly.
Natural language is also ambiguous.
There are few tools for analyzing text data. It is almost impossible
to read a complex set of rules or a description of a complex procedure
and spot errors and inefficiencies in it. There are horror stories of
teams faced with a thousand pages of densely written rules and procedures
that have to be embedded in their software.
On the other hand, you should include text in requirements documentation to
explain the thinking that goes into the more formal definitions, tables,
diagrams, and formulas.
Writing a one paragraph description of a feature or property that the
user has asked for is a
a popular and agile requirements technique. A
"user story" is a short, one paragraph description,
of something that the user wants. If it can not fit on a 3><5
card then it is too long. If it has lots of technical details it
is not a "user" story.
User stories are simple enough not to need much
coverage here.
You can get details from the Wikipedia, link below.
A scenario is another simple text based technique:
Scenarios
are a numbered list of actions by various actors. They give a single possible
execution of a process. They should not be used to define algorithms
or logic. They just show one possibility not all the variations. loops, etc.
. . . . . . . . . ( end of section Text) <<Contents | End>>
When you have a set of activities that must be completed but you
have no required order then you can use Network Planning and
the Critical Path Method
[ c3.html ]
to document, plan, manage, and control them.
Network plans are essentially special
[ Activity Diagrams ]
(below).
Sometimes you have a complex decision to capture in your
requirements.
Tree diagrams are a popular way of mapping out a complex sequence
of conditions. You can use an activity diagram or a notation
like this
If you want to learn more you can use the Wikipedia. I don't
require you to use these in quizzes, finals, or exercises. You may find
them helpful in presenting your project. You may choose to use them in exercises
and tests if you think they will help.
Tree Diagrams are essentially specialized
[ Activity Diagrams ]
(next).
Activity diagrams are one of the thirteen UML diagrams.
They are an excellent way to record existing procedures,
new procedures, programs, and processes. They can show both complex
logic and complex interactions between activities in a process.
Here is an example of these symbols in use describing a
procedure for handling Customer Orders.
Here is the same example procedure with
swim lanes
that allocate the activities to
different parts of the system or to different actors.
There are a lot more features available to you in the standard,
for example here is a link to an example of
a data store.
Notice the <<datastore>> in the above activity diagram. This indicates the
systems database. The <<select>> stereotype indicates the kind of access that
should used.
Here is an image of the symbols we used back in the 1970s for flowcharts.
The Dia and Visio graphic programs provide ready-made palettes of these
symbols when you need to produce a final/pretty diagram. Or when
you know the procedure is going to change and so want an editable format.
The simplest set of symbols is the ASME standard set.
It has precisely five symbols:
operation, delay, inspect/test, store/retrieve, and transport.
An efficient process minimizes everything but
operations!
They always have a special START node to get the activity/process
started. They usually have a number of STOP nodes to terminate
the activity.
Do not confuse these. A flowchart shows the sequences of activities.
A DFD shows parts of a system and how they are connected. A flow chart can
have decisions and branches but a DFD does not. A flowchart has a start
and a finish, a DFD is running all the time.
Use flow charts and activity diagrams to document the detailed behavior of
processes inside a DFD -- if the process is complicated and interesting.
Sometimes a business will have rules that determine the order in which events
must occur. This is subtly different to describing the sequences of activities.
Here we want to be sure that someone does not draw a pension before they
retire or after they are dead, for example. The natural way of handling
these rules is to define states (nodes in the diagram) and the events that
change states (arrows with [conditions] and event names on them).
The UML provides a type of diagram -- a State Machine or State chart. It
shares a lot of features with an activity diagram. The key difference is
the appearance of events on the transitions. Here is an example from the
draft UML2.0 standard.
These state machines are studied in great detail in Computer Science theory.
Surprisingly they are (1) useful and (2) supported by some libraries, for example
[ http://www.skorks.com/2011/09/why-developers-never-use-state-machines/ ]
In this class they won't appear in quizzes or the final, however you
may find a use for them in the projects. They briefly appear in
CSCI375 (requirements analysis).
Notice that complex decisions sometimes appear in flow charts and activity
diagrams. In activity diagrams the conditions are written in square brackets
on the transitions: [x>5], [x<5], [x=5]. In flowcharts the condition is
written inside the diamond and the transitions are typically labeled
"yes" and "No" or "True" and "False". Sometimes the conditions can be
difficult to document. Give them a meaningful name ([no discount]) and
use the other techniques
described here (Boolean expressions like [Senior && Tuesday] to document the details.
. . . . . . . . . ( end of section Diagrams) <<Contents | End>>
In the 1960s Computer Scientists proved
that we only need three structures to express any algorithm.
They are:
Sequence, Selection, and Iteration:
Programmers have since learned to use them and to express required
algorithms and program designs using a structured version of English.
Pseudocode (sue-do-... not suede-oh...)
is an imitation structured programming
language. It typically mixes a natural language and structures like
IF-THEN-ELSE-END IF and WHILE-DO-END WHILE.
Here is a sample of traditional Pseudocode:
IF person has club card THEN
IF today = Tuesday THEN
discount := 10%
ELSE
discount := 5%
END IF
ELSE discount := 0%
END IF
The use of "=" is not like it is in C/C++/Java/PHP/... but more like that
writing conditions
the 1960s.
In the above ":=" indicates that a new value is assigned to the variable on the
left hand side. The "=" is a test for equality. Using "==" is an error.
Notice that every "IF" has a single matching "END IF". An IF may or
may not have an "ELSE".
Here is an example with a WHILE loop.
(Algorithm_F):
factorial := 1
WHILE n > 1 DO
factorial := factorial * n
n := n - 1
END WHILE
Every "WHILE" has a matched "END WHILE"
Notice that you can write and process Pseudocode with your favorite programming
editors. What makes it "Pseudo" is that you can not compile or
interpret it. On the other hand it has equally strict rules:
- Use capital letters to show structure.
- Every IF has a THEN.
- Every IF has an END IF.
- Every WHILE has a DO.
- Every WHILE has an END WHILE.
Here is the BNF syntax for
- Pseudocode::syntax=following,
Net
Not following this syntax in this course is an error.
Some rules are best expressed using traditional mathematics.
For example, in a game, a missile will follow a trajectory described by
Net
- v:Real=given, vertical velocity.
- g:Real=given, acceleration due to gravity.
- y0:Real=given, initial height.
- x0:Real=given, initial horizontal position.
- t:Real=given, time.
- x:Real=goal, horizontal position at time t.
- y:Real=goal, height at time t.
- y = y0 + v * t - g t^2 / 2,
- x = x0 + u * t.
(End of Net)
The formulas are precise and short. They may not be stakeholder
friendly. But some stakeholders, with a scientific background will expect you
to use mathematics. Setting out the variables is helpful and takes most of the time.
In this story a project was finihed quickly by expressing the user's problem
in mathematics and looking up the solution in a book.
In "Imperial Chemical Industries" (UK) in the 1960's the Sodium Production group
asked Computer Services to find the optimal rate of flow of mercury through
their
sodium electrolysis bath. A young co-op student (me) was tasked to work with
them, develop a model, solve it, use the mainframe to implement the
solution, and provide results that let them choose the best flow. I talked
with the people and developed a special kind of model: (a Partial
Differential Equation), looked up the formula solving the equation
in a text book,
translated the formula into code, and tested it (worked second
time). This took about less than 3 days (one test per day). But
I had a whole week before I met the clients again so I modified it to give
graphic output.
When I showed it to the
user he: (1) gave me the correct parameters for the model,
(2) rejected the results with four
decimals of accuracy -- he could only measure to 1 place, and (3) said the
graphics were more help.
I removed the numbers.... and the result was a program that solved
their problem.
Summary: by using mathematics and three iterations the whole project was
finished in 2 weeks.
In business most rules have conditions have "If .... then ...." in them
and so you need a way to express conditionals. Symbolic or
mathematical logic can help.
So the example RULES (above) we might be expressed in logic like this.
(the '@' indicates a true/false (Boolean) proposition)
Net
- p::@=given, the person has a club card.
- q::@=given, today is Tuesday.
- r::@=given, the person is a senior.
- s::@=goal, the discount is 10%.
- t::@=goal, the discount is 5%.
- u::@=goal, the discount is 0%.
And then the requirements might be written something like this
(r1): p /\ q => s
(r2): p /\ q' => t
(r3): r /\ p => s
(r4): r /\ p' => t
(r5): r' /\ p' => u
(End of Net)
Notice that this approach does not help the typical stakeholder.
More, it does not help the typical programmer either.
There have been experiments in using logic and discrete math
to describe the behavior of a system in abstract form and
then to analyze the model to see if the ideas work or to
search out unwanted behaviors.
We have tools that test logical statements for completeness and
consistency.
These are a part of
[ ../cs556/ ]
Formal Methods.
They have been used successfully to debug real projects.
The theory of data bases is based on Discrete Mathematics. The boxes
in a normalized ERD are just sets, and the lines are functions: many-to-one
relations or mappings. Each table in a relational data base is just a set of
tuples. The data base operations add and delete rows/tuples or
change items in certain rows. You can use the operations in set theory -- union,
intersection and complement to express some operations. Other operations
need either specialized symbols of "set comprehensions".
For example (Taken form "Software Development with Z" by J B Wordsworth 1992) suppose
we are keeping track of students who enrol in a class, take a test, and
may be awarded a certificate. Here I use my MATHS notation for set operations.
- Class_managers_assistent::=following,
Net
- Students::Sets=given.
- Enrolled::@Student, @ indicates a subset.
- Maximum_enrolled::Nat = given.
- |-|Enrolled| <= Maximum_enrolled.
- Tested::@Student.
- Passed::@Tested.
- Failed::@Tested.
- |-Tested = Passed | Failed.
Set theory `union' is shown above. "Tested students are either Passed or Failed'.
- |-No (Passed & Failed).
Set theory intersection is shown above. "Nobody is both Passed and Failed".
- |-No(Enrolled & Tested).
(End of Net)
- Initially::= Class_managers_assistent with { Enrolled = Tested = {} }.
- For s:Students, Enrol(s)::=Class_managers_assistent with {s not_in Enrolled|Tested. Enrolled'=Enrolled|{s}}.
The prime (') indicates the next value of variable.
- For s:Students, r:{pass, fail}, Test(s,r)::=Class_managers_assistent with following,
Net
- |-s in Enrolled ~ Tested, Set theory complement but not.
- |-Enrolled' = Enrolled ~ {s}, remove s from enrolled students.
- |-If r=pass then Passed'=Passed | {s}.
- |-If r=fail then Failed'=Failed | {s}.
- (above)|-s in Tested'.
(End of Net)
Using discrete mathematics clarifies our thinking.
It gives a very rigorous definition of how the data base
can evolve. It allows use to prove useful properties and helps us
get programs that are bug free. There are even tools that let you test whether a set
of requirements for a system achieve desired results.
However, this is not a popular technique in industry. If you ever need to you
can find out more at
[ ../maths/ ]
or search the web for the language Z ("Zed").
Tables are an excellent way of analyzing and expressing complex logical
rules. I've used them for 30+ years. They have the precision
of symbolic logic but they are also client friendly.
This section of notes
shows how the DiscountsR'Us RULES can be expressed using several tabular
notations. Some of these notations uncover the problems with the rules....
which?
These have one column for each variable. Each row lists the values of the variables: T=true, F=false. The following table shows the possibilities for a discount of 10%.
Table| Club card | today=Tuesday | Discount=10% | Result
|
|---|
| T | T | T | T
|
| T | T | F | F
|
| T | F | T | F
|
| T | F | F | T
|
| F | T | T | F
|
| F | T | F | T
|
| F | F | T | F
|
| F | F | F | T
|
(Close Table)
Each column that you add doubles the number of rows. So if we documented all the given RULES we would need 32 rows.
This is good for symbolic logic but clumsy and verbose used for
software requirements. The table does not help discover
inconsistencies and incompleteness in a set of conditions.
By using two dimensions, we can save space and compress all 8 cases into a "matrix" of possibilities. For each case we put the result(T/F) in
it's cell:
Table| - | - | Club card | no club card
|
|---|
| Tuesday | Discount=10% | T | T
|
| Tuesday | Discount!=10% | F | T
|
| not Tuesday | Discount=10% | F | F
|
| not Tuesday | Discount!=10% | T | T
|
(Close Table)
But this is still clumsy and unhelpful. The full example has 5 variables
and 32 cells. Karnaugh maps with more than
4 T/F variables are complex.
A famous computer scientist and software engineer, David Lorge Parnas, in
the 1990s, made the next step: extend the table to allow non-truth-values
in the cell. These tables are used to describe functions: rules that map
conditions into results. We make one cell for each combination of the
inputs and put the result in the cell. This makes it easy to check the
requirement for completeness (no blank cells), and consistency (one value per cell).
For example
- percent_discount::=following,
Table| Club Card | Age | Tuesday | Not Tuesday
|
|---|
| Club Card | Senior | 10% | 10% ??? 5% ???
|
| Club Card | not Senior | 10% | 5%
|
| no Club Card | not Senior | 0% | 0%
|
| no Club Card | Senior | 5% | 5%
|
(Close Table)
Notice that recording all the rules in this kind of table instantly
identifies the problem with RULES.
Key point -- a function table exposes an inconsistency.
It will also show up incompleteness. It lets us discover and display to the
stakeholders,
the problem with their requirements. It is a good tool for improving
them.
These date back to the 1960's and are a semi-formal notation. This means that they have a well-defined
mathematical base but they were used before the theory was invented.
Each decision table has two parts. The top half lists the conditions and
values like a truth table -- but many use
Y=Yes, and N=No instead of "T" and "F".
Conditions are also be shown as "-" which means
"don't care". The bottom half indicates the
actions. An action to be executed is marked with an "X".
An extra row counts how many cases are covered in each column.
Each don't care doubles the number of cases in that
column. These numbers are used to check
the table for completeness and consistency:
Table
| Condition | 1 | 2 | 3 | 4 | 5
|
|---|
| club card | T | T | T | F | F
|
| Tuesday | T | F | - | - | -
|
| Senior | - | - | T | T | F
|
| Action
|
| discount=10% | X | - | X | - | -
|
| discount=5% | - | X | - | X | -
|
| discount=0% | - | - | - | - | X
|
| Count | 2 | 2 | 2 | 2 | 2
|
(Close Table)
Each column describes a condition and the actions that are required. In general, several actions can be done
(several X's in a column) for a set of conditions. Indeed you use
"A", "B",... to indicate the sequence of actions to be executed.
Notice that each column covers 2 cases and so there are 10 cases covered in the table, even tho' there are only 8
possible combinations of T and F! This simple check should make us search for the inconsistency in the original
requirements. Again a table proves helpful in spotting incomplete or
inconsistent rules.
The main advantage is that users and managers have no problem understanding
them.
The main problem with these tables
was maintaining them in parallel to source code.
To fix this problem people developed decision table compilers.
These generate source code from decision tables (one written in COBOL by a BS
student of mine in the 1970's).
A piece of research (the Traffic Collision Avoidance System -- TCAS II
in 1995... at UC Irvine ) rediscovered the value of tables. They used "And/Or" tables to document
the conditions for states to change. Their clients liked them. The key idea is that
each column is "and-ed" together, the columns are "or-ed".
The researchers developed tools to check And/Or Tables for completeness and
consistency.
A graduate student at CSUSB has developed web-based tool to generate AND/OR tables from formulas.
I have been working on algorithms that can carry out Boolean operations on
And/Or tables. A BA student has developed
an initial C++ API framework for manipulating And/Or tables.
Notice that all types of tables that only use "T" and "F" can be extended to tables that use other values for variables. This reduces the size of the table but makes them harder to check.
. . . . . . . . . ( end of section Tables) <<Contents | End>>
One way of exploring complicated requirements is to program them in a throw-away
(Bread board) prototype. Review
[ a6.html#Prototypes ]
to see what I mean.
Notice that prototypes are OK for checking out how obvious requirements work.
They can not help with checking for obscure conditions that are inconsistent
or cases that are not defined. The tables and other techniques in this set of
notes can do this.
When you have a complex piece of logic to express it may pay to encode it as data.
This typically simplifies the algorithm. If the data is held on a data base then it
can slow down the program. But when the rules change (and most rules in real life change)
all you have to do is change the data. This can even be set up as a process done
by a stakeholder. This simplifies maintenance and reduces the total cost of
ownership.
Here is an example: the rules of Tic-Tac-Toe are simple to state in English. Consider just the rules that determines if x has won: There must be a row with 3 xs, a column
with 3 xs or a diagonal with 3 xs. It is easy to encode the board as an array
of 9 characters. Writing out the rule for winning in a table or in symbolic logic
takes time (exercises below). Diagrams, pseudocode, and coding are worse. One
technique is to list the the 8 winning configurations (3 rows, 3 cols, 2 diagonals)
in a constant array and just run through them in a simple loop. The program is much simpler
because part of the logic is now data.
Here is an example where putting part of the program in a data base makes sense. The fair on
our local bus system depends on the type of pass (one trip, one day, ...)
and the type of person (senior, disabled, student, other). More ... the values change
every year. It would be a mistake to code this in a program:
IF person is senior AND pass is 31-day
THEN cost is 23.50
...
Instead add entities to the data base and look up the value. Add a simple process that
lets the right person in the bus company change the values. You can now use the same data
to prepare flyers, fare sheets, and time tables....
In my experience complex logic is nearly always best analyzed using
one or more tables. This will tell you if any cases are missing, or if the
logic is inconsistent. After you have one this and resolved the problems
it is worth going to pseudocode and prototypes.