And here is the reply
Since some students need to sign or accept (they can do the acceptance through MyCoyote Student Grades) the contract before the grade rosters will be available, this feature will be added to the class roster, too.
Our campus requested the Incomplete form as a result of the CSU Student Records Audit that noted we had not received forms for all of the incomplete grades, they were not completed properly when received, and did not have a student signature.....This is considered a contract between the faculty member and the student, so it does require both signatures.
Moral: find out about what data exists, what is needed, and how it can be computed or input.
By the way, also note the iterative implementation strategy: roll out only some of the functionality at each iteration. Start with a good (but incomplete) system and add to it periodically. We will compare this to some alternatives (Big Bang for example) later.
This is a continuing story: see later in these notes for an unexpected problem with this nice new feature.
But when I could copy the rosters from a terminal screen into the central "SIS+" and paste the data into a spreadsheet the errors almost disappeared. Again note the pattern -- remove paperwork.
The latest system (CMS) gives a teacher the option to download his or her roster directly as a spreadsheet. This is a very useful feature. It is the fastest and most reliable system I have used. However it is less secure because the spreadsheet has to be downloaded in an unencrypted form... and pieces of unencrypted data may be left on my hard disk -- even after I have deleted the downloaded file.
Similarly, Course Management Systems on this campus (Blackboard, Moodle) also extract CMS data to populate the grading subsystem. Again, having easy access to the data makes the system an improvement over previous systems. On the other hand faculty who have uploaded materials (data) to Blackboard will not want to upload them again into Moodle -- so Moodle's success will depend on processes that download and re-upload data.
Computer based systems are almost entirely about handling data. In systems, the data that exists and can be created, processed, collected, and output drives the selection of good designs. We need a way to trace and define the data in our systems. We need a way to picture and visualize the data in new systems.
To analyze and design systems that handle data we need a specialized diagram for showing the flow of data. These are called Data Flow Diagrams. We also need specialized diagrams for showing the structure and meaning of data. These are called Entity-Relationship-Diagrams.
If you want to change a system it is vital to understand the data in it. The technical feasibility of a new system will often depend on what data is already available. Samples of data (printouts, forms, manual files and records) are a good starting point. So are the descriptions of data in the documentation and source code of any software in the system. But you need to make a more abstract or essential model of two things: (1) the dynamic flow of the data through the system, and (2) the static structure of the data in the system. To master the complexity of a real domain you need diagrams that show just the essentials: how the data moves, where it is stored, and how different data is related. These are best done by drawing DFDs (Data Flow Diagrams) and simple ERDs (Entity Relationship Diagrams). The details are often described in a Data Dictionary and we will cover these later.
Information Technology is all about delivering information to people. Information is data provided to the people who need it, in their preferred format, at the right time. Information needs to be computed reliably, cheaply, and securely. Tracing the flow of data from source to sink is a vital technique to achieve this aim.
So the analyst asked -- "What do you do with the 20 page report?". And a manager replied -- "I look for the row with the largest value in column 17." So my friend asked: "would you like the computer to do that for you?" They replied: "Can a computer do that?" He said "Yes -- and printing one line instead of 1,000 will save money!". They liked it.
So my friend asked: what do you do with the row of data? They told him "We multiply the 2nd column by the 4th column and subtract the 5th column". And he said: "The computer can do that too, if you want". They liked it.
So my friend -- now hot on the track of the end of the data flow -- asked "what do you do with the result you calculated" -- they said "if it is greater than 100 we send a memo to the manager listed in column 1." At last, my friend had found an action... "Could we just send the memo for you and let you know it was sent?".
They then went to lunch at a local pub...
Moral -- always ask where the output data goes to. And contrariwise -- ask where input data comes from.
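The chain of questions in the story can be sketched as one small function. This is only an illustration: the names `memo_recipient` and `make_row` and the sample rows are my assumptions. The column numbers and the threshold of 100 come from the story.

```python
from typing import Optional, Sequence

def memo_recipient(rows: Sequence[Sequence], threshold: float = 100) -> Optional[str]:
    """Follow the report's data flow to its final action.

    Columns are numbered from 1 in the story, so column k is index k - 1.
    """
    if not rows:
        return None
    # Step 1: find the row with the largest value in column 17.
    row = max(rows, key=lambda r: r[16])
    # Step 2: multiply the 2nd column by the 4th and subtract the 5th.
    result = row[1] * row[3] - row[4]
    # Step 3: a memo goes to the manager in column 1 only above the threshold.
    return row[0] if result > threshold else None

def make_row(manager, c2, c4, c5, c17):
    """Build a 17-column row; unused columns are filled with 0 (made-up data)."""
    row = [0] * 17
    row[0], row[1], row[3], row[4], row[16] = manager, c2, c4, c5, c17
    return row

report = [
    make_row("Avery", 10, 5, 20, 3),   # 10*5 - 20 = 30
    make_row("Blake", 30, 4, 10, 9),   # 30*4 - 10 = 110, largest column 17
]
print(memo_recipient(report))  # Blake
```

Notice that the 20 page report never needs to be printed: the only output of the flow is the memo (or nothing).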
Once you have a DFD it is useful for pinpointing the changes the enterprise needs to make. You can use DFDs to present the choices to management. They form an excellent start for specifying the hardware and software that will be needed. Meanwhile the static model -- the ERD -- is the starting point for designing a data base and then designing objects inside software.
In summary DFDs and ERDs are a useful intermediate step between problems/opportunities and solutions/plans.
A DFD is a circuit diagram of a system. When done right -- following some very specific rules -- it becomes a rigorous picture of an information processing system. Sometimes we inherit DFDs as documentation of an existing legacy software system. This can be very helpful.
They are good for
Here is an example of a rough pencil and paper DFD:
Each DFD summarizes a collection of simple statements. The above diagram implies some of the following facts:
In a logical DFD there is no mention of how something is done. No technology is mentioned. Several programs may be inside a single process. Avoid drawing DFDs that show the inner workings of a program -- there are better ways to picture the internal architecture of software. One program may even implement several processes. Stores are not described in terms of their media (data base, mag tape, disk, RAM,...) but are named for the entities (outside the system) that they store information about (student, teacher, ...).
As a rule you should aim to move to logical DFDs as soon as possible. You can then solve the logical problems in the system without getting confused in the technology. This process produces a top-level design for a new system and is the start for specifying data and programs.
There are several different notations for DFD icons:
The SSADM DFD notation was developed by the British Civil Service (with LBMS Ltd.) from the Gane and Sarson notation. It is used in England and what used to be the British Commonwealth. As far as I can judge the Gane and Sarson form is the most often used notation in the USA. The Gane and Sarson notation also allows a process box to have three compartments. These are used for: (top) a unique process ID, (middle) a description of the function of the process, and (bottom) the location where the process is carried out or the actor responsible for the process.
I will use Gane and Sarson and encourage you to do so as well in this course. But different enterprises will use different notations.
Below I have some notes [ UML notations for DFDs ] that show how the UML is used and explain why you should, for now, use one of the other notations rather than the UML.
Some processes are subsystems. This helps keep the diagrams of complex systems simple. They are shown as a whole process in some DFDs. Each is also defined by a DFD. This is called the refinement of the process. Such processes can contain hidden data stores and sub-processes. There is a potential tree of refinements.
Ultimately the data flows between processes and data stores are (nowadays) programmed using the Structured Query Language (SQL).
SELECT StudentName FROM Student WHERE Student.id = '123-45-6789'
However it is a mistake to go into this level of detail in a DFD. A single data flow attached to a data store can be implemented by any number of SQL-type statements.
On the other hand you should aim to have each data store labeled with the name of a single type of real world object. The data store holds records about all entities of some type or other. The name of the data store should reflect the type of entity. Ultimately they become tables in a database or file.
Traditionally, creating data in a data store -- adding new records -- is shown by an arrow that flows from a process to a data store. Reading data is indicated by an arrow from the store to the process that needs it. Updates and deletions are shown as two way arrows since data has to be read and then rewritten.
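To make the three kinds of arrow concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The Student table matches the SQL example above, but the in-memory database, the sample name, and the helper functions `read_name` and `rename` are assumptions for illustration only. None of this detail belongs in the DFD itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE Student (id TEXT PRIMARY KEY, StudentName TEXT)")

# Arrow from process to store: create a new record.
conn.execute("INSERT INTO Student (id, StudentName) VALUES (?, ?)",
             ("123-45-6789", "Pat Example"))

# Arrow from store to process: read without changing the store.
def read_name(student_id):
    row = conn.execute("SELECT StudentName FROM Student WHERE id = ?",
                       (student_id,)).fetchone()
    return row[0] if row else None

# Two-way arrow: read the record first, then rewrite it.
def rename(student_id, new_name):
    if read_name(student_id) is None:
        return False
    conn.execute("UPDATE Student SET StudentName = ? WHERE id = ?",
                 (new_name, student_id))
    return True

print(read_name("123-45-6789"))   # Pat Example
rename("123-45-6789", "Pat Q. Example")
print(read_name("123-45-6789"))   # Pat Q. Example
```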
Notice that a data store is needed whenever data is reordered or reorganized. On the other hand if the store is a queue or buffer, so that the first item of data to arrive is the first to be output then we don't show a data store: arrows are understood to be buffered by a queue.
Another simplification: you can put the same data store in several places. Traditionally you mark stores like this with an extra stripe at the left hand end. It also helps if you give each store a unique Id.
Notice that only a process can move data. So each data flow must either come from or go to a process. We do not permit data flows to connect entities or stores unless a process is involved.
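This rule is mechanical enough to check by program. The sketch below is an assumption-laden illustration, not part of any DFD tool: the node names, the `KINDS` dictionary, and the function `illegal_flows` are all made up for the example.

```python
# Kind of each DFD node, keyed by hypothetical node names.
KINDS = {
    "Student": "entity",
    "Record Grade": "process",
    "Grades": "store",
}

def illegal_flows(flows, kinds):
    """Return the flows that have no process at either end."""
    return [(src, dst) for src, dst in flows
            if kinds[src] != "process" and kinds[dst] != "process"]

flows = [
    ("Student", "Record Grade"),   # entity -> process: allowed
    ("Record Grade", "Grades"),    # process -> store: allowed
    ("Student", "Grades"),         # entity -> store: no process involved
]
print(illegal_flows(flows, KINDS))  # [('Student', 'Grades')]
```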
Connections between processes and entities define the interfaces between the system and its environment. It is rarely unambiguous what data is communicated. Thus these data flows must be described -- at least given a name.
Similarly, it is not clear what is going on when you connect one process to another process with an unlabeled arrow. The arrow needs to be named with the data being transmitted. The name will need further definition (later) in a Data Dictionary. Occasionally you will meet a double-headed arrow -- here someone has to define the protocol that describes the conversation between the two connected processes.
Notice that in real systems (unlike computer programs) data flows between processes are buffered. One process writes the data and the data waits in a queue until the other process reads it. The writer doesn't have to wait for the data to be taken away. For example when you send me Email it is automatically stored before I read it. Similarly "Snail Mail" is put in my box. Memos, rosters, etc. are all buffered for me. So when modeling a real system you don't have to say that data in a data flow is in a queue. This buffering is implicit in the Data Flow model.
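If you ever need to simulate this implicit buffering in code, a first-in first-out queue is enough. The class name `DataFlow` and the sample items are assumptions; `collections.deque` is the standard Python queue type.

```python
from collections import deque

class DataFlow:
    """A buffered flow: items wait in arrival order until they are read."""

    def __init__(self):
        self._queue = deque()

    def write(self, item):
        # The writer returns at once; it never waits for the reader.
        self._queue.append(item)

    def read(self):
        # First in, first out. Returns None when nothing is waiting.
        return self._queue.popleft() if self._queue else None

inbox = DataFlow()
inbox.write("memo")
inbox.write("roster")
print(inbox.read())  # memo
print(inbox.read())  # roster
```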
A data flow out of a store can only go to a process. It indicates that the process reads the data in the store but does not change it. External entities and stores are not allowed to read data directly -- they must get the data indirectly via a process. However, you don't have to label and document these data flows if the process can read the whole store. You only have to document the data flow from a data store if the process accesses only a part of the store.
A data flow into a store must again come from a process. It indicates any combination of the three basic operations: Create, Update, or Delete. Again if the arrow is unlabeled then it is assumed that the process can (or will) change any item in the store.
A double-headed arrow between a data store and a process indicates that the process may: create, read, delete and update the data in the store. Some omit the arrow heads in this case.
. . . . . . . . . ( end of section Semantics of DFDs)
Do DFDs quickly -- pencil and paper, chalk-board. Only tidy them up when someone else needs to see them. Use a tool only to impress people. However, even when sketching roughly follow the rules and avoid the errors listed on this page.
Some people put unique short identifiers on each part of a DFD. Avoid this if you can! But in those cases where the boxes are numbered, here are the rules: processes are numbered 1, 1.1, 1.2, ... and data stores have an id that starts with "D" plus a number. External entities can be given single lower case letters to be their unique id. These ids are good for linking the same part in different diagrams. For example, the parts numbered 1.1, 1.2, 1.3, etc. are all parts of the process numbered 1. Similarly, 1.2.1, 1.2.3, etc. are subparts of process 1.2.
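The numbering scheme encodes the refinement tree, so the parent of any process can be read straight off its id. Here is a small sketch; the function names `parent_id` and `is_part_of` are hypothetical.

```python
def parent_id(process_id: str):
    """Return the id of the process that this one refines, or None at the top."""
    head, dot, _ = process_id.rpartition(".")
    return head if dot else None

def is_part_of(part: str, whole: str) -> bool:
    """True when `part` lies somewhere below `whole` in the refinement tree."""
    return part.startswith(whole + ".")

print(parent_id("1.2.1"))         # 1.2
print(parent_id("1"))             # None
print(is_part_of("1.2.3", "1"))   # True
print(is_part_of("11.2", "1"))    # False
```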
Never use more than one piece of paper for a DFD. The trick is to have layers of detail. We do this by expanding, exploding, or refining a process into a lower level diagram. This is done by taking a process and drawing a DFD that would replace it in the original DFD. There are three levels of detail commonly needed: context, level-0, and level-1. Here is a picture of how refinement works:
The table shows the three types of DFD and is followed by definitions and examples.
| Process Context | Shows one process with its inputs and outputs only. |
| System Context | One process + surrounding external entities |
| Level-0 | Make the central process BIG and draw stores, processes, and flows inside |
| Level-1 | Take a process on the level-0 and repeat the expansion in another DFD |
| Level-n+1 | Take a level n process and refine it. |

| Processes | Activity diagrams, Use Cases, and Scenarios. Prototypes. |
| Data flows | Data dictionary entries and coding techniques. |
| Stores | Entity Relationship Diagrams, Tables, and Normalization |
These tend to be a little chaotic and unstructured. You may be forced to do this when interviewing people and starting design. But as soon as possible shift to top-down/refinement.
Notice we can schedule the above DFD in many ways. We can run the analysis process until it produces an idea, then pass it to the design process, which can modify the plan that triggers implementation activity. It all depends on the size of the change to the model and the plan whether we get a traditional or an agile life cycle.
Exception: there may be some law that requires you to keep some data for a number of years. Find out if this is true.
The ideal solution is to have multiple copies of the process running in parallel. Input is distributed to the least loaded or first available server that can run the process. The next solution is to find ways of speeding up the process: better technology, simpler logic, ... Simple examples of this strategy are upgrading the CPU or adding RAM. But a subtler variation is reorganizing the data storage to give faster access to the data. This trick includes defragmenting disk drives. A third solution is to provide multiple parallel clones of the process running on multiple processors.
As an example high traffic web sites may have a dozen web servers and a special load balancing "switching" server front end.
Note -- multiple computers all running the same process are still a single process in the DFD!
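Here is a minimal sketch of the "least loaded server" idea. The class `Pool` and the server names are made up, and a real load balancer measures load in more subtle ways than a simple count of requests.

```python
class Pool:
    """Several copies of one process; each request goes to the least loaded copy."""

    def __init__(self, names):
        self.load = {name: 0 for name in names}

    def dispatch(self):
        # min() returns the first server that has the smallest load.
        server = min(self.load, key=self.load.get)
        self.load[server] += 1
        return server

    def finish(self, server):
        # Call this when a server has completed one request.
        self.load[server] -= 1

pool = Pool(["web1", "web2", "web3"])
assigned = [pool.dispatch() for _ in range(4)]
print(assigned)  # ['web1', 'web2', 'web3', 'web1']
```

In the DFD the whole `Pool` is still drawn as one process.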
Computer Scientists have discovered a large family of problems that cannot be solved by a computer. These cannot be programmed. An example is checking to see if a program will stop or not. We have also discovered problems that apparently demand very inefficient processes to solve them. A classic example is the "Traveling Salesman problem". It is worth studying computer science theory to be able to spot these.
There are also processes that are better done by a human than a machine. Ethical questions should not be handled by machines! Questions needing discretion should involve humans. Sometimes you need to design systems that support communication and cooperation so that complex (political) problems can be resolved by humans.
A version of this is sending data to a human to re-input later. This introduces errors... there must be a better way to handle the problem. Here is the smelly system and a possible improvement.
Examples: EMail -- automatically deleting messages that we don't need to see. Inventory -- automatically reorder when stocks get below a certain level. Record people's browsing, let them replay and/or edit the recordings.
On the other hand there is something wrong about forcing people to work as computers. You need a balance.
For example, it is highly rational to insist on being able to undo things that can be undone. When the CSUSB system automated the handling of Incomplete Contracts (2010) it became incredibly easy to create a contract -- no forms to fill in, no signatures to gather. Unfortunately it became impossible to remove an incomplete contract that was on file -- in the old days you ripped it up and put it in the trash can. At this time (2010) you cannot do that. So small mistakes cannot be corrected. There is no "Undo" feature.
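An "Undo" need not be complicated. The sketch below is purely hypothetical and has nothing to do with how the CSUSB system is actually built: the class `ContractFile` and its methods are invented to show one simple way of keeping enough history to reverse the last change.

```python
class ContractFile:
    """Contracts on file, with a history so the last change can be undone."""

    def __init__(self):
        self.contracts = {}   # student id -> contract text
        self._history = []    # (student id, previous text or None)

    def create(self, student_id, text):
        # Remember what was on file before, so the change can be reversed.
        self._history.append((student_id, self.contracts.get(student_id)))
        self.contracts[student_id] = text

    def undo(self):
        """Reverse the most recent create; return False if nothing to undo."""
        if not self._history:
            return False
        student_id, previous = self._history.pop()
        if previous is None:
            del self.contracts[student_id]         # there was no contract before
        else:
            self.contracts[student_id] = previous  # restore the older contract
        return True

on_file = ContractFile()
on_file.create("s1", "finish the term project")
on_file.undo()
print(on_file.contracts)  # {}
```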
. . . . . . . . . ( end of section Smells and Patterns)
At this time (Fall 2009) it is still better to use a traditional notation like the Gane and Sarson in these notes.
. . . . . . . . . ( end of section DFDs -- Data Flow Diagrams)
Data is always organized in clumps called records. A record has a collection of items of mostly different data types in it. For example the CMS probably has a record that contains all the information about a student in it. Each type of record tends to reflect a real world Entity. Each type of record is given a meaningful name and this is put in the top compartment of a UML class. These entity names should be in your DFD as well.
The Palm Pilots and iPods I've been using for 6 or 7 years have a simpler model with each contact having optional data about companies and titles. But don't get me talking about the different models of events and tasks on iPods and Palm Pilots.
Here is an example based on a project set in a restaurant.
The boxes are logical groups of data each referring to a real entity. The lines connecting the boxes are significant relationships, for example: a Table has a single Waiter assigned to it, but a Waiter can be handling several Tables. Notice that this model does not show any attributes (the properties of the entities). It does not show the waiter's name for example. This kind of reduced model -- based on ideas about the real world -- is sometimes called a Domain Model. They are very useful for planning data bases (later) and for designing object oriented code (CSCI375).
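As a preview of where a domain model leads, here is one way the Table/Waiter relationship might look in object oriented code. This is a sketch only: the attributes `name` and `number` and the helper `assign` are assumptions added so the objects can be told apart, since the model itself shows no attributes.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)  # compare by identity, since the two classes refer to each other
class Waiter:
    name: str                                            # hypothetical attribute
    tables: List["Table"] = field(default_factory=list)  # a Waiter handles several Tables

@dataclass(eq=False)
class Table:
    number: int                                          # hypothetical attribute
    waiter: Optional[Waiter] = None                      # a Table has a single Waiter

def assign(table: Table, waiter: Waiter) -> None:
    """Give the table to a waiter, keeping both ends of the association in step."""
    if table.waiter is not None:
        table.waiter.tables.remove(table)
    table.waiter = waiter
    waiter.tables.append(table)

ann, bo = Waiter("Ann"), Waiter("Bo")
t1, t2 = Table(1), Table(2)
assign(t1, ann)
assign(t2, ann)
assign(t1, bo)   # table 1 changes hands
print([t.number for t in ann.tables], [t.number for t in bo.tables])  # [2] [1]
```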
Each item of data is given a name and a type:
name : type
Examples
address : string
initial : char
age : int
Notice I used C++ data types... because my audience (you) has taken CS202 and can be expected to understand them. In general, you should use the words of your audience. With multiple audiences put different meanings in a data dictionary as aliases.
When you first draw these diagrams you can just list the attribute names and jot down more information in a prototype data dictionary. For example here is a UML diagram of the data I found in a class roster.
If an item is repeated use square brackets:
salary_each_month  : money
children : Person[*]
spouse : Person [0..1]
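When these attribute lists finally turn into code, the multiplicities map onto familiar types. In this sketch I assume the attributes all belong to a single `Person` record, which the notes do not actually say: `[0..1]` becomes an optional value and `[*]` becomes a list.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Person:
    address: str                       # address : string
    initial: str                       # initial : char (a one-character string here)
    age: int                           # age : int
    spouse: Optional["Person"] = None  # spouse : Person[0..1]
    children: List["Person"] = field(default_factory=list)  # children : Person[*]

parent = Person("1 Main St", "J", 40)
parent.children.append(Person("1 Main St", "K", 9))
print(len(parent.children), parent.spouse)  # 1 None
```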
When you meet attributes that are actually other entities/records you should connect the boxes with an association.
If you know of a significant relationship between records/entities then show it as a line (an association) between the boxes. In fact, in some analysis and design methods, you check every pair (and grouping) of entities looking for important relationships between them.
Mark these relations with multiplicities:
Here is an ERD showing the relationships between Questions, Answers, and Comments in the DFD of my Tutoring System (above):
My old student edition of Rational Rose did UML ERDs well. Dia and Visio can also handle them. But the quickest way (after a field trip, say) is on a board or a piece of paper. Keep the edges of the boxes incomplete until done. Notice that you can just note the relationships without any need for attributes. Here is an example that I drew on my Palm Pilot one day.
Sometimes I even omit the boxes:
Look for lags, errors, missing data, and misfitting structures whenever you are analyzing a system.
. . . . . . . . . ( end of section Normalizing a UML Data Base)
. . . . . . . . . ( end of section UML Data Models)
. . . . . . . . . ( end of section Modeling the Data in a System)