There is also a study of some well-known codes. Brian Hayes calculates how many unique codes can exist, and how many are already in use. This was published in "American Scientist" magazine, Vol 93, Jan-Feb 2005. It covers the following coding schemes:
I noted the subtle but effective security. I also noted that some cards were red and some were green. When I asked about this, it turned out that the red cards indicated threats to social workers... So in this case one data item was encoded as the color of the medium.
By the way... I recommended not computerizing this process.
Clever choices of encoding can improve system qualities: security, reliability, time to program, ...
Roman numerals have complex syntax and semantics [ Mini-Project2.html ] (BNF) and are not good for doing arithmetic (can you divide XXX by VII?). Avoid them on input, use them judiciously on output, and do not store numbers in this form!
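To answer the question above, a program has to translate the numerals into ordinary binary integers first and translate back only for output. Here is a minimal Python sketch of that round trip. The function names are illustrative, and `roman_to_int` assumes well-formed numerals (it does not reject forms like IIII or VX).

```python
# Values of the individual Roman numeral symbols.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'XXX' or 'XIV' to an int.

    A symbol smaller than the one after it is subtracted (IV = 4, XC = 90);
    otherwise it is added. Malformed numerals are not detected.
    """
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


def int_to_roman(n: int) -> str:
    """Convert 1 <= n <= 3999 to a Roman numeral by greedy subtraction."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbols in pairs:
        while n >= value:
            out.append(symbols)
            n -= value
    return "".join(out)


# Dividing XXX by VII is only easy after converting to integers.
q, r = divmod(roman_to_int("XXX"), roman_to_int("VII"))
print(int_to_roman(q), "remainder", int_to_roman(r))  # IV remainder II
```

The subtractive pairs (IV, IX, CM, ...) are exactly the "complex syntax and semantics" that make Roman numerals awkward to parse and to compute with.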
To be more formal we could define:
Having only two digits fits well with electrical and electronic circuits, which tend to be either "on" or "off". In about 1945 Shannon defined
This has resulted in many computer people being able to recite the powers of two up to the highest address on their favorite machine.
Nibbles are written and spoken using the hexadecimal digits 0 (=0000), 1 (=0001), 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F (=1111).
So, for example, "2A" in hex means "00101010" in binary.
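Because each hex digit names exactly one nibble, converting between hex and binary is a digit-by-digit lookup. A small Python sketch (the helper names are hypothetical):

```python
def hex_to_bits(hex_string: str) -> str:
    """Expand each hex digit into its 4-bit nibble, e.g. '2A' -> '00101010'."""
    return "".join(format(int(digit, 16), "04b") for digit in hex_string)


def bits_to_hex(bits: str) -> str:
    """Left-pad a bit string to a multiple of 4, then name each nibble in hex."""
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    return "".join(format(int(bits[i:i + 4], 2), "X") for i in range(0, len(bits), 4))


print(hex_to_bits("2A"))        # 00101010
print(bits_to_hex("00101010"))  # 2A
```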
In Binary Coded Decimal (BCD), a number like 987 was encoded by three decimal digits, each represented as a nibble in binary: 1001 1000 0111.
This wastes some bits but is very convenient for important things like dollars and cents.
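Packing and unpacking BCD can be sketched in Python as follows, assuming a non-negative integer and one nibble per decimal digit, most significant digit first. Packed-decimal formats such as IBM's also carry a sign nibble, which this sketch leaves out.

```python
def to_bcd(n: int) -> int:
    """Pack a non-negative integer into BCD: one 4-bit nibble per decimal digit."""
    result = 0
    for digit in str(n):
        result = (result << 4) | int(digit)
    return result


def from_bcd(packed: int) -> int:
    """Unpack a BCD value back into an ordinary binary integer."""
    value, place = 0, 1
    while packed:
        nibble = packed & 0xF  # lowest nibble = lowest decimal digit
        if nibble > 9:
            raise ValueError("not a valid BCD digit: %X" % nibble)
        value += nibble * place
        place *= 10
        packed >>= 4
    return value


packed = to_bcd(987)
print(format(packed, "012b"))  # 100110000111  (1001 1000 0111)
print(hex(packed))             # 0x987: the hex digits read as the decimal ones
print(from_bcd(packed))        # 987
```

The waste mentioned above shows up here: 987 takes 12 bits in BCD, while `(987).bit_length()` is only 10.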
Floating point works well when we need a wide range of values and can put up with larger absolute errors on larger numbers.
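The "larger errors on larger numbers" can be seen directly in Python, whose floats are IEEE 754 double precision on all common platforms. `math.ulp` (Python 3.9 and later) gives the gap between a float and the next representable one:

```python
import math

# The gap between neighbouring floats, and so the worst-case absolute
# rounding error, grows with the size of x; the relative gap stays of
# order 1e-16.
for x in (1.0, 1_000.0, 1e9, 1e16):
    print(f"{x:>8g}  spacing {math.ulp(x):.3g}  relative {math.ulp(x) / x:.2g}")

print(1e16 + 1 == 1e16)  # True: adding 1 is lost at this magnitude
print(0.1 + 0.2 == 0.3)  # False: 0.1 has no exact binary representation
```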
Again, in commerce and finance we need precision and speed rather than range, so a fixed-point notation was preferred. Here you use BCD and the machine scales the number by dividing by a fixed power of ten. This is available and common in COBOL. In SQL we have DECIMAL(p,q): p digits in total, q of them after the decimal point.
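Python's standard `decimal` module provides the same kind of scaled decimal arithmetic. The sketch below imitates a DECIMAL(p,q) column. The function name, the choice of ROUND_HALF_UP, and the overflow rule are simplifying assumptions, not the behaviour of any particular database.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_decimal_column(value: str, p: int, q: int) -> Decimal:
    """Round value to q decimal places and check it fits in p total digits.

    A simplified, hypothetical model of a SQL DECIMAL(p, q) column.
    """
    quantum = Decimal(1).scaleb(-q)  # 10**-q, e.g. Decimal('0.01') for q = 2
    d = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
    if abs(d) >= Decimal(10) ** (p - q):  # too many digits before the point
        raise OverflowError(f"{value} does not fit DECIMAL({p},{q})")
    return d


price = to_decimal_column("1234.565", 7, 2)
print(price)                 # 1234.57
print(int(price.scaleb(2)))  # 123457: the integer held, scaled by 10**-2
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True
```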
But a canny programmer would use these expressions
Again -- money is naturally expressed as fixed-point decimal with two decimal places. So
If your language supports this, use it. If not, store money as long integers holding the number of cents. Then
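Storing money as integer cents can be sketched like this in Python. The helpers are hypothetical and assume inputs such as "12.34" with at most two decimal places and no thousands separators. All arithmetic is then exact integer arithmetic, and dollars appear only when a value is formatted for display.

```python
def parse_cents(amount: str) -> int:
    """Turn a string like '12.34' or '-0.05' into an integer number of cents."""
    negative = amount.startswith("-")
    whole, _, frac = amount.lstrip("+-").partition(".")
    if len(frac) > 2:
        raise ValueError("more than two decimal places: " + amount)
    cents = int(whole or "0") * 100 + int(frac.ljust(2, "0"))
    return -cents if negative else cents


def format_cents(cents: int) -> str:
    """Format an integer number of cents as dollars, e.g. 1234 -> '12.34'."""
    sign = "-" if cents < 0 else ""
    dollars, rest = divmod(abs(cents), 100)
    return f"{sign}{dollars}.{rest:02d}"


total = sum(parse_cents(a) for a in ["0.10", "0.20", "19.99"])
print(total, format_cents(total))  # 2029 20.29
```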
In the 1960s the American standards people (the ASA, now ANSI) proposed what has become the standard coding for characters -- ASCII. It is a 7-bit code, normally stored in an 8-bit byte.
ASCII covers all the characters needed for American English, and it has become the de facto standard on the Internet and whenever data needs to be shared. The International Organization for Standardization (ISO) treats ASCII as a specialized code for use in America (the US variant of ISO 646). In the UK, the American "#" becomes the symbol for the British pound (£). Each European country has its own special symbols.
IBM tried to create its own standard -- the Extended Binary Coded Decimal Interchange Code (EBCDIC). This will disappear with the last mainframe.
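Recently, a new standard -- Unicode -- has been created that covers just about every character in every alphabet in the world. It began as a 16-bit code, but its code points now run up to U+10FFFF (21 bits) and are stored using encodings such as UTF-8 and UTF-16. ASCII (the first 128 code points) and the ISO Latin-1 code (the first 256) appear within it.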
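The relationships between these codes can be checked in Python, whose standard codecs include ASCII, several EBCDIC code pages (for example `cp037`, US/Canada), Latin-1 and UTF-8. A short sketch:

```python
# ASCII: every character fits in 7 bits (code points 0-127).
for ch in "A#a":
    print(ch, ord(ch), format(ord(ch), "07b"))

# EBCDIC gives the same letters different numbers.
print("A".encode("ascii"), "A".encode("cp037"))  # b'A' b'\xc1'

# Latin-1 bytes are the first 256 Unicode code points.
print(ord("£"), "£".encode("latin-1"))  # 163 b'\xa3'

# Unicode goes beyond 16 bits; UTF-8 keeps ASCII as single bytes and
# uses 2 to 4 bytes for everything else.
for ch in "A\u03b1\u20ac\U0001D11E":  # A, alpha, euro sign, musical G clef
    print(f"U+{ord(ch):04X}", ch.encode("utf-8"))
```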
The Web uses HTML, and HTML has introduced a number of special "entities" for showing non-ASCII characters like Σ and α. These are given numbers and are encoded in HTML like this: &#931; for Σ and &#945; for α (named forms such as &Sigma; and &alpha; also work).
For example, the symbols "<" and ">" are encoded as "&lt;" and "&gt;". The double quote symbol is encoded as "&quot;".
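Python's standard `html` module performs this escaping. The sketch below (the helper name `to_numeric_refs` is hypothetical) escapes the special characters, including "&" itself, and then writes every non-ASCII character as a numeric reference:

```python
import html


def to_numeric_refs(text: str) -> str:
    """Escape HTML's special characters, then write any non-ASCII
    character as a numeric character reference such as &#931;."""
    escaped = html.escape(text)  # & < > " and ' become entities
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in escaped)


print(to_numeric_refs('a < b && "Σα"'))
# a &lt; b &amp;&amp; &quot;&#931;&#945;&quot;
print(html.unescape("&Sigma;&alpha; &#931;"))  # Σα Σ
```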
This link [ mathchart.html ] shows how to encode Unicode mathematical symbols in HTML and [ arrows.html ] how to do arrows. I have a partial encoding for Greek letters and other ΤΕΧ symbols in [ ../samples/tex2html.html ] (ΤΕΧ is a mathematical typesetting system developed by Donald Knuth).
There is a link to more on HTML below.
Block sequences: blocks of numbers are given to different parts of the organization to allocate (in sequence) to records of data in a file or input stream.
Alphabetic Abbreviations and Mnemonics: a string of letters is chosen to identify an entity or a group of entities of a given type, so that a few letters stand in for a word or phrase. Examples: states (CA, AL, ...), IANA and DNS country codes (uk, tv, ru, ...), IATA airport codes (LAX, ONT, LHR, MSP, ...), abbreviations for the department teaching a course (CSE, PHYS, MATH, ENG), abbreviations for buildings on campus.
Arbitrary Systematic Codings: examples include the Library of Congress subject classification and the Dewey Decimal system for books, the Association for Computing Machinery (ACM) Computing Classification System (CCS) for computer science, and Linnaeus's technique for naming species.
Digit Groups: different groups of digits or characters in the data are themselves coded data. For example, in a 9-digit ZIP code the first digit identifies a group of states, the first three digits together a sectional center (a regional mail facility), and the 4th and 5th digits a post office or delivery area; the last 4 digits identify a delivery segment such as a block or a building. Examples: ZIP codes (92407-1133), phone numbers ((909)-537-5257), ..., SSNs, URLs.
Derived Codes: different coded data are mixed into one element. Examples: my UK driving license number, CSUSB Library call numbers, subscriber codes for magazines, rooms on campus.
Ciphers and Encrypted data: numbers are disguised for security or mnemonic purposes. Example: "Spoof" at Imperial Chemical Industries was a number added to the paint sales figures. A lot of good work has been done since then -- look up DES, PGP, etc. on Wikipedia if you need more detail. Passwords should never be stored as entered: salt and hash them as soon as they arrive!
Actions: codes that represent actions. Examples: A=Add, D=Delete, ..., the 50+ actions built into the 'vi' editor, mnemonic codes in assembler. Transaction codes are another example: a banking application might code deposits as D and withdrawals as W.
Self-checking Elements: an added digit or character is calculated from the rest. Example: 9s-remainder and 11s-remainder check digits added to a decimal number. For a detailed analysis see [ http://www.skorks.com/2011/08/even-boring-form-data-can-be-interesting-for-a-developer/ ] (SKORKS, some Ruby included).
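The two kinds of check digit in the last entry can be compared with a short Python sketch. The 9s-remainder version here is simply the number mod 9. For the 11s remainder the sketch uses the weighted scheme from ISBN-10 as one well-known instance; the notes do not say which weighting they intend.

```python
def mod9_check(digits: str) -> int:
    """9s-remainder check digit: the number mod 9 (= its digit sum mod 9)."""
    return int(digits) % 9


def isbn10_check(digits: str) -> str:
    """11s-remainder check character in the ISBN-10 style.

    The nine data digits get weights 10, 9, ..., 2; the check value makes
    the weighted total divisible by 11, with 10 written as 'X'.
    """
    total = sum(w * int(d) for w, d in zip(range(10, 1, -1), digits))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


print(mod9_check("987"), mod9_check("978"))  # 6 6: the swap goes unnoticed
print(isbn10_check("030640615"))  # 2  (ISBN 0-306-40615-2)
print(isbn10_check("300640615"))  # X: swapping two digits changes the check
```

The weights are the point: a plain remainder cannot tell 987 from 978, while the weighted mod-11 scheme catches a swap of two different adjacent digits.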
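The password advice in the Ciphers entry can be followed with Python's standard library alone. This sketch uses PBKDF2-HMAC-SHA256 with a fresh random salt per password. The iteration count is an assumption to tune for your hardware, and a dedicated password-hashing scheme (bcrypt, scrypt or Argon2) is an equally reasonable choice.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # work factor: an assumption, tune for your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, hash) for storage; the plain password is never stored."""
    salt = os.urandom(16)  # a new random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    attempt = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(attempt, stored)


salt, stored = hash_password("correct horse")
print(check_password("correct horse", salt, stored))  # True
print(check_password("wrong guess", salt, stored))    # False
```

Because each password gets its own salt, two users with the same password end up with different stored hashes.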
There are five ways of encoding compound data:
<name><first>Richard</first><initial>J</initial><family>Botting</family></name> is a piece of text with added "tags" that indicate the meaning of the parts. In a [ Record Structure ] (above) the "tags" are not needed because their sequence is known and the lengths are fixed (or at least predictable). Tagging thus gives an encoding that is guaranteed not to be ambiguous and is easy to read (kind of), but is somewhat inefficient.
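The difference between a record structure and markup shows up when the same name is encoded both ways. In this Python sketch the field widths (10, 1 and 12 characters) are hypothetical; the record needs no tags because both sides agree on the order and widths in advance.

```python
# Field order and widths agreed in advance (hypothetical values).
FIELDS = [("first", 10), ("initial", 1), ("family", 12)]


def encode_record(values: dict) -> str:
    """Pad or cut each field to its width and concatenate: no tags stored."""
    return "".join(values[name].ljust(width)[:width] for name, width in FIELDS)


def decode_record(record: str) -> dict:
    """Recover the fields purely from their known positions."""
    out, pos = {}, 0
    for name, width in FIELDS:
        out[name] = record[pos:pos + width].rstrip()
        pos += width
    return out


name = {"first": "Richard", "initial": "J", "family": "Botting"}
record = encode_record(name)
tagged = ("<name><first>Richard</first><initial>J</initial>"
          "<family>Botting</family></name>")
print(repr(record), len(record))      # 'Richard   JBotting     ' 23
print(len(tagged))                    # 79
print(decode_record(record) == name)  # True
```

The record is less than a third the length of the tagged text, but a family name longer than 12 characters would be silently truncated, which the tagged form never does.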
XML uses <start tags> and </end tags> to delimit data. Tags can also have attributes:
<certificate type="participation">Unix Training</certificate>.
XML also allows some tags to be unpaired and these are shown like this:
<endless tag attributes... />
XML documents can be parsed fairly easily.
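As an illustration of how easily XML can be parsed, Python's standard `xml.etree.ElementTree` reads nested tags, attributes and unpaired (empty-element) tags directly. The `<trainee>` and `<photo>` elements are made up for this sketch.

```python
import xml.etree.ElementTree as ET

doc = """<trainee>
  <name><first>Richard</first><initial>J</initial><family>Botting</family></name>
  <certificate type="participation">Unix Training</certificate>
  <photo file="rjb.jpg"/>
</trainee>"""

root = ET.fromstring(doc)
print(root.find("name/family").text)     # Botting
cert = root.find("certificate")
print(cert.get("type"), "-", cert.text)  # participation - Unix Training
print(root.find("photo").attrib)         # {'file': 'rjb.jpg'}
```

ElementTree only checks that a document is well formed (a mismatched tag raises `ET.ParseError`); it does not validate against a DTD, which needs a separate tool.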
Each application that uses XML should have a published DTD (Document Type Definition) that defines the structure of the data: which tags can appear inside others. Defining a DTD takes a significant amount of work, but once it is defined you can use tools to check validity, ...
In computer science most of our knowledge about linguistic design has gone into designing programming languages. Programming languages are the most complicated schemes in existence for encoding a domain. There are hundreds of them. For more, take a CSE programming languages class like our CSE320 [ ../cs320/ ] (Advert).