.Open Introduction to Types . Personal Note When I started this page I did not realize how much information it would have to carry to make the MATHS language useful. . Introduction In any project there will be a number of more less defined types. Obviously in computer projects we have have "data types", "Entitiy types", "ADTs", classes of objects, or other technologically bound models of reality. The MATHS type system has to handle these. But we will also (in real projects) have to think about the real world -- cats, dogs, candlesticks, and sealing wa, cabbages and kings. These are the objects that the software describes. And in theoretical work we will often be working in terms a set of elements with given operations and properties -- algebras and abstract data types. Some of these are standard well known types and are described below. Then there are the types that are peculiar to the domain being studied. This set of types is called the basis of the domain. Finally there are the types that can be constructed from the basic and standard types to give structured types (sets, tuples, relations, ...). This collection of basic, standard and constructed types is called the universe of discourse of the project. The key discovery by Whitehead and Russell in their work on "The Principles of Mathematics" was that the idea of type could be used to avoid paradoxes as long as it was a syntactic concept. This means that each type defines those expressions that are valid or meaningful as well as what meaning can be attached. The Type system outlaws from the logic any discussions of expressions that involve contradictions in terms and paradoxes like the Russell Set. Thus any theory of types is a metalinguistic activity and taks place in yet another universe of discourse. Simply put: Each type is linked to a set of expressions and a set of meanings and connects them together. And each type does this in its own way. Types are the main way that MATHS is extended. All types must define a non-empty set of objects -- a valid universe of discourse -- so that we can use the predicate calculus to reason about objects of that type. . Standard Types The following are common descriptions of types (Alphabets): Generic finite set. Probably characters or symbols. Operations=>{(!), (1st), (rest),...}, concatenation, disassembly of strings. (@): The propositions, Values=>{true,false}, Operations=>{and,or,not,...} (Char): A Finite set of 256 characters used in MATHS. The ASCII characters, See .See http://www/dick/samples/comp.text.ASCII.html#Main%20Content Values: "a", "b", char(26),... (Unicode): Used to describe languages like Java, JSON, Javascript, etc that use Unicode .See ../samples/glossary.html#UNICODE Values: To Be Done .Hole Unicode (Documents): Pieces of Documentation, Operations=>{$,...and,or,...}, see below. (Events): A finite set of elements written in a different language. (Numbers): Type that includes real,integer, etc types, Operations=>{+,-,*,...} (Dimensions): Length, Mass, Time, Information, Area, Angle,... Dimensions, each is a separtate type. Operations=>{+,-}, For all D:Dimension, some units:@(Number->D). (eg 60.sec=1.hr,8.bit=1.byte) (Sets): Generic, For S:Sets is short for For T:Types, S:@T, Operations=>{&, |, ~, ...} (Times): A special ordered set with elements written in a different language (Types): Generic . Basic Types In real systems we work with universes of discouses like: Employees, Products, Animals, Telephones,... Michael Jackson notes that we need to be able to designate individuals in these domains. Terry Halpin has noted that the individuals/elements/objects/"values" can be described by a definite description like this .As_is the Employee with SSN 123-45-6789 .As_is the[e:Employee](e.SSN="123-45-6789") It seems wise to introduce -- as a partly-baked idea -- a standard way to designate objects of a type: .As_is Employee("123-45-6789") Here is a possible formalization: BASIC_TYPE::=following, .Net T::Types. identifier::Set=given. designation:: identifier >-> T. |- designation in identifier --- T, a one to one mapping. For i:indentifier, T(i) ::T= designation(i). .Close.Net So we might define, as an example, example::=Net{ SSN::Sets. ... Employee::Types. |-$BASIC_TYPE(Employee, identifier=>SSN). }=::example. By further defining functions, predictaes, sets, relations, etc. we can easily model the types of objects in the real world. This conceptual model is an excellent guide to designing data bases, a natural ontology for the domain, and a source of ideas for classes in object-oriented programming. . Type descriptions A Type is defined to have two key attributes attributes: |-TYPE::=Net{ description::#lexeme=symbolic representation of the type. objects::Sets=set of values or things of this type. ... }=::TYPE. Type_descriptions::=Net{ Notice that a type T is not a member of a type. . Syntax type_description::= $type_power |$type_product, type_power::=$type_product "^"$type_product, Note -- this outlaws type descriptions like T1^T2^T3 to avoid an ambigutiy, since we can show that (T1^T2)^T3 is different to T1^(T2^T3). An example {0,1}^{0,1} has the set of maps of bits into bits including elements like the identity and negation operation (not). type_product::= $type_factor |$type_factor "><" $type_product, Note -- this defines the associativity of products of types: T1>< Number for example all have a sigle character and a single number in them. type_factor::= "@"$type_element |$type_element, The type @T has the set of all subsets of T as elements. type_element::=name_of_a_type | "("$type_description ")" | $type_discriminated_union |$enumerated_type. enumerated_type::= "{" list_of_values "}". If D is a list of declarations then \/{D} is a discriminated union or co-product. There is also a longer form that uses the "Net" syntax. type_discriminated_union::=$co_product |$long_form_of_discriminated_union. co_product::="\/" "{" $list_of_declarations "}". list_of_declarations::=$declaration #( "," $declaration). declaration::= identifier ":"$type_description. A discriminated union like \/{ character: Byte, number:Nat } has elements like character("a"), number(1), character("1"), etc. Discriminated unions are useful for decribing data bases and partial tples. A possible semantics is found in the theory of co-products in Category Theory($CATEGORY). CATEGORY::=http://www/dick/maths/math_25_Categories.html. Briefly a product defines a composite structure with maps from the composite to the components. A co-product has the reverse maps -- from each component into a unique element of the co-product. . Partly Baked Ideas Two ideas .Net The long form leads to better rendering. The "\/" symbol is followed by a net of declarations. long_form_of_discriminated_union::= "\/" net$TBA. The meaning, is to extract the declarations, ignore the constraints, and create a list D of declations for the short form. .Close.Net .Net This is needed to define rational numbers (for example) as a type. type_quotient::= $type_element "/" (equivalence_relation \ function). A type_quotient T/E where E:equivalence(T) is defined as any chosen isomorphism of the set of blocks or equivalent elements of T. Simarlarly T/f where f is a map from T into T2 (f:T2^T), then the lelements of T/f are the inverse images in T of elements in T2 -- for each t2:T2, { t:T. f(t)=t2}. See .See Quotient types below. .Close.Net . Semantics As a rule the description is a string that describes the objects in the following way: |- (desc-objects): For all T1,T2:Types, if T1.description= T2.description then T1.objects=T2.objects. |- (hat_means_power): For all T1, T2, T3:Types, if T1.description=T2.description "^" T3.description then T1.objects=(T2.objects)^(T3.objects). |- (at_means_subsets): For all T1, T2:Types, if T1.description="@" T2.description then T1.objects=@(T2.objects). |- (cross_means_product):For all T1, T2, T3:Types, if T1.description= T2.description "><" T3.description and T1.description in type_factor then T1.objects=(T2.objects) >< (T3.objects). |- (parentheses_mean_parentheses): For all T1,T2:Types, if T1.description="(" T2.description ")" then T1.objects=T2.objects. Need to give formal semantics for coproducts, and quotients.... and tidy up. .Hole types }=::Type_descriptions. .Open Standard subsets/classes . Finite_Sets This description Finite_sets=sets which are not isomorphic to a proper subset of themselves, is not type-safe. The only type safe expression is to talk about the finite sets of elements of a given type, like the following: For Type T, Finite_sets(T)::@T=subsets that are finite. For T, S:@T, Finite_sets(T)::@T={F:@T. F in Finite_sets(T)}. For T, S:@T, finite(S)::@=S in Finite_sets(T)}. The idiom Let A by a finite set, ... is common in computer science and should be interpreted as short hand for T: Type, A: @T, |- finite(A). Where T is some unused variable. These can be defined by enumation: Syntax: {a,b,c,...}, or by construction since the union of two finite sets is finite and the intersection of two finite sets is finite. Some theories of finiteness can be found at .See ./logic_6_Numbers..Strings.html discussing numbers. . Integers Int, Z Integers,Whole Numbers, Values=>{...-3,-2,-1,0,1,2,3...}, . Numbers Rational, Q Quotients/Ratios Fractions$(numerator:Nat0,denominator:Nat,...} Real, R The Real Numbers, n..m::={i:Z || n<=i<=m}. Bit::=0..1. n..::={i:Z || n<=i}. ..m::={i:Z || i<=m}. Natural::={n:Z || n>=0}. Nat0:=Natural. Positive::={n:Nat0 || n>0}. Nat::=Positive. ()|- Nat0 = Nat | {0} = 0.. . Fixed(n) ::==-2^(n-1)..2^(n-1)-1. Float(p,s) A finite subset of the Rational numbers (denominators=2^-i where i:Fixed(s),...) Decimal(p,n):@Rational::=(-10^n+1..10^n-1)/(10^p), Money::=Int/100. Hence |- n..m in @Z. |- T^n ::= T^(1..n). |- for n:Nat0, T^n ==> #T. So if x is in T^n then x has type #T. .Open Standard subsets/classes . Expressions and Overloading As a generallity the type of an expression can always be determined (sometimes relative to certain assumed types), from the type of the parts and the operators that make it up. Even so a modicum of overloading is allowed - an symbols can be defined to carry out different operations, (and return different types, depending on the context. Such a definition is written: term::context=expression For example, here are three different views of an assignment: assignment::syntactic= variable ":=" expression, assignment::parsed= $Net{v:variable, e:expression,...}, assignment::semantic= map [v:variable, e:expression]( s'.v=e(s) and for all u:variable~{v}(s'.u=s.u)). Standard examples of overloading: (casts): For T:Types, a:T, a::@T={a}, a::#T=(a), a::%T=(1+>a). (set extensions): For T1,T2:Types, f:T1->T2, f::@T1->@T2={f(x) || x:T1(x in (_) }, For T:Types, op:infix(T), op::infix(@T)={x+y||x in (1st), y in (2nd)}. As a rule: An overloading can not take an existing valid meaning in the given context. So (|) is given a meaning for all sets and so can not be extended to operate on sets as well. Polymorphic and generic terms have definitions written like this: For T:Types, term::t(T)=expression(t(T)) where t(T) is some formula that contains T as a free variable. If the symbol T appears in the expression then it is interpreted as the universal set of a generic type T. In general T can be any list of symbolic variables. Completely generic terms ( "macros") can be defined but are rare. If v1,v2,v3,... are previously undeclared variables then .As_is For v1,v2,v3,..., M(v1,v2,v3,...) ::= e, creates a function that applies to all types of operands and is short for .As_is For T1, T2, T3,...:Types, v1:T1,v2:T2,v3:T3,..., M(v1,v2,v3,...) ::= e. .Close Introduction to Types .Open Theory of Types Theory_of_types::=Net{ SEMANTICS ::= http://www/dick/maths/logic_7_Semantics.html Types::@TYPE. We have a set of types, each one structured as a TYPE. TYPE::=Net{ description::#lexeme=symbolic representation of the type. objects::Sets=set of values or things of this type. Note. If T is this type then x in T.objects iff x has type T. Symbols for types have two meanings, one of these is in a normal part of an expression where it represents the set of objects or an identity map from objects to objects depending on the context. (dual_meaning): 2 symbols o /representation(description). (description_is_symbol)): symbol(description, @objects, objects). The other meaning is when the description identifies the type in a declaration. elements::@meaning_in(objects), elementary_expression::=elements.syntax. |-(elem): elements in element.syntax->objects. expressions::@#lexeme. The expressions of this type are defined in terms of objects in other types and so is not a property of this type alone. [See EXPRESSION below] (elem_in_expr): elements.syntax==>expressions. equality:: equivalence_relation(objects). [STANDARD.relations] infix_pairs::@meaning_in(infix(objects)). [STANDARD.infix] infixes::=Lexemes referring to binary operations. infixes::=infix_pairs.syntax EXPRESSION uses infixes(T) to define expressions(T). (precedence): STRICT_ORDER(infixes, has_precedence_over)Chapter 4, section 8] Used in EXPRESSION. propositional_functions::@references(@^objects). Used in Proposition Calculus. parallels::@meaning_in(@(objects, objects)). parallel::=parallels.syntax. (equals_is_parallel): ("=",equality) in parallels. serial::@infixes={s.syn || for some s:infix_pairs( associative(s.sem)) }~parallel. other_operators::=@symbols~infix~parallel,operators giving results of this type. (operators_mean_maps): For all o:other_operators, some domain:Types, o.type=domain->objects. operators::={(representation=>i.syntax, type=>infix(objects), meaning=>i.semantics) || for some i:infix_pairs} | other_operators. See Also Expressions. .See Expressions }=::TYPE. (syntax_symbolizes_semantics): For all T:Types, e:T.elements, symbol(e.syntax, T, e.semantics). (descriptions_symbolise_types): For all T:Types, symbol(T.description, Types, T). |-(type_symbol_means_universal): For all T:Types, symbol(T.description, @T.objects, T.objects). (different_types_do_not_overlap): For all T1, T2:Types, T1=T2 iff some (T1.objects&T2.objects). (parallels): For T:Types, P:T.parallels, PARALLEL(expressions(@), T.expressions, P.syntax, P.semantics). (serials): For T:Types, s:T.serial, SERIAL(T.expressions, T.expressions, s, T.infix_pairs(s)). ??{take precedence into account}? }=::Theory_of_types. . Examples of theoretical types (Example_PC)::Types=following, .( description=>"@", elements=>{("true", true), ("false", false)}, equality=>( (true, true) +>true | else+>false), infix_pairs=>{("and", and), ("or", or), ...}, serial=>{"and", "or"}, parallel_pairs=>{("iff", iff), ("=", equality), ...}, other_operators=> .( { ("=", T>@, T.equality) || for some T:Types} |{("in", T><@T->@, in) || for some T:Types} |..., .) ... .) or even For T:Types, Example_Sets(T) ::Types=following .( description=>"@"! T.description, elements=>{("{}",{}), (T.description, T.objects)} |{ "{"!e!"}" || e:T.elementary_expressions}, infix_pairs=>{("&", &), ("|", |),...}, serial=>{"&", "|"}, parallel_pairs=>{("=", =), ("==>", ==>), ...}, ... .). .Close Theory of Types .Open Meta-theory of types . Babel-Aleph-Null The definition of the type '@' in terms of type theory is an interesting test case of the system, and is rather like attempting to build the foundation of a building on its roof. The inabillity to this suggests limits to the system. The successful construction of a model of logic, on a theory of types, expressed in terms of that logic, establishes the power but not the consistency of the system. @::Types=following, .List symbol=>"@", elementary_expression=>"true"+>true|"false"+>false, infix_pairs=>"and"+>((true,true)+>true |+> false) |"or"+>((false,false)+>false |+> true), parallels=>"="+>({(true,true),(false,false)}+>true |+> false) |"iff"+>({(true,true),(false,false)}+>true |+> false), serial=>{"and","or"}, other_operators=> | following, .Set { ("not",@->@,true+>false|false+>true)} {("if_then_",distfix(@,@,@),(true,false)+>false |+> true)} {("in", @T, in) ||T:Types} {("==>", @T><@T->@,==>)||for T:Types} {("=", @T><@T->@,=)||for T:Types~"@"} {("all", @^T->@, map F:@^T(img(F)=true)) ||for T:Types } .Close.Set .Close.List @.operators ::=following, .Table representation Type meaning .Row not @->@ true+>false|false+>true, .Row and @><@->@ (true,true)+>true |+> false, .Row or @><@->@ (false,false)+>false |+> true, .Row iff @><@->@ {(true,true),(false,false)}+>true |+> false, .Row if_then_ @><@->@ (true,false)+>false |+> true, .Row in T><@T->@ map x,X(x.X), .Row = T>@ map x,y((x,y).(equality(T))), .Row all @^T->@ map F:@^T(img(F)=true), .Row ==> (@^T)><(@^T)->@ map [A,B] (all(map x(if x.A then x.B))). .Row ... .Close.Table quantified_wff::={( ("for" Q B "(" P ")"), (Q "(" "map" "[" B "]" "(" P ")" ")" )) || Q:{all,no,...}, B: (variable (colon | equals) set_expression), P:@.representation} . Consistency and Non-empty types Types must always define a non-empty set of objects. And a simple way to prove this is give a finite example. Consistency is demonstrated by exhibitting a finite model. Another techniqu is to use the simplest (non-recursive) constructions. . Avoiding paradoxes Notice that sets are defined by a formula {x:T||B}, which explicitly assigns a type to the variable. But the collection of all types is not a type so we can not define a set of all sets and so should avoid the paradoxes discovered at the beginning of the 20th century. Notations like X:Type or Type X can only be used to set of a generic formula. We can prove that the set of data types leads to a consistent theory by constructing a model in terms of naive set theory and n-tples. The data types are those generated from @ and Nat by operations of fnite Products and coproducts. The proof of relative consistency is done by using Codd normalizations. Model each type as a set of n-tpls of elementary object. Replace T2^T1 by (T1>Nat0= map[X](|X|). So, this is true Card{} = 0, Card{"x"} = 1, Card{"x","y"} = 2 The following expression /Card(1) has an ambiguous type -- the set of singletons of unknown type. Instead make the type explicit Strings@1 = sets of strings with one element. .Close Meta-theory of types .Open Deriving New types Wechler presents an extensive theory of univeral algebras [Wechler 93, Section 3.1]. . Products .See Notation$Net{a:T1, b:T2 ...} . Quotient types If there is a map from one type to another then there is an easy way to generate a new type - by grouping the elements of the second type according to their values under the mapping - this is called the quotient type with respect to the mapping. Given f:T1^T2 we can always derive a new type (symbolized T2/f) isomorphic to a sub-set of @T2 where X:T2/f iff for some y:T1, all x:X(f(x)=y). This called the quotient type of T2 with respect to f. Given a R:@(T1>Z, 2nd=>Z)/equiv where ( (a,b) equiv (c,d) iff d a = b c)). .See Notation. .Close Deriving New types .Open Typed Expressions . Expressions To complete this section here are the definitions of the syntax of expressions. A string of lexemes that is not terminated by a comma or period occurring outside parenthesis of some kind or other, can be expression. But an expression can not contain a comma or period outside of parenthesis. You can therefore use punctuation to recognised balanced expressions. A second cut needs a set of operators - those symbols that only make sense when accompanying other symbols - and defines whether an expression is functional. The final check makes sure that the types of actual parameters match the operators on them. These will be presented in reverse order. EXPRESSIONS::=Net{ Uses SEMANTICS. Uses Type_Theory. Uses ASCII_code. expression::= balanced & functional & ( |[T:Types] typed_expression(T) ). We start be defining the set of fully parenthesized expressions(p_expressions) of a given type and then state circumstances under which parentheses can be removed without change of meaning. p_expression::abbreviation=fully_parenthesized_expression. fully_parenthesized_expression::Types->@#lexeme. For T1: Types, p_expression(T1) ::= following, .( elementary_expression(T1) | variable(T1) | UNARY(representation(T1), other_language_string) | |{UNARY(p_expression(T2), p_expression(T1^T2)) || T2:Types} | |{BINARY(p_expression(T2), p_expression(T3), p_expression(T1^(T2><(t))} | |{UNARY(p_expression(@T2), p_expression(@(T2>typed_expression(T). The semantics of MATHS expressions are trivial when formalized in MATHS. The correct meaning of a MATHS expression looks (in MATHS) like itself . Spivey has remarked on a similar phenomena in the attempt to define Z in terms of set theory and set theory using Z. The denotation (map from syntax into semantics) is the identity mapping. Such a circular definition can be shown to be useless for ignorant readers because ignorance is a trivial fixed point[Spivey 88]. Luckily, complete definition is not essential for a human being to learn a language. Future research can establish the semantics in terms of other languages - such as Z, Larch, .... The syntax and a prossibly feasible way to parse expressions is a notational problem. .See Notation.12 Expressions }=::EXPRESSIONS .Close Typed Expressions .Open Types as a Data Model .Open Formal Model . Entities and Maps DATABASE::=Net{ Collection of named types, objects, and maps. Uses lexemes, type_theory, EXPRESSION. T::Finite(Types)=Entity types. |- For all t:Types, t.\mu=t.denotation in expressions(t)->objects(t). D::=Entity descriptions. Object classes::=Finite(|[t:T]expressions(@t)). For A:D, E(A) ::=Named objects in A:Finite(expressions(A)). For A,B:D, F(A,B) ::=Named maps from A to B:Finite(expressions(B^A)). For no A, Id(A) in F(A,A). . Paths Through the DataBase. For A,B:D, S(A,B) ::@(A,B)=\mu(F(A,B)) | {R:@(A,B)|| /R in \mu(F(B,A))}. For A,B, P(A,B) ::@(A,B)=paths from A to B, For A,B, P(A,B) ::@(A,B)=S(A,B) | { s;p ||for some C:D, s:S(A,T3), p:P(C,B)}, L(A,B) ::@(expressions(B^A))=/\mu( { s1;s2;p || for some C1,C2:D, s1:S(A,C1), s2:S(C1, C2), p:P(C2,B)}~Id(A) ). ... connected::@=for all A,B, if A<>B then some(P(A,B)). For A,B, redundant(A,B) ::@=some(F(A,B) & L(A,B)). redundant::@=for some A,B(redundant(A,B)). ). For A,B, M(A,B) ::basic maps=F(A,B)~L(A,B) . ... conceptual_model::=Bachman Diagram, conceptual_model::=digraph( vertices=>D , edges=> {(v1,v2) || for some A,B:D, m:M(A,B)(v1=names(A) and v2=names(B) ) ). ... conceptual_graph::=digraph( vertices=>D | |[A:D]E(A) | |[A,B:D]F(A,B), edges=> { (e, A) || for some A:D, e:E(A) } | {(m,B) || for some A,B:D, m:M(A,B) } | {(A, m) || for some A,B:D, m:M(A,B) } } ). ... paths, prime_links, links::(names(D))^( names(D)>names(D), Arrows=>{arrow(p)||for some T1 ,T2:D (T1=T2 and p=Id(T1) or p in paths(T1,T2 )}. . Conceptual Model conceptual_model::=digraph(vertices=> names(D)| |[T:D]names(E(T))| |[T1,T2:D]names(T1,T2), edges=>{(v1,v2) || objects(v1)in classes(v2) or maps(v1) in B(dom(v1),cod(v1) and cod(maps(v1))=classes(v2) or maps(v2) in B(dom(v2), cod(v2)) and dom(maps(v2))=classes(v1) } ). rationalized::@=no cycles( conceptual_model). if rationalized then conceptual_model in DAG. }=::DATABASE. . Relationships with a key Suppose we have a network N:=Net{v1:V1, v2:V2,...}, and k and d are two lists of variables in v1,v2,v3,... k,d:#|(v). where k do not overlap: no |k & |d. Define T:=${k1:k1($N),...d1:d1($N),...}, then (@N k -> d) ::=sets in @T with k determining d. The above is sometimes called a functional dependency. These are worth noticing since they indicate the logical structure that underlies the original design. The process starts with a set S:@N and finding a k and d such that S in (@N k->d). If k is large as possible and d is all the other variables then k is a suitable key or identifier for a database. If niether S.k nor S.d themselves contain any functional dependencies then the set S is in third normal form(TNF, 3NF,...). k nor . Example -- university U::=Net{ sid, name, address, sched_number, course, time, course_number, course_name, dept, units,...::string. }=::U. The following would some of the dependencies that we can use to argise a model of this university: courses::@U(course)->(course_number, course_name, dept, units). students::@U(sid)->(name,address). enrolment::@U(sched_number,sid)->(). class::@U(sched_number)->(course, time). .Close . Equivalence of Models ... . Rationalisation It is possible to express any finite collection of structured sets as a network database by applying rules like those in the LDST step in SSADM. Here are some examples, For all A:D, C:#D, if A= ><(C) then for all i:pre(C), some I:/\mu(i.th)( I in F(A, C[i])). For all A:D, s:Documentation, (v,B):declarations(s), if A=$s then v in F(T,B). For all A,B:D, if A==>B then "Id" in F(A,B). For all A,B:D, I:/\mu(in.@(A,B)), if B==>@A then I in D and "1st" in F(I, A) and "2nd" in F(I, B). For all A,B:D,T:@@(A,B) then "1st" in F(T,A) and "2nd" in F(T,B). For all A,B:D,X:Sets, if A=X->B then A in @@(X,B) and "2nd" in F(T,B) and for some I:/\mu(X)(I in D and "1st" in F(I, A). This last rule also applies when A=B^n, A=#B, ... . Data For D1,D2:documentation, a:Arrow, Net{D1} a Net{D1}={R:@${D1, D2}| (R)${D1} a ${D2}. For two lists of declarations D1, D2, Data{D1. D2}::=Net{D1}->Net{D2}. Since D1 and D2 have no definitions or wffs then Data{D1. D2}=@{ Net{D1. D2} | for all D1, one D2} If we are discussing fixed set of variables and types with unique identifiers then this is abreviated further - suppose we have a DICTIONARY; DICTIONARY=Net{ v1:T1, ...} and let type(v[i])=t[i], then Data{ k1,k2,...,k[n]. d1,d2,..., d[m]} ::=@DICTIONARY(k1,k2,...,k[n])->(d1,d2,...,d[m]). . Data Base Notation If D is a list of declarations then \/{D} is a discriminated union and so a natural model for a data base. The components of D are the different types of data/entities. Open Example Data Base -- University U::=Net{from previous example ...}. UNIVERSITY::=\/{ student::@U(sid)->(name,address,..). section::@U(sched_number)->(course, time). class::@U(courses)->(dept, course_number, course_name, units, ...). enrollment::@U(sid,sched_number)->(...). }=::UNIVERSITY. { student(123-45-6789, "John Doe", ...), student(234-56-6789, "Jane Roe", ...), section(12345, CS121, (MWF, 0800..0910)), section(13244, CS121, (TT, 0800..1000)), course(CS121, CS, 121, "Computers and Society", 2), enrollment(234-56-6789, 12345, ...), ... } in @$UNIVERSITY. .Close Example Data Base -- University .Open Normalisation of data. . Purpose The aim of normalization is to document all the structure in some given data in terms of simple sets and mappings. Or, in McCathy's terminology, the purpose is to derive the abstract syntax from the concrete syntax. A third view is that we wish to discover and name all the implicit relationships. Finally we can say that normalization derives a standard semantic domain to model what is known about some data. The effect is a repeated partioning of an initially homogeneous set of n-tuples into different classes and types. Normalization is an analytical tool more than a design tool. The goal is to model part of the problem not to work out detailed solutions. Design has to be concerned with performance, physical limitations, and ways to encode the necessary data. Relational data is a starting place for design and a useful user interface. They are simple to understand and use but tend to not use machinary efficiently. The relational retrieval systems are also limmitted in power[]. However they are a high powered analytical tool separating an area into many simple relations. These relations are a canonical basis for design work. . 0NF, 1NF, 2NF, and 3NF First, second, and third normalization steps operate on a set of sets of simple types by repeatedly replacing one type by two or more types with the subsets of the variables. The names of the variables are unique and their domains don't change. Often the domains are strings. Hence the domains are omitted. One starts with samples, displays, or existing data structures. Abstraction starts by naming component data and their domains. Then a plausible key is postulated and the result normalized until all functional dependencies are represented by key->data relationships. The 3 step process goes thru an easy to recall pattern : the key, the whole key, and nothing key. NORMALIZATION::=following, .List (0NF): A set of dictionaries, each listing a set of variables describing some sample forms, records, data structures, ... No duplicated variable names. (1NF): For each dictionary choose a subset of the variables that can act as a primary key for that set of data. Extract repeating groups of data into new dictionaries. Make sure that each key (set of variable values) identifies a unique item in the data set. (2NF): The Whole key. Make sure that for each key no part of the key can identify part of the data. (3NF): Nothing But the Key. Make sure that no part of the key identifies another part of the key. .Close.List Notice there are further normalization steps that can be applied to further expose more hidden data structures. .Open Formal model DICTIONARY::=Net{v1:D[1], v2:D[2], v3:D[3], ..., v[n]:D[n]}. samples::@DICTIONARY. @DICTIONARY(k1, k2, ...)?-?(d1, d2, ..). samples >--\/{ a:: @DICTIONARY(...)->(...). b:: @DICTIONARY(...)->(...). c:: @DICTIONARY(...)->(...). ... } .Close Formal model . Notation Data{ key1, key2, key3, ... . non_key1, non_key2, ...} .Open Example Data Base: Shopping Consider the data collected when grocery shopping. Here is the analysis of a shopping list as part of the work leading upto a "shoppers assistant" for an electronic organizer or personal digital assistant. The data on the list are shopping_list_data::=Net{ product_name, shop::string. price, tax, total::money. quantity::Nat0. date::Dates. time::Times. bought::@. }=::shopping_list_data. Now select a key: product_name samples>-- \/{ s0::Data{product_name. shop, price, tax, total,quantity, date, time, bought} }. Then remove repeating data/many-many relations: samples>-- \/{ s1::Data{product_name.}. s2::Data{product_name, date, shop. quantity, price, bought}. s3::Data{date, shop. tax, total, time}. } There are no part key dependencies. There are no non-key dependencies. database::=\/{ Products::Data{product_name:string.}. Orders::Data{product_name:string, date:Date, shop:string. quantity:Nat0, price:money, bought:@}. Shopping_trips::Data{date:Dates, shop:string. total, tax:Money, time:Times}. |-product_name in Orders->Products. |-(date,shop) in Orders->Shopping_trips. } Here is the analysis of a receipt from a cashier at a local supermarket: RECEIPT::=Net{ name_of_store, register_n, operator_n, payment_method::string. line_item::Data{product_desc:string. unit_price:Money, qty:Nat, total_price:Money, tax_flag:@}. date::Dates. time::Times. subtotal, tax_paid, ammount_tendered, change::Money. }=::RECEIPT. 0NF_RECEIPT::=\/{r1:@RECEIPT(name_of_store, date, time)->(...).} Extract repeated group of line items at the store. 1NF_RECEIPT::=\/{ r1::@RECEIPT(name_of_store, date, time)-> (reg_n,op_n, subtotal, payment_method, ammount, change, paid). r2::@RECEIPT(name_of_store, date, time, product_description)-> (qty, unit_price, total_price, tax_flag). } Now part-key dependencies: 2NF_RECEIPT::=\/{ r1::@RECEIPT(name_of_store, date, time)->(reg_n, op_n, subtotal, payment_method, ammount, change, paid). r2a::@RECEIPT(name_of_store, date, product_description)->(unit_price, tax_flag). r2b::@RECEIPT(name_of_store, time, product_description)->(qty, total_price). } The are no non-key dependencies, 3NF_RECEIPT::=2NF_RECEIPT. data_base{ Visit_to_store::Data{name_of_store, date, time. reg_n,op_n, subtotal, payment_method, ammount, change, paid}. Pricing_at_store::Data{name_of_store, date, product_description. unit_price, tax_flag}. .Note Stores have different product descriptions and so even though the state determines taxes, the relationship between product_description and tax will be different in different stores. This would not true if something like the UPC was used as a key . Example -- Universal Product coding UPC_based_data::=Net{ Pricing_at_store::Data{name_of_store, date, upc. unit_price}. UPC::Data{upc. product_description, bar_code, ...}. Taxing::Data{upc, date. tax_flag}. }=::UPC_based_data. Bought:Data{name_of_store, date, time, product_description. qty, total_price}. |-(name_of_store,date) in Bought->Visits_to_store. |-(name_of_store,date,product) in Bought->Pricing. } .Close .Open Multi-valued Dependencies .Reference Wu 92, Margeret S. Wu, The Practical Need for a Fourth Normal Form, 23rd SIGCSE Tech Symp on CSEd 92, SIGCSE Bull V24n1(Mar 92)pp19-23 . Example -- Hospital Physicians work at a location and are members of groups. The following is in 3NF (BCNF): Physician_work::@\${Physician_id, Group_id, Location_id:Identifiers}. However it is not in 4NF because (in this hospital) if a physician p is member of groups g1 and g2 and works at l1 and l2 then 4 tuples have to be used to record all the relationships: (p,g1,l1) (p,g2,l1) (p,g1,l2) (p,g2,l2) If the physician leaves g1 then two tuples are removed: (p,g1,l1) (p,g1,l2). Hence to get 4NF the relationship physician_work must be split, as follows: membership::Data{Physician_id. group_id}. and works_at::Data{Physician_id. location_id}. So membership={(p,g1), (p,g2),...}. works_at={(p,l1), (p,l2),...}. Let R be a subset of type T with maps x, y, and z into x:T->X, y:T->Y, and z:T->Z. A multivalued dependency MVD(x,y,z) occurs iff there is a value of x` where whenever two different tuples have the same x and different y and z values then there are two more tuples in R. For Types T, R:@T,x,y,z:#variables(T), R in MVD(x,y,z) ::=for all r1, r2:R, k:=r1.x, if r2.x=k and r1.y<>r2.y and r1.z<>r2.z then for some r3,r4(r3.(x,y,z)=(k,r1.y,r2.z) and r3.(x,y,z)=(k,r2.y,r1.z)). An MVD(x,y,z) is trivial if y is subset of x (|y==>|x) or x and y form a complete set of variables of T (|(x!y)=variables(T)). A set of relations in BCNF is in 4NF iff all MVDs are trivial. .Close .Close Normalisation of data . Quantification and Verification ... DATABASE::=Net{ ... For A,B, p:P(A,B), arrow(p) ::=(A, p, B). CATEGORY(Objects=>names(D), Arrows=>{arrow(p)||for some A,B:D (A=B and p=Id(A) or p in paths(A,B)) } ) ... For T1,T2:D, f:F(T1,T2), ratios(f) ::={n:Nat0 || for some y:T2( n=|/f(y)| )}. For T1:D, size(T1) ::= |T1|. ... For T1,T2:D, f:F(T1,T2), average_ratio(f) ::= (+[y:T2](|/f(y)|))/|T2|. For T1,T2:D, f:F(T1,T2), ok(f):@::=(average_ratio(f) in (|T1|/|T2|)*(0.90..1.10) }=::DATABASE. .Close Types as a Data Model .Open Types as Object-oriented objects To Be Done! .Close Types as Object-oriented objects