.Open Introduction to Types
. Personal Note
When I started this page I did not realize how much information it would have to
carry to make the MATHS language useful.
. Introduction
In any project there will be a number of more less defined types. Obviously
in computer projects we have
have "data types", "Entitiy types", "ADTs", classes of objects, or other technologically
bound models of reality. The MATHS type system has to handle these.
But we will also (in real projects) have to think about the real world -- cats, dogs,
candlesticks, and sealing wa, cabbages and kings. These are the
objects that the software describes. And in theoretical work we will often be
working in terms a set of elements with given operations and properties -- algebras and
abstract data types.
Some of
these are standard well known types and are described below. Then there
are the types that are peculiar to the domain being studied. This set of
types is called the `basis` of the domain. Finally there are the types that
can be constructed from the basic and standard types to give structured
types (sets, tuples, relations, ...). This collection of basic, standard
and constructed types is called the universe of discourse of the project.
The key discovery by Whitehead and Russell in their work on "The Principles
of Mathematics" was that the idea of type could be used to avoid paradoxes
as long as it was a syntactic concept. This means that each type defines
those expressions that are valid or meaningful as well as what meaning can
be attached. The Type system outlaws from the logic
any discussions of expressions that
involve contradictions in terms and paradoxes like the Russell Set. Thus
any theory of types is
a metalinguistic activity and taks place in yet another universe of
discourse.
Simply put: Each type is linked to a set of expressions and a set of
meanings
and connects them together. And each type does this in its own way. Types
are the main way that MATHS is extended.
All types must define a non-empty set of objects -- a valid universe of discourse --
so that we can use the predicate calculus to reason about objects of that type.
. Standard Types
The following are common descriptions of types
(Alphabets):
Generic finite set. Probably characters or symbols.
Operations=>{(!), (1st), (rest),...}, concatenation, disassembly of strings.
(@):
The propositions, Values=>{true,false},
Operations=>{and,or,not,...}
(Char): A Finite set of 256 characters used in MATHS.
The ASCII characters, See
.See http://www/dick/samples/comp.text.ASCII.html#Main%20Content
Values: "a", "b", char(26),...
(Unicode):
Used to describe languages like Java, JSON, Javascript, etc that use Unicode
.See ../samples/glossary.html#UNICODE
Values: To Be Done
.Hole Unicode
(Documents):
Pieces of Documentation,
Operations=>{$,...and,or,...}, see below.
(Events):
A finite set of elements written in a different language.
(Numbers):
Type that includes real,integer, etc types,
Operations=>{+,-,*,...}
(Dimensions):
Length, Mass, Time, Information, Area, Angle,...
Dimensions, each is a separtate type.
Operations=>{+,-},
For all D:Dimension, some units:@(Number->D). (eg 60.sec=1.hr,8.bit=1.byte)
(Sets):
Generic, For S:Sets is short for For T:Types, S:@T,
Operations=>{&, |, ~, ...}
(Times):
A special ordered set with elements written in a different language
(Types):
Generic
. Basic Types
In real systems we work with universes of discouses like: Employees,
Products, Animals, Telephones,... Michael Jackson notes that we need to
be able to designate individuals in these domains. Terry Halpin has noted
that the individuals/elements/objects/"values" can be described by a definite
description like this
.As_is the Employee with SSN 123-45-6789
.As_is the[e:Employee](e.SSN="123-45-6789")
It seems wise to introduce -- as a partly-baked idea --
a standard way to designate objects of a type:
.As_is Employee("123-45-6789")
Here is a possible formalization:
BASIC_TYPE::=following,
.Net
T::Types.
identifier::Set=given.
designation:: identifier >-> T.
|- designation in identifier --- T, a one to one mapping.
For i:indentifier, T(i) ::T= designation(i).
.Close.Net
So we might define, as an example,
example::=Net{
SSN::Sets.
...
Employee::Types.
|- $BASIC_TYPE(Employee, identifier=>SSN).
}=::example.
By further defining functions, predictaes, sets, relations, etc.
we can easily model the types of objects in the real world.
This conceptual model is an excellent guide to designing data bases,
a natural ontology for the domain,
and a source of ideas for classes in object-oriented programming.
. Type descriptions
A Type is defined to have two key attributes attributes:
|-TYPE::=Net{
description::#lexeme=`symbolic representation of the type`.
objects::Sets=`set of values or things of this type`.
...
}=::TYPE.
Type_descriptions::=Net{
Notice that a type T is not a member of a type.
. Syntax
type_description::= $type_power | $type_product,
type_power::=$type_product "^" $type_product,
Note -- this outlaws type descriptions like `T1^T2^T3` to avoid an
ambigutiy, since we can show that `(T1^T2)^T3` is different to `T1^(T2^T3)`.
An example
{0,1}^{0,1}
has the set of maps of bits into bits including elements like the identity and
negation operation (not).
type_product::= $type_factor | $type_factor "><" $type_product,
Note -- this defines the associativity of products of types:
`T1>< Number for example all have a sigle character and a single number
in them.
type_factor::= "@" $type_element |$type_element,
The type @T has the set of all subsets of T as elements.
type_element::=name_of_a_type | "(" $type_description ")" | $type_discriminated_union | $enumerated_type.
enumerated_type::= "{" list_of_values "}".
If `D` is a list of declarations then \/{`D`} is a discriminated union
or co-product. There is also a longer form that uses the "Net" syntax.
type_discriminated_union::=$co_product | $long_form_of_discriminated_union.
co_product::="\/" "{" $list_of_declarations "}".
list_of_declarations::= $declaration #( "," $declaration).
declaration::= identifier ":" $type_description.
A discriminated union like
\/{ character: Byte, number:Nat }
has elements like
character("a"), number(1), character("1"), etc.
Discriminated unions are useful for decribing data bases and partial
tples. A possible semantics is found in the theory of co-products
in Category Theory($CATEGORY).
CATEGORY::=http://www/dick/maths/math_25_Categories.html.
Briefly a `product` defines a composite structure with maps from the
composite to the components. A co-product has the reverse maps -- from
each component into a unique element of the co-product.
. Partly Baked Ideas
Two ideas
.Net
The long form leads to better rendering. The "\/" symbol is followed by a
net of declarations.
long_form_of_discriminated_union::= "\/" net
$TBA.
The meaning, is to extract the declarations, ignore the constraints,
and create a list `D` of declations for the short form.
.Close.Net
.Net
This is needed to define rational numbers (for example) as a type.
type_quotient::= $type_element "/" (equivalence_relation \ function).
A type_quotient `T`/`E` where E:equivalence(T) is defined as any chosen isomorphism
of the set of blocks or equivalent elements of `T`. Simarlarly `T`/`f` where
`f` is a map from T into T2 (f:T2^T), then the lelements of `T/f` are the inverse
images in T of elements in T2 -- for each t2:T2, { t:T. f(t)=t2}.
See
.See Quotient types
below.
.Close.Net
. Semantics
As a rule the description is a string that describes the objects in the
following way:
|- (desc-objects): For all T1,T2:Types, if T1.description= T2.description then T1.objects=T2.objects.
|- (hat_means_power): For all T1, T2, T3:Types, if T1.description=T2.description "^" T3.description then T1.objects=(T2.objects)^(T3.objects).
|- (at_means_subsets): For all T1, T2:Types, if T1.description="@" T2.description then T1.objects=@(T2.objects).
|- (cross_means_product):For all T1, T2, T3:Types, if T1.description= T2.description "><" T3.description and T1.description in type_factor then T1.objects=(T2.objects) >< (T3.objects).
|- (parentheses_mean_parentheses): For all T1,T2:Types, if T1.description="(" T2.description ")" then T1.objects=T2.objects.
Need to give formal semantics for coproducts, and quotients.... and tidy up.
.Hole types
}=::Type_descriptions.
.Open Standard subsets/classes
. Finite_Sets
This description
Finite_sets=`sets which are not isomorphic to a proper subset of themselves`,
is not type-safe. The only type safe expression is to talk about the finite sets
of elements of a `given type`, like the following:
For Type T, Finite_sets(T)::@T=`subsets that are finite`.
For T, S:@T, Finite_sets(T)::@T={F:@T. F in Finite_sets(T)}.
For T, S:@T, finite(S)::@=S in Finite_sets(T)}.
The idiom
Let A by a finite set, ...
is common in computer science and should be interpreted as short hand for
T: Type,
A: @T,
|- finite(A).
Where T is some unused variable.
These can be defined by enumation: Syntax: {a,b,c,...},
or by construction since the union of two finite sets is finite and the intersection of
two finite sets is finite.
Some theories of finiteness can be found at
.See ./logic_6_Numbers..Strings.html
discussing numbers.
. Integers
Int, Z
Integers,Whole Numbers,
Values=>{...-3,-2,-1,0,1,2,3...},
. Numbers
Rational, Q Quotients/Ratios
Fractions $(numerator:Nat0,denominator:Nat,...}
Real, R The Real Numbers,
n..m::={i:Z || n<=i<=m}.
Bit::=0..1.
n..::={i:Z || n<=i}.
..m::={i:Z || i<=m}.
Natural::={n:Z || n>=0}.
Nat0:=Natural.
Positive::={n:Nat0 || n>0}.
Nat::=Positive.
()|- Nat0 = Nat | {0} = 0.. .
Fixed(n) ::==-2^(n-1)..2^(n-1)-1.
Float(p,s) A finite subset of the Rational numbers (denominators=2^-i where i:Fixed(s),...)
Decimal(p,n):@Rational::=(-10^n+1..10^n-1)/(10^p),
Money::=Int/100.
Hence
|- n..m in @Z.
|- T^n ::= T^(1..n).
|- for n:Nat0, T^n ==> #T. So if x is in T^n then x has type #T.
.Open Standard subsets/classes
. Expressions and Overloading
As a generallity the type of an expression can always be determined
(sometimes relative to certain assumed types), from the type of the parts
and the operators that make it up. Even so a modicum of overloading is
allowed - an symbols can be defined to carry out different operations, (and
return different types, depending on the context. Such a definition is
written: term::context=expression
For example, here are three different views of an `assignment`:
assignment::syntactic= variable ":=" expression,
assignment::parsed= $Net{v:variable, e:expression,...},
assignment::semantic= map [v:variable, e:expression]( s'.v=e(s) and for all u:variable~{v}(s'.u=s.u)).
Standard examples of overloading:
(casts):
For T:Types, a:T, a::@T={a},
a::#T=(a),
a::%T=(1+>a).
(set extensions):
For T1,T2:Types, f:T1->T2, f::@T1->@T2={f(x) || x:T1(x in (_) },
For T:Types, op:infix(T), op::infix(@T)={x+y||x in (1st), y in (2nd)}.
As a rule: An overloading can not take an existing valid meaning in the given
context. So (|) is given a meaning for all sets and so can not be extended
to operate on sets as well.
Polymorphic and generic terms have definitions written like this:
For T:Types, term::t(T)=expression(t(T))
where `t(T)` is some formula that contains `T` as a free variable. If the
symbol `T` appears in the expression then it is interpreted as the
universal set of a generic type T. In general `T` can be any list of
symbolic variables. Completely generic terms ( "macros") can be defined
but are rare. If v1,v2,v3,... are previously undeclared variables then
.As_is For v1,v2,v3,..., M(v1,v2,v3,...) ::= e,
creates a function that applies to all types of operands and is short for
.As_is For T1, T2, T3,...:Types, v1:T1,v2:T2,v3:T3,..., M(v1,v2,v3,...) ::= e.
.Close Introduction to Types
.Open Theory of Types
Theory_of_types::=Net{
SEMANTICS ::= http://www/dick/maths/logic_7_Semantics.html
Types::@TYPE. We have a set of types, each one structured as a TYPE.
TYPE::=Net{
description::#lexeme=`symbolic representation of the type`.
objects::Sets=`set of values or things of this type`.
Note. If T is this type then x in T.objects iff `x has type T`.
Symbols for types have two meanings, one of these is in a normal part of an expression where it represents the set of `objects` or an identity map from `objects` to `objects` depending on the context.
(dual_meaning): 2 symbols o /representation(description).
(description_is_symbol)): symbol(description, @objects, objects).
The other meaning is when the description identifies the type in a declaration.
elements::@meaning_in(objects),
elementary_expression::=elements.syntax.
|-(elem): elements in element.syntax->objects.
expressions::@#lexeme.
The expressions of this type are defined in terms of objects in other types and so is not a property of this type alone. [See EXPRESSION below]
(elem_in_expr): elements.syntax==>expressions.
equality:: equivalence_relation(objects). [STANDARD.relations]
infix_pairs::@meaning_in(infix(objects)). [STANDARD.infix]
infixes::=`Lexemes referring to binary operations`.
infixes::=infix_pairs.syntax
EXPRESSION uses infixes(T) to define expressions(T).
(precedence): STRICT_ORDER(infixes, has_precedence_over)Chapter 4, section 8]
Used in EXPRESSION.
propositional_functions::@references(@^objects).
Used in Proposition Calculus.
parallels::@meaning_in(@(objects, objects)).
parallel::=parallels.syntax.
(equals_is_parallel): ("=",equality) in parallels.
serial::@infixes={s.syn || for some s:infix_pairs( associative(s.sem)) }~parallel.
other_operators::=@symbols~infix~parallel,`operators giving results of this type`.
(operators_mean_maps): For all o:other_operators, some domain:Types, o.type=domain->objects.
operators::={(representation=>i.syntax, type=>infix(objects), meaning=>i.semantics) || for some i:infix_pairs} | other_operators.
See Also Expressions.
.See Expressions
}=::TYPE.
(syntax_symbolizes_semantics): For all T:Types, e:T.elements, symbol(e.syntax, T, e.semantics).
(descriptions_symbolise_types): For all T:Types, symbol(T.description, Types, T).
|-(type_symbol_means_universal): For all T:Types, symbol(T.description, @T.objects, T.objects).
(different_types_do_not_overlap): For all T1, T2:Types, T1=T2 iff some (T1.objects&T2.objects).
(parallels): For T:Types, P:T.parallels, PARALLEL(expressions(@), T.expressions, P.syntax, P.semantics).
(serials): For T:Types, s:T.serial, SERIAL(T.expressions, T.expressions, s, T.infix_pairs(s)).
??{take precedence into account}?
}=::Theory_of_types.
. Examples of theoretical types
(Example_PC)::Types=following,
.(
description=>"@",
elements=>{("true", true), ("false", false)},
equality=>( (true, true) +>true | else+>false),
infix_pairs=>{("and", and), ("or", or), ...},
serial=>{"and", "or"},
parallel_pairs=>{("iff", iff), ("=", equality), ...},
other_operators=>
.(
{ ("=", T>@, T.equality) || for some T:Types}
|{("in", T><@T->@, in) || for some T:Types}
|...,
.)
...
.)
or even
For T:Types, Example_Sets(T) ::Types=following
.(
description=>"@"! T.description,
elements=>{("{}",{}), (T.description, T.objects)}
|{ "{"!e!"}" || e:T.elementary_expressions},
infix_pairs=>{("&", &), ("|", |),...},
serial=>{"&", "|"},
parallel_pairs=>{("=", =), ("==>", ==>), ...}, ...
.).
.Close Theory of Types
.Open Meta-theory of types
. Babel-Aleph-Null
The definition of the type '@' in terms of type theory is an interesting test case of the system, and is rather like attempting to build the foundation of a building on its roof. The inabillity to this suggests limits to the system. The successful construction of a model of logic, on a theory of types, expressed in terms of that logic, establishes the power but not the consistency of the system.
@::Types=following,
.List
symbol=>"@",
elementary_expression=>"true"+>true|"false"+>false,
infix_pairs=>"and"+>((true,true)+>true |+> false) |"or"+>((false,false)+>false |+> true),
parallels=>"="+>({(true,true),(false,false)}+>true |+> false) |"iff"+>({(true,true),(false,false)}+>true |+> false),
serial=>{"and","or"},
other_operators=> | following,
.Set
{ ("not",@->@,true+>false|false+>true)}
{("if_then_",distfix(@,@,@),(true,false)+>false |+> true)}
{("in", @T, in) ||T:Types}
{("==>", @T><@T->@,==>)||for T:Types}
{("=", @T><@T->@,=)||for T:Types~"@"}
{("all", @^T->@, map F:@^T(img(F)=true)) ||for T:Types }
.Close.Set
.Close.List
@.operators ::=following,
.Table representation Type meaning
.Row not @->@ true+>false|false+>true,
.Row and @><@->@ (true,true)+>true |+> false,
.Row or @><@->@ (false,false)+>false |+> true,
.Row iff @><@->@ {(true,true),(false,false)}+>true |+> false,
.Row if_then_ @><@->@ (true,false)+>false |+> true,
.Row in T><@T->@ map x,X(x.X),
.Row = T>@ map x,y((x,y).(equality(T))),
.Row all @^T->@ map F:@^T(img(F)=true),
.Row ==> (@^T)><(@^T)->@ map [A,B] (all(map x(if x.A then x.B))).
.Row ...
.Close.Table
quantified_wff::={( ("for" Q B "(" P ")"), (Q "(" "map" "[" B "]" "(" P ")" ")" )) || Q:{all,no,...}, B: (variable (colon | equals) set_expression), P:@.representation}
. Consistency and Non-empty types
Types must always define a non-empty set of objects. And a simple
way to prove this is give a finite example.
Consistency is demonstrated by exhibitting a finite model.
Another techniqu is to use the simplest (non-recursive) constructions.
. Avoiding paradoxes
Notice that sets are defined by a formula
{x:T||B},
which explicitly assigns a type to the variable. But the collection of all types is not a type so we can not define a set of all sets and so should avoid the paradoxes discovered at the beginning of the 20th century. Notations like `X:Type` or `Type X` can only be used to set of a generic formula.
We can prove that the set of `data types` leads to a consistent theory by
constructing a model in terms of naive set theory and n-tples. The data
types are those generated from @ and Nat by operations of fnite Products
and coproducts. The proof of relative consistency is done by using Codd
normalizations. Model each type as a set of n-tpls of elementary object.
Replace T2^T1 by (T1>Nat0= map[X](|X|).
So, this is true
Card{} = 0,
Card{"x"} = 1,
Card{"x","y"} = 2
The following expression
/Card(1)
has an ambiguous type -- the set of singletons of unknown type. Instead make
the type explicit
Strings@1 = sets of strings with one element.
.Close Meta-theory of types
.Open Deriving New types
Wechler presents an extensive theory of univeral algebras [Wechler 93, Section 3.1].
. Products
.See Notation $Net{a:T1, b:T2 ...}
. Quotient types
If there is a map from one type to another then there is an easy way to
generate a new type - by grouping the elements of the second type according
to their values under the mapping - this is called the quotient type with
respect to the mapping.
Given f:T1^T2 we can always derive a new type (symbolized T2/f) isomorphic to a sub-set of @T2 where
X:T2/f iff for some y:T1, all x:X(f(x)=y).
This called the quotient type of T2 with respect to f.
Given a R:@(T1>Z, 2nd=>Z)/equiv where ( (a,b) equiv (c,d) iff d a = b c)).
.See Notation.
.Close Deriving New types
.Open Typed Expressions
. Expressions
To complete this section here are the definitions of the syntax of
expressions. A string of lexemes that is not terminated by a comma or
period occurring outside parenthesis of some kind or other, can be
expression. But an expression can not contain a comma or period outside of
parenthesis. You can therefore use punctuation to recognised `balanced`
expressions. A second cut needs a set of `operators` - those symbols that
only make sense when accompanying other symbols - and defines whether an
expression is `functional`. The final check makes sure that the types of
actual parameters match the operators on them. These will be presented in
reverse order.
EXPRESSIONS::=Net{ Uses SEMANTICS. Uses Type_Theory. Uses ASCII_code.
expression::= balanced & functional & ( |[T:Types] typed_expression(T) ).
We start be defining the set of fully parenthesized expressions(p_expressions) of a given type and then state circumstances under which parentheses can be removed without change of meaning.
p_expression::abbreviation=fully_parenthesized_expression.
fully_parenthesized_expression::Types->@#lexeme.
For T1: Types, p_expression(T1) ::= following,
.( elementary_expression(T1)
| variable(T1)
| UNARY(representation(T1), other_language_string)
| |{UNARY(p_expression(T2), p_expression(T1^T2)) || T2:Types}
| |{BINARY(p_expression(T2), p_expression(T3), p_expression(T1^(T2><(t))}
| |{UNARY(p_expression(@T2), p_expression(@(T2>typed_expression(T).
The semantics of MATHS expressions are trivial when formalized in MATHS.
The correct meaning of a MATHS expression looks (in MATHS) like itself .
Spivey has remarked on a similar phenomena in the attempt to define Z in
terms of set theory and set theory using Z. The denotation (map from syntax
into semantics) is the identity mapping. Such a circular definition can be
shown to be useless for ignorant readers because ignorance is a trivial
fixed point[Spivey 88]. Luckily, complete definition is not essential for
a human being to learn a language.
Future research can establish the semantics in terms of other languages - such as Z, Larch, ....
The syntax and a prossibly feasible way to parse expressions is a notational problem.
.See Notation.12 Expressions
}=::EXPRESSIONS
.Close Typed Expressions
.Open Types as a Data Model
.Open Formal Model
. Entities and Maps
DATABASE::=Net{ Collection of named types, objects, and maps.
Uses lexemes, type_theory, EXPRESSION.
T::Finite(Types)=`Entity types`.
|- For all t:Types, t.\mu=t.denotation in
expressions(t)->objects(t).
D::=`Entity descriptions`.
`Object classes`::=Finite(|[t:T]expressions(@t)).
For A:D, E(A) ::=`Named objects in A`:Finite(expressions(A)).
For A,B:D, F(A,B) ::=`Named maps from A to
B`:Finite(expressions(B^A)).
For no A, `Id(A)` in F(A,A).
. Paths Through the DataBase.
For A,B:D, S(A,B) ::@(A,B)=\mu(F(A,B)) | {R:@(A,B)|| /R in \mu(F(B,A))}.
For A,B, P(A,B) ::@(A,B)=`paths from A to B`,
For A,B, P(A,B) ::@(A,B)=S(A,B) | { s;p ||for some C:D, s:S(A,T3), p:P(C,B)},
L(A,B) ::@(expressions(B^A))=/\mu( { s1;s2;p || for some C1,C2:D, s1:S(A,C1), s2:S(C1,
C2), p:P(C2,B)}~Id(A)
).
...
connected::@=for all A,B, if A<>B then some(P(A,B)).
For A,B, redundant(A,B) ::@=some(F(A,B) & L(A,B)).
redundant::@=for some A,B(redundant(A,B)).
).
For A,B, M(A,B) ::`basic maps`=F(A,B)~L(A,B) .
...
conceptual_model::=`Bachman Diagram`,
conceptual_model::=digraph(
vertices=>D ,
edges=> {(v1,v2) || for some A,B:D, m:M(A,B)(v1=names(A) and
v2=names(B) )
).
...
conceptual_graph::=digraph(
vertices=>D | |[A:D]E(A) | |[A,B:D]F(A,B),
edges=> { (e, A) || for some A:D, e:E(A) }
| {(m,B) || for some A,B:D, m:M(A,B) }
| {(A, m) || for some A,B:D, m:M(A,B) }
}
).
...
paths, prime_links, links::(names(D))^( names(D)>names(D),
Arrows=>{arrow(p)||for some T1 ,T2:D (T1=T2 and p=Id(T1)
or p in paths(T1,T2
)}.
. Conceptual Model
conceptual_model::=digraph(vertices=> names(D)| |[T:D]names(E(T))|
|[T1,T2:D]names(T1,T2),
edges=>{(v1,v2) || objects(v1)in classes(v2)
or maps(v1) in B(dom(v1),cod(v1) and
cod(maps(v1))=classes(v2)
or maps(v2) in B(dom(v2), cod(v2)) and
dom(maps(v2))=classes(v1)
}
).
rationalized::@=no cycles( conceptual_model).
if rationalized then conceptual_model in DAG.
}=::DATABASE.
. Relationships with a key
Suppose we have a network
N:=Net{v1:V1, v2:V2,...},
and k and d are two lists of variables in v1,v2,v3,...
k,d:#|(v).
where k do not overlap:
no |k & |d.
Define T:=${k1:k1($N),...d1:d1($N),...}, then
(@N k -> d) ::=`sets in @T with k determining d`.
The above is sometimes called a functional dependency. These are
worth noticing since they indicate the logical structure that
underlies the original design.
The process starts with a set S:@N and finding a k and d such that
S in (@N k->d).
If k is large as possible and d is all the other variables then
k is a suitable key or identifier for a database. If niether
S.k nor S.d themselves contain any functional dependencies then the
set S is in third normal form(TNF, 3NF,...).
k nor
. Example -- university
U::=Net{
sid, name, address, sched_number, course, time, course_number, course_name, dept, units,...::string.
}=::U.
The following would some of the dependencies that we can use
to argise a model of this university:
courses::@U(course)->(course_number, course_name, dept, units).
students::@U(sid)->(name,address).
enrolment::@U(sched_number,sid)->().
class::@U(sched_number)->(course, time).
.Close
. Equivalence of Models
...
. Rationalisation
It is possible to express any finite collection of structured sets as a network database by applying rules like those in the LDST step in SSADM. Here are some examples,
For all A:D, C:#D, if A= ><(C) then for all i:pre(C), some
I:/\mu(i.th)( I in F(A, C[i])).
For all A:D, s:Documentation, (v,B):declarations(s), if A=$s then v
in F(T,B).
For all A,B:D, if A==>B then "Id" in F(A,B).
For all A,B:D, I:/\mu(in.@(A,B)), if B==>@A then I in D and "1st"
in F(I, A) and "2nd" in F(I, B).
For all A,B:D,T:@@(A,B) then "1st" in F(T,A) and "2nd" in F(T,B).
For all A,B:D,X:Sets, if A=X->B then A in @@(X,B) and
"2nd" in F(T,B) and for some I:/\mu(X)(I in D and "1st" in
F(I, A).
This last rule also applies when A=B^n, A=#B, ...
. Data
For D1,D2:documentation, a:Arrow,
Net{D1} a Net{D1}={R:@${D1, D2}| (R) ${D1} a ${D2}.
For two lists of declarations D1, D2,
Data{D1. D2}::=Net{D1}->Net{D2}.
Since D1 and D2 have no definitions or wffs then
Data{D1. D2}=@{ Net{D1. D2} | for all D1, one D2}
If we are discussing fixed set of variables and types with unique identifiers then this is abreviated further - suppose we have a DICTIONARY; DICTIONARY=Net{ v1:T1, ...}
and let type(v[i])=t[i], then
Data{ k1,k2,...,k[n]. d1,d2,..., d[m]} ::=@DICTIONARY(k1,k2,...,k[n])->(d1,d2,...,d[m]).
. Data Base Notation
If D is a list of declarations then \/{D} is a discriminated union and so a natural model for a data base. The components of D are the different types of data/entities.
Open Example Data Base -- University
U::=Net{from previous example ...}.
UNIVERSITY::=\/{
student::@U(sid)->(name,address,..).
section::@U(sched_number)->(course, time).
class::@U(courses)->(dept, course_number, course_name, units, ...).
enrollment::@U(sid,sched_number)->(...).
}=::UNIVERSITY.
{ student(123-45-6789, "John Doe", ...),
student(234-56-6789, "Jane Roe", ...),
section(12345, CS121, (MWF, 0800..0910)),
section(13244, CS121, (TT, 0800..1000)),
course(CS121, CS, 121, "Computers and Society", 2),
enrollment(234-56-6789, 12345, ...),
...
} in @$UNIVERSITY.
.Close Example Data Base -- University
.Open Normalisation of data.
. Purpose
The aim of normalization is to document all the structure in some given
data in terms of simple sets and mappings. Or, in McCathy's terminology,
the purpose is to derive the abstract syntax from the concrete syntax. A
third view is that we wish to discover and name all the implicit
relationships. Finally we can say that normalization derives a standard
semantic domain to model what is known about some data. The effect is a
repeated partioning of an initially homogeneous set of n-tuples into
different classes and types.
Normalization is an analytical tool more than a design tool. The goal is
to model part of the problem not to work out detailed solutions. Design has
to be concerned with performance, physical limitations, and ways to encode
the necessary data. Relational data is a starting place for design and a
useful user interface. They are simple to understand and use but tend to
not use machinary efficiently. The relational retrieval systems are also
limmitted in power[]. However they are a high powered analytical tool
separating an area into many simple relations. These relations are a
canonical basis for design work.
. 0NF, 1NF, 2NF, and 3NF
First, second, and third normalization steps operate on a set of sets of
simple types by repeatedly replacing one type by two or more types with the
subsets of the variables. The names of the variables are unique and their
domains don't change. Often the domains are strings. Hence the domains
are omitted.
One starts with samples, displays, or existing data structures. Abstraction
starts by naming component data and their domains. Then a plausible key is
postulated and the result normalized until all functional dependencies are
represented by key->data relationships.
The 3 step process goes thru an easy to recall pattern : the key, the whole key, and nothing key.
NORMALIZATION::=following,
.List
(0NF): A set of dictionaries, each listing a set of variables describing
some sample forms, records, data structures, ... No duplicated variable names.
(1NF): For each dictionary choose a subset of the variables that can act
as a primary key for that set of data. Extract repeating groups of data into
new dictionaries. Make sure that each key (set of variable values)
identifies a unique item in the data set.
(2NF): The Whole key. Make sure that for each key no part of the
key can identify part of the data.
(3NF): Nothing But the Key. Make sure that no part of the key
identifies another part of the key.
.Close.List
Notice there are further normalization steps that can be applied to further
expose more hidden data structures.
.Open Formal model
DICTIONARY::=Net{v1:D[1], v2:D[2], v3:D[3], ..., v[n]:D[n]}.
samples::@DICTIONARY.
@DICTIONARY(k1, k2, ...)?-?(d1, d2, ..).
samples >--\/{
a:: @DICTIONARY(...)->(...).
b:: @DICTIONARY(...)->(...).
c:: @DICTIONARY(...)->(...).
...
}
.Close Formal model
. Notation
Data{ key1, key2, key3, ... . non_key1, non_key2, ...}
.Open Example Data Base: Shopping
Consider the data collected when grocery shopping.
Here is the analysis of a shopping list as part of the work leading upto a
"shoppers assistant" for an electronic organizer or personal digital assistant.
The data on the list are
shopping_list_data::=Net{
product_name, shop::string.
price, tax, total::money.
quantity::Nat0.
date::Dates.
time::Times.
bought::@.
}=::shopping_list_data.
Now select a key: product_name
samples>-- \/{
s0::Data{product_name. shop, price, tax, total,quantity, date, time, bought}
}.
Then remove repeating data/many-many relations:
samples>-- \/{
s1::Data{product_name.}.
s2::Data{product_name, date, shop. quantity, price, bought}.
s3::Data{date, shop. tax, total, time}.
}
There are no part key dependencies. There are no non-key dependencies.
database::=\/{
Products::Data{product_name:string.}.
Orders::Data{product_name:string, date:Date, shop:string. quantity:Nat0, price:money, bought:@}.
Shopping_trips::Data{date:Dates, shop:string. total, tax:Money, time:Times}.
|-product_name in Orders->Products.
|-(date,shop) in Orders->Shopping_trips.
}
Here is the analysis of a receipt from a cashier at a local supermarket:
RECEIPT::=Net{
name_of_store, register_n, operator_n, payment_method::string.
line_item::Data{product_desc:string.
unit_price:Money, qty:Nat, total_price:Money, tax_flag:@}.
date::Dates.
time::Times.
subtotal, tax_paid, ammount_tendered, change::Money.
}=::RECEIPT.
0NF_RECEIPT::=\/{r1:@RECEIPT(name_of_store, date, time)->(...).}
Extract repeated group of line items at the store.
1NF_RECEIPT::=\/{
r1::@RECEIPT(name_of_store, date, time)-> (reg_n,op_n, subtotal, payment_method, ammount, change, paid).
r2::@RECEIPT(name_of_store, date, time, product_description)-> (qty, unit_price, total_price, tax_flag).
}
Now part-key dependencies:
2NF_RECEIPT::=\/{
r1::@RECEIPT(name_of_store, date, time)->(reg_n, op_n, subtotal, payment_method, ammount, change, paid).
r2a::@RECEIPT(name_of_store, date, product_description)->(unit_price, tax_flag).
r2b::@RECEIPT(name_of_store, time, product_description)->(qty, total_price).
}
The are no non-key dependencies,
3NF_RECEIPT::=2NF_RECEIPT.
data_base{
Visit_to_store::Data{name_of_store, date, time. reg_n,op_n, subtotal, payment_method, ammount, change, paid}.
Pricing_at_store::Data{name_of_store, date, product_description. unit_price, tax_flag}.
.Note
Stores have different product descriptions and so even though the state determines taxes, the relationship between product_description and tax will be different in different stores. This would not true if something like the UPC was used as a key
. Example -- Universal Product coding
UPC_based_data::=Net{
Pricing_at_store::Data{name_of_store, date, upc. unit_price}.
UPC::Data{upc. product_description, bar_code, ...}.
Taxing::Data{upc, date. tax_flag}.
}=::UPC_based_data.
Bought:Data{name_of_store, date, time, product_description. qty, total_price}.
|-(name_of_store,date) in Bought->Visits_to_store.
|-(name_of_store,date,product) in Bought->Pricing.
}
.Close
.Open Multi-valued Dependencies
.Reference
Wu 92, Margeret S. Wu, The Practical Need for a Fourth Normal Form, 23rd SIGCSE Tech Symp on CSEd 92, SIGCSE Bull V24n1(Mar 92)pp19-23
. Example -- Hospital
Physicians work at a location and are members of groups. The following is in 3NF (BCNF):
Physician_work::@${Physician_id, Group_id, Location_id:Identifiers}.
However it is not in 4NF because (in this hospital) if a physician `p` is member of groups `g1` and `g2` and works at `l1` and `l2` then 4 tuples have to be used to record all the relationships:
(p,g1,l1)
(p,g2,l1)
(p,g1,l2)
(p,g2,l2)
If the physician leaves `g1` then two tuples are removed: (p,g1,l1)
(p,g1,l2).
Hence to get 4NF the relationship physician_work must be split, as follows:
membership::Data{Physician_id. group_id}.
and
works_at::Data{Physician_id. location_id}.
So
membership={(p,g1), (p,g2),...}.
works_at={(p,l1), (p,l2),...}.
Let `R` be a subset of type `T` with maps `x`, `y`, and `z` into x:T->X, y:T->Y, and z:T->Z. A multivalued dependency MVD(x,y,z) occurs iff there is a value of `x` where whenever two different tuples have the same x and different y and z values then there are two more tuples in R.
For Types T, R:@T,x,y,z:#variables(T),
R in MVD(x,y,z) ::=for all r1, r2:R, k:=r1.x, if r2.x=k and r1.y<>r2.y and
r1.z<>r2.z then for some r3,r4(r3.(x,y,z)=(k,r1.y,r2.z) and
r3.(x,y,z)=(k,r2.y,r1.z)).
An MVD(x,y,z) is trivial if y is subset of x (|y==>|x) or x and y form a complete set of variables of T (|(x!y)=variables(T)).
A set of relations in BCNF is in 4NF iff all MVDs are trivial.
.Close
.Close Normalisation of data
. Quantification and Verification ...
DATABASE::=Net{
...
For A,B, p:P(A,B), arrow(p) ::=(A, p, B).
CATEGORY(Objects=>names(D),
Arrows=>{arrow(p)||for some A,B:D (A=B and p=Id(A) or p in
paths(A,B))
}
)
...
For T1,T2:D, f:F(T1,T2), ratios(f) ::={n:Nat0 || for some y:T2( n=|/f(y)| )}.
For T1:D, size(T1) ::= |T1|.
...
For T1,T2:D, f:F(T1,T2), average_ratio(f) ::= (+[y:T2](|/f(y)|))/|T2|.
For T1,T2:D, f:F(T1,T2), ok(f):@::=(average_ratio(f) in (|T1|/|T2|)*(0.90..1.10)
}=::DATABASE.
.Close Types as a Data Model
.Open Types as Object-oriented objects
To Be Done!
.Close Types as Object-oriented objects