.Open Languages
This set of notes are focussed on developing a theory of languages that can be used to
define realisticformal languages and develop tools to manipulate them. As a result
they can, by Michael A Jackson's pioneering methods they can be used to
develop programs that process data (JSP) and Management Information Systems (JSD).
I have written a monograph
.See ../monograph/
to explore this.
.RoadWorksAhead
.Hole Languages and Grammars
The mathematical model of a language is a set of strings. The strings must
be sequences of zero or more symbols taken from a finite set often called an
alphabet.
LANGUAGE::=following
.Net
|- $STRINGS
Alphabet::Sets,
|- finite Alphabet.
Strings::=#Alphabet.
Languages::=@Strings.
See also
STRINGS::=http://www/dick/maths/math_62_Strings.html.
.Open Properties of Languages
The set of languages for a given alphabet forms a Boolean Algebra with repsect to
union, intersection, and complement, of course. More they form a Monoid with respect
to concatenation:
For A,B:Languages, A B ::Languages= {a ! b . a:A, b:B}.
1 ::Languages= {}.
()|-MONOID( Languages, (), 1 ).
Indeed, we have a Ring with Union and Concatenation, but not with intersection and concatenation.
.Hole codes, equivalence, etc.
.Close Properties of Languages
.Close.Net
. Grammar
A Grammar defines a language and imposes a stucture on the strings
in that language. Chomski invented a model for how grammars work.
A grammar describes (explicitly) rules for generating a correct
strings in a given language. If well chosen each string in the language
can be generated in essentially only one way. This way of generating
the string defines the structure of the string - how it is parsed.
Grammars -- especially written in various extensions of Bachus-Naur-Form (BNF) --
are commonly used to define computer languages. Several examples can be found in my
.See ../samples/
directory.
GRAMMAR::=Net{
Alphabet::Sets=given,
|- finite Alphabet.
productions::Sets.
P::=@productions,
terminals::@#Alphabet=given,
T::= terminals,
non_terminals::Finite_sets Name=given,
N::= non_terminals=given,
vocabulary::=T|N.
V::=vocabulary.
source::non_terminals=given,
S::=source,
meta_language::=#(production | comment),
M::=meta_language,
comment::=#character~production.
|- P = img(M)&production.
}=::GRAMMAR.
. Hierarchy of Meta_languages
Different restrictions on the form of productions define different classes of languages. Many of these
have been studied in detail.
Here is a table of some classic grammatical types.
The form of Productions is shown
with capital letters (A,B,C) being used for Nonterminals and lower case letters for
terminal symbols. Greek letters ( \alpha, \beta, ...) are for strings of terminals and nonterminals.
.Table Name Form of Productions Notes
.Row Regular A->a, A->a B Chomski
.Row Context Free A->\alpha Chomski
.Row Conjunctive Grammars A->\alpha & \beta & ...
.Item See
.See [Okhotin03]
and my own
.See ../monograph/03-intersections.html
.Row Boolean Grammars A->\alpha & \beta & ... & \not \gamma & \not \delta
.Item See
.See [Okhotin04b]
.Row Context_sensitive \alpha A \beta->\alpha \gamma \beta Chomski
.Row Context_dependent \beta->\alpha Chomski
.Close.Table
In the above table the notation `x->y` indicates a substitution,
.See ./math_62_Strings.html#Substitution
for a formal definition.
(theory)|- $regular(T)=>>$context_free_language(T) =>> $para_grammar_language=>> $extended_grammar_language.
No useful languages are context free - so more general form of grammars are
needed. However CFGs are the commonest form of grammar used in practice.
They provide useful information in Language Reference Manuals and the
easiest way of designing compilers and interpreters. They are also the
basis of Jackson Structured Programming.
Context_Free_Grammars::=$CFG, below:
CFG::=Net{
Extended Backus-Naur Form is equivalent to a context free grammar(CFG). There exist standard tools for handling context free grammars.
|- GRAMMAR(production=>context_free_production).
context_free_production::syntax=non_terminal ":"":""=" $regular_expression,
regular_expression::=Regular_Expressions(terminal|non_terminal).
Note this is more general than Chomski's definition of a CFG but describes the same family of
languages.
}=::CFG.
. Lexicons and Lexical Analysis
Most languages have a basic vocabulary of strings that are treated as
complete system (a lexeme).
The basic symbols of a language are always defined by a simple context
free grammar. Indeed most lexicons are a regular language.
These are best defined using simple descriptions:
.As_is word :: lexeme, purpose.
or definitions,
.As_is name :: lexeme = string, purpose.
.As_is name :: lexeme = regular_expression, purpose.
.Net
typedef::lexeme, use to introduce type definitions in C/C++/Java/etc.
double_plus::lexeme="++", used to add 1 to a variable in C/C++/Java etc.
all::lexeme="all", universal quantifier.
some::lexeme="some", existential quantifier.
Ada_end::lexeme=ignore_case("end").
.Close.Net
. Para_Grammars
Para_Grammar::=following,
.Net
|-$GRAMMAR(production=>para_production).
para_production::= $non_terminal ":"":""=" para_regular_expression,
para_regular_expression::=(("&" | "|") $parameters|) $item #(("&"|"~") $item),
item::= (|hash)($terminal |$non_terminal | "(" $selection ")" | $parameter),
selection::= $sequence #("|"$sequence),
sequence ::= #($item | $parameterized),
parameterized::= $parameters $item,
parameters::=l_bracket $binding #(comma $binding) r_bracket,
binding::=$parameter #(comma $parameter) ":"$item,
parameter::=variable.
Parameters are used to indicate correspondences across sequences, etc.
Example
.Let
dup::=|[w:#Bits ](w w).
defines a context sensitive language in a very simple form.
.Close.Let
.Close.Net Para_Grammar
. Semantics
The meaning of a grammar depends on how it is used
- lexical or syntactical, ....
Specialized languages can be constructed by using a grammar with different
sets of terminal symbols and by interpretting the structures and tags
differently.
By default you can assume that a sequence is parsed as an array or list (of
type #X for some X). Parameterized items become mappings from parameters
to content(of type X<>->Y). For more, see
.See http://www/dick/maths/notn_2_Structure.html
.See http://www/dick/maths/notn_14_Docn_Semantics.html#Definitions
.See http://www/dick/maths/notn_13_Docn_Syntax.html,
for an example of a how the formal grammar of MATHS maps into
a structured objet.
. Translation
A translation can be defined by giving two corresponding grammars. One
defines the input, and the other the output. Tags indicate corresponding
strings in the two grammars.
More sophisticated is to simultaneously define both the input and the
output at the same time. Here each definition doesn't define a set of
strings, but does define a set of pairs of strings (equivalent therefore to
a relationship between two languages). A simple version of this idea will
be found in Aho and Ullman Volume 1.
. Parsers
A parser constructs a data structure that encodes the input sequence. Each
non_terminal in the grammar will have its own associated type of object in
this data structure. A particular input is encoded as a set of objects
which in generalwill not be a tree but a directed acyclic graph(dag). In
MATHS a simple formal model might be to simultanously define the abstract and
the concrete syntax like this:
(expression, expressions) ::@#Alphabet>