. Form of a grammar
A grammar is a collection of rules or definitions that describe
how to construct valid or correct strings in a language. The
symbols that make up the strings in the language - characters, words,
or lexemes, are called the terminal symbols. The set of terminal
symbols is symbolized as $T,
T::Sets=`given set of terminal symbols`.
The terms used to talk about the language - forming its meta-language -
are separate set of symbols(N), that does not overlap with $T,
N::Sets=`given set of nonterminal symbols`
|- (disjoint): $T & $N = {}.
A grammer generates the strings in its language by replace non-terminal symbols
by terminal ones. This is why we use the tem terminal vs non-terminal.
The grammar has a vocabulary V made up of symbols in either $T or $N.
V::= $T | V.
The vocabulary $V is split into the
terminals or elements ($T) and the defined terms or nonterminals ($N).
The rules describe a collection of operations or functions that
take a string of symbols from both $N and $T and changes it to another
one. The rules only apply when the string has at least one non-terminal
and typically replace it by a string of terminals and nonterminals.
.Key Context free grammars (
.Key CFG
) have rules are expressed as a definition of
a single nonterminal as a string of terminals and nonterminals:
.Key example
.As_is L::=a L b.
for example. The mapping
substitutes the nonterminal by the string. For
example the above rule describes a mapping/function/operation
that can do things like the following:
.Table Before After
.Row L a L b
.Row a L b a a L b b
.Row a a L b b a a a L b b b
.Row ... ...
.Close.Table
A MATHS
grammar gives each term `n` in $N, a formula that defines `n` in terms of
the nonterminals m1,m2,... (which may include n itself) and some
terminals(D[n](m1,m2,...)), using the following operations "(&|~#)".
In the above $example defines "L" in terms of "a", "b", and
"L". So the L-rule is to replace a string x by "a" x "b". Thus:
For all x, D["L"](x) = "a" x "b".
or
D["L"] = \lambda[x:V]( "a" x "b").
Notice that this implies that `D["L"]` is automatically defined on sets of strings:
D["L"] = \lambda[A:@V]( { y:@V || for some x:A (y = "a" x "b")}).
. Meaning of a grammar
Each terminal and nonterminal represents a set of strings of symbols
in an alphabet or vocabulary.
There are several models of strings and some have been translated into
MATHS
.See http://www/dick/maths/intro_strings.html
.See http://www/dick/maths/logic_6_Numbers..Strings.html
.See http://www/dick/maths/math_61_String_Theories.html
.See http://www/dick/maths/math_62_Strings.html
.See http://www/dick/maths/math_66_SuperStrings.html
But for now I assume in the theory of grammars that strings are
generated by starting with an empty string ("") and a putting symbols
(in $T) onto it using concatenation operation (!).
The pre-defined operations of union('|'), intersection ('&'), complementation
('~'), concatenation, and iteration (#) on sets of strings:
I treat a string as a set when the context requires it. A string s
is cast into the singleton set {s} when necessary.
For s:string, s::sets of strings = { s }.
For A,B: sets of strings,
A B ={c || c=a!b and a in A and b in B},
A & B ={c || c in A and c in B},
A | B ={c || c in A or c in B},
A ~ B ={c || c in A and not( c in B) }.
.Note { x || P} is short for `the set of x such that P is true`.
.See http://www/dick/matsh/intro_sets.html
.See http://www/dick/matsh/logic_30_Sets.html
.See http://www/dick/matsh/logic_32_Set_Theory.html
The Kleene closure operator (#) can be read "zero or more of" and can be
defined as the smallest set of strings that contains the null string and
the results of concatentaing one of the elements of the set onto the closure:
#A =least{ X || X=({""} | A X) }.
'#' could be defined many other ways. In MATHS it is a standard
operator.
A set of definitions associates a meaning (M(n)) with each defined term
(n) by the rule that each M(n) is the smallest set of strings such that
all the definitions are true simultaneously:
For all n:$N, M(n) = D[n](M(m1),M(m2),M(m3),...),
For example, the grammar
Net{ L:="" || a L b }
defines L to be the smallest set of strings such that
M(L)={""}| {c || for some x in M(L) (c =a!x!b)}
This definition does not tell us how to find the M's, neither does it
prove that they exist nor whether they are unique. Finally the term
'smallest' has not been defined. It can be shown, for a wide class of
grammars (see later) that we can get as complete a listing of the M's as
we like by a simple process:
.Box
Initially, for n:$N do M(n)={}.
Iteration the following steps as often as you want:
.Net
For n:$N do P(n) := D[n](M(n1),M(n2),M(n3),...).
For n:$N do M(n)=P(n)
.Close.Net
.Close.Box
For example, with
{ L:=""| a L b }
gives as successive approximations to M(L) as the following sets:
.Net
{}
{""}|a{}b = {""}
{""}|a{""}b = {"","ab"}
{""}|a{"","ab"}b = {"","ab","aabb"}
{""}|a{"","ab","aabb"}b = {"","ab","aabb","aaabbb"}
{""}|a{"","ab","aabb","aaabbb"}b =
{"","ab","aabb","aaabbb","aaaabbb"}
...
.Close.Net
It seems intuitively obvious that these sets are `tending towards a
limit`:
M(L)={a^n!b^n||n:0..}.
The reason that these seem to converge to M(L) is that we have to
look at longer and longer examples of strings in L to find one that has
not been generated by the iteration. The discrepancy
between the iterates and M(L) are getting longer and longer - More formally,
if we itterate long enough, then the iterates will match the limit up to any
preselected length.