[CSUSB] >> [CNS] >> [Comp Sci Dept] >> [R J Botting] >> [Monograph] >> 03-intersections
Wed Mar 10 16:05:47 PST 2004

Contents


    Part of Chapter 3

      This is a selection of pieces taken from Chapter 3 of the monograph. They are those concerned with the effect of allowing BNF-style definitions of the form
    1. term::= expression & expression. (where & indicates the intersection of the two expressions).

      The key point is to show that intersections do allow the definition of some (but not all) context sensitive languages that are not context free, and that these languages have interesting parallel acceptors.

      This is extracted from Chapter 3 of a monograph on software development methods. It uses some notation developed in Chapter 2. The notation is described in [ math.lexemes.html ] and [ math.lexicon.html ] plus [ math.syntax.html ] and [ index in math.syntax ].

      Two special symbols: "==>" stands for "subset or equal to" and "=>>" stands for "proper subset of".

      2.1 Introduction

      The theory of computer languages lags behind the practice. In theory one writes a single grammar to define the language, but in practice a compiler has two or more cooperating parts. Multiple passes and/or concurrency are clearly convenient - but why? The theory does not explain the practice. The gap is closing: the Turing language was defined as the intersection of lexical, syntactic and semantic requirements [HoltCordy88]. Section 2.3 below [ 2.3 Dictionaries ] shows that a grammar with intersections in it is handled by a network of parsers.

      <...formal definitions of CFGs ...>

      2.3 Dictionaries

      A dictionary (Chapter 2) permits intersections and complements in definitions. For example consider:
      (example):
      Net
      1. L::=P & Q, -- L is the intersection of P and Q.
      2. P::=#a B, -- P is any number of a's followed by a B.
      3. B::=(b B c |),
      4. Q::= A #c,
      5. A::=(a A b | ).

      (End of Net)

      Therefore

    2. (B) |- (L1): B={b^m ! c^m || m:0..},
    3. (P) |- (L2): P={a^n ! b^m ! c^m || n, m:0..}.
    4. (A) |- (L3): A={a^n ! b^n || n:0..},
    5. (Q) |- (L4): Q={a^n ! b^n ! c^m || n, m:0..}.
    6. (L2, L4) |- (L5): L={a^n ! b^n ! c^n || n:0..}. The last result shows that a dictionary with an intersection can describe a language that is not context free.
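      The derivation above can be checked mechanically. The following Python sketch (my own illustration, not part of the monograph; the names in_P, in_Q, in_L are invented) tests membership in P, Q and their intersection by brute force over all short strings:

```python
import re
from itertools import product

def in_P(s):
    # P ::= #a B,  B ::= (b B c |)   i.e.  a^j ! b^m ! c^m
    m = re.fullmatch(r'(a*)(b*)(c*)', s)
    return bool(m) and len(m.group(2)) == len(m.group(3))

def in_Q(s):
    # Q ::= A #c,  A ::= (a A b |)   i.e.  a^n ! b^n ! c^k
    m = re.fullmatch(r'(a*)(b*)(c*)', s)
    return bool(m) and len(m.group(1)) == len(m.group(2))

def in_L(s):
    # L ::= P & Q
    return in_P(s) and in_Q(s)

# Check L = {a^n b^n c^n} exhaustively on all strings of length <= 6.
for w in (''.join(t) for k in range(7) for t in product('abc', repeat=k)):
    n = len(w) // 3
    assert in_L(w) == (w == 'a' * n + 'b' * n + 'c' * n), w
```

      Each of in_P and in_Q alone is a context free check; only their conjunction pins down the non-context-free language L.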

      2.4 Acceptors

      Define (temporarily) an acceptor as a relation R between strings of symbols that accepts string i iff i R "" .
    7. I::=Input Symbols,
    8. #I=Input strings,
    9. @(#I, #I)::=relations processing input strings.
    10. For R:@(#I,#I), accepted_by(R)::= {x:#I || x R () }.

      The get-some-input relation x? is defined by: For x:I, x? ::@(#I,#I)= rel [i, i':#I] ( i = x ! i' ), -- compare [LamShankar90].

      It follows that

    11. |-"a" ("a"? ) (),
    12. |-for no x:#I~{"a"} ( x("a"? ) () ),
    13. |-accepted_by(("a"? )) = {"a"}.
    14. |-"b" ("b"? ) (),
    15. |-"ab" ("a"? )"b" ("b"? ) (),
    16. |-"ab" ("a"? ; "b"? ) (),
    17. |-{"ab"} = accepted_by ( "a"?; "b"? ).


    18. |-"aba" ("a"? )"ba" ("b"? )"a" ("a"? ) ().
    19. |-"aba" ("a"? ; "b"? ; "a"? ) ().

      Notice that if R is a finite relational expression containing only (;), (|), (do) and x? then R represents the behavior of a non-deterministic finite state acceptor--(FSA). Hence there is an equivalent deterministic FSA. Conversely, for each FSA there is a finite relational expression made from x?, (;), (|) and (do).
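      The relational model of acceptors can be sketched in Python, representing a relation R:@(#I,#I) as a function from a remaining input to the set of possible remaining inputs afterwards (the names sym, seq, alt, do, accepts are mine, not the monograph's):

```python
def sym(x):                       # the get-some-input relation x?
    return lambda i: {i[1:]} if i[:1] == x else set()

def seq(*rs):                     # (R1; R2; ...): relational composition
    def r(i):
        states = {i}
        for R in rs:
            states = {j for s in states for j in R(s)}
        return states
    return r

def alt(*rs):                     # (R1 | R2 | ...): union of relations
    return lambda i: {j for R in rs for j in R(i)}

def do(R):                        # do(R): zero or more repetitions of R
    def r(i):
        seen, frontier = {i}, {i}
        while frontier:
            frontier = {j for s in frontier for j in R(s)} - seen
            seen |= frontier
        return seen
    return r

def accepts(R, i):                # i is accepted by R iff i R ""
    return '' in R(i)

assert accepts(seq(sym('a'), sym('b')), 'ab')            # "ab" (a?; b?) ()
assert accepts(seq(sym('a'), sym('b'), sym('a')), 'aba')
assert not accepts(seq(sym('a'), sym('b')), 'a')
assert accepts(do(sym('a')), 'aaa') and accepts(do(sym('a')), '')
```

      Non-determinism is handled by tracking the whole set of possible remaining inputs, which is exactly the subset construction for FSAs.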

      It is obvious that the usual top-down parser will take a context free MATHS grammar and construct a program that accepts the language. Each definition is a regular expression. There is a simple mapping of the dictionary into a set of definitions of relations that model the parsing of the input by the grammar.

       	Table of Mapping from Dictionary to Program
       	Dictionary	Relation	Notes
       	definition	a::=b	C_a::=B	B is the translation of b.
       	terminal	t	t?	t is removed from the front of i.
       	non_terminal	a	C_a	C_a is a relation that consumes an instance of a.
       	sequence	a b c...	A; B; C; ...	where a, b, c... translate to A, B, C, ...
       	selection	(a|b|c...)	(A|B|C|...)	where a, b, c... translate to A, B, C, ...
       	iteration	#a	do(A)	where a translates to A.

      For example, P::=( a P b | #c) becomes C_P::=(a?; C_P; b? | do(c?)).

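      As an illustration (my sketch, with invented names), the translated relation C_P::=(a?; C_P; b? | do(c?)) can be run directly as a non-deterministic backtracking acceptor:

```python
def C_P(i):
    """All possible remaining inputs after P ::= (a P b | #c) consumes a
       prefix of i (a sketch; the names are mine, not the monograph's)."""
    results = set()
    # branch (a?; C_P; b?)
    if i[:1] == 'a':
        for rest in C_P(i[1:]):
            if rest[:1] == 'b':
                results.add(rest[1:])
    # branch do(c?): strip any number of leading c's
    k = 0
    while True:
        results.add(i[k:])
        if k < len(i) and i[k] == 'c':
            k += 1
        else:
            break
    return results

def accepts_P(i):
    return '' in C_P(i)

# P is {a^n c^m b^n}: a's and b's balance around a core of c's.
assert accepts_P('') and accepts_P('cc') and accepts_P('aaccbb')
assert not accepts_P('a') and not accepts_P('acbb')
```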

      To show

    20. |-P=accepted_by(C_P), take ==> and <== in turn.


    21. |-P ==>accepted_by(C_P).
      Let
      1. i:P, then either i in #c or for some p:P( i=a ! p ! b). In the first case, i (do( c? ))() and so i is acceptable to C_P. In the second case, i=a ! p ! b and so if p is acceptable to C_P then so is i. By induction we conclude that if i in P then i in accepted_by(C_P).

      (Close Let )

    22. |-accepted_by(C_P)==> P.
      Let
      1. string i be acceptable to C_P. It is therefore accepted either by 'do(c?)' or by (a?; C_P; b?). In the first case, i is in #c and so in P. In the second case, i starts with an "a", ends with a "b", and its middle is accepted by C_P. By a simple induction, if i in accepted_by(C_P) then i in P.

      (Close Let )

      Proving that the translation of a grammar into a top-down parser always produces a relation that accepts exactly the language described by the grammar is a straightforward induction on the structure of the expressions. The translation is a homomorphism between the two semi_rings.

      In practice a programmer implementing an accepting relation can reduce the amount of backtracking by adding conditions like the following:

    23. For a in #Character, a.next::=for some y(i=a ! y).
    24. For A in @#Character, A.next::=for some x, y(i=x ! y and x in A).
    25. else::=the previous alternative failed to complete.
    26. For example if the grammar contains P::=#A B then the mapping of dictionaries to programs above gives:
    27. C_P::=do(C_A); C_B. But if we know that no prefix of a string in B is a prefix of a string in A then:
    28. C_P= while(A.next)(C_A); C_B, since |- (An): (A.next; C_A)=C_A and |- (nAn): (not(A.next); C_B)=C_B.
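      This refinement can be sketched deterministically (my illustration, assuming the hypothetical grammar P ::= #A B with A ::= a and B ::= b, so no prefix of a string in B is a prefix of a string in A):

```python
def parse_P(i):
    # C_P = while(A.next)(C_A); C_B -- no backtracking is needed because
    # the A.next lookahead test decides which branch to take.
    pos = 0
    while pos < len(i) and i[pos] == 'a':   # A.next: lookahead for an A
        pos += 1                            # C_A consumes one A
    return i[pos:] == 'b'                   # C_B, then end of input

assert parse_P('b') and parse_P('aaab')
assert not parse_P('aaa') and not parse_P('ba')
```

      Because the first sets of A and B are disjoint here, the single-symbol lookahead replaces the non-deterministic do(C_A) with an ordinary while loop.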

      A CFL is accepted by a non-deterministic PDA (Push-Down Automaton), which has the following structure: <...> It is easy to show that if R is defined by a simple recursion (like a context free grammar) then it can be implemented as a non-deterministic PDA.

      2.5 Acceptors & Dictionaries

      Next extend the construction of acceptors for grammars to acceptors for dictionaries, including intersections. Consider the intersection (&R) of two or more simple acceptors R1, R2, R3, ... . This relation succeeds on those strings that are accepted by all of them.

    29. Consider{ R1, R2, ... : @(#I,#I).
    30. |-accepted_by(R1 & R2 &...) = {x:#I || x (R1 & R2 & ...) () }
    31. = {x:#I || x R1 () and x R2 () and ...}
    32. = {x:#I || x R1 () } & {x:#I || x R2 () } & ...
    33. = accepted_by(R1) & accepted_by(R2) & ...
      }
    The parallel composition of the acceptors for a set of dictionaries accepts the intersection of the languages defined by the dictionaries; [Hemendinger90] has a similar result for Ada tasks. Therefore by adding the following mappings we get an acceptor for a MATHS dictionary with intersections and complements:
     	Dictionary	Relation	Notes
     intersection	 a&b&c...	A&B&C&...	Where a,b,c...translate to A,B,C,...
     complement	 a~b	A ~ B		Where a and  b translate to A and B.
    For instance, converting the dictionary example above into an acceptor with two processes gives:

  1. C_L::=C_P & C_Q
  2. C_P::=do(a?); not a.next; C_B,
  3. C_B::=(b?; C_B; c? | else),
  4. C_Q::=C_A; do(c?); not c.next,
  5. C_A::=(a?; C_A; b? | else).


  6. |-L={a^n ! b^n ! c^n || n:0..}. C_L accepts string i iff both C_P and C_Q relate i to (). Hence C_L accepts precisely the strings in P & Q. Hence C_L accepts a string if and only if it is in L.
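    A sketch of this two-process acceptor (my illustration; the deterministic scans below stand in for the relations C_P and C_Q):

```python
def C_P(i):
    # process 1: P ::= #a B, B ::= (b B c |) -- leading a's, then b^m c^m
    t = i.lstrip('a')
    m = len(t) // 2
    return t == 'b' * m + 'c' * m

def C_Q(i):
    # process 2: Q ::= A #c, A ::= (a A b |) -- a^n b^n, then trailing c's
    t = i.rstrip('c')
    n = len(t) // 2
    return t == 'a' * n + 'b' * n

def C_L(i):
    # C_L ::= C_P & C_Q: both processes scan their own copy of the input
    return C_P(i) and C_Q(i)

assert C_L('') and C_L('abc') and C_L('aaabbbccc')
assert not C_L('aabbbccc') and not C_L('abcc')
```

    Neither process alone rejects all the bad strings; it is their conjunction that accepts exactly {a^n ! b^n ! c^n}.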

    So, dictionaries define languages that are recognized by networks of CFG acceptors{footnote1}. Hemendinger shows that similar rules can be used for deriving grammars that model the communications that can occur between two Ada tasks [Hemendinger90]. Notice that it is easy to transform acceptors into automata that report at each step whether or not the input is currently acceptable, and then connect these to an "and-gate". <...>

    This kind of structure - a Directed Acyclic Graph (DAG) of simple automata - is found in naturally occurring recognizers - eyes, ears, and brains. Gelernter calls it a trellis. It is known that any finite state automaton can be decomposed into such a DAG of simple finite state automata and delay machines - where a simple automaton is one that has a "simple group" [HartmanisStearns]. However, there are mathematically simple automata of arbitrary size and complexity and so this result is not very helpful. Further, it does not help us to understand what can be done by networks of recognizers that have stacks in them. This is the next topic.

    <...more on my own metalinguistic MATHS language...>

    A simple context free dictionary has a comparatively simple top-down acceptor. The correspondence has proved its value in practical projects [ DDD in newsubjects ].

    Intersections in the dictionary lead to a network of parallel recognizers. Two questions arise:
    Net

    1. How many parallel processes (intersections) might be required?
    2. What languages cannot be handled by n parallel processes (intersections)?

    (End of Net)
    These are the topics for section 3 below. <...>

    3 Grammar Theory

      3.1 Concurrent and communicating grammars

      Concurrent and communicating grammars are important because if a language is described as the intersection of n grammars (of a given type G, say, which is accepted by automata of type A) then the language is accepted concurrently by n automata of type A. Thus n is an upper bound on the number of communicating sequential processes needed to recognize, parse, or process the language. Further, if the automata are in some sense efficient then there will be an efficient (in the same sense) n processor solution. Concurrent definitions imply cooperating sequential processes that form networks that can be efficiently implemented.

      The family of languages defined by these grammars is a superset of the Boolean closure of the family of context free languages and a proper subset of the context sensitive languages.

      Defining a language as the intersection of several grammatical constraints is not new - Constrained Expressions are expressive and based on simple regular expressions [Dillonetal86]. [HehnerSilverberg83] use the term Communicating Grammars for a set of grammars with some terminals and non-terminals occurring in both. The model here is simpler - I propose the terms concurrent context free grammars -- -- --(CCFG)
      for the dictionary model proposed here, and concurrent context free languages-- -- --(CCFLs)
      for the family of languages defined by CCFGs. The Turing project used explicit intersections [HoltCordy88]. Hemendinger notes that Ada tasks have communication patterns that are CCFLs [Hemendinger90]. <formal model...>

      3.3 Intersections of CFLs

      I now describe a hierarchy of families of languages: CFL^0, CFL^1, ....

      Let CFL^0 be the regular languages and CFL^1 the context free languages.

      A language is in CFL^n if it is the intersection of n CFL^1 languages. A similar set of definitions can be created for any family of languages which is not closed under intersection. The context free dimension of a language L -- -- --(CFD)
      is the minimum n for which L is a CFL^n.

      Liu and Weiner defined CFL^k as the k-intersection languages - but did not specify what the intersection of zero CFLs might mean. Defining the regular languages as CFL^0 ensures that if a language is the intersection of a CFL^p language and a CFL^q language then it is a CFL^(p+q) language.

      Liu and Weiner remark that trivially

    1. if n<m then CFL^n==>CFL^m and show that CFL^2<>CFL^1 by considering {a^n ! b^n ! c^n||n:0..}. Thus CFL^1=>>CFL^2. As another example consider (example2)
      Net
      1. L::=P & Q,
      2. P::=#a B #d,
      3. B::=(b B c|),
      4. Q::=A D,
      5. A::=(a A b |),
      6. D::=(c D d|).

      (End of Net)

      Clearly,

    2. (example2) |- P={a^n1 ! b^n2 ! c^n3 ! d^n4 || n2=n3},
    3. (example2) |- Q={a^n1 ! b^n2 ! c^n3 ! d^n4 || n1=n2 and n3=n4}.

      Therefore,

    4. (example2) |- L={a^n1 ! b^n2 ! c^n3 ! d^n4 || n1=n2=n3=n4}. Similarly
    5. {a^n ! b^n ! c^n ! d^n ! e^n || n:0..} in CFL^2,
    6. {a^n ! b^n ! c^n ! d^n ! e^n ! f^n || n:0..} in CFL^2,
    7. ...
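      A quick executable check of example2 (my illustration; in_P, in_Q, in_L are invented names): each grammar imposes one set of counting constraints, and the intersection forces all four exponents to agree.

```python
import re

def in_P(s):    # a^n1 b^n2 c^n3 d^n4 with n2 = n3
    m = re.fullmatch(r'(a*)(b*)(c*)(d*)', s)
    return bool(m) and len(m.group(2)) == len(m.group(3))

def in_Q(s):    # a^n1 b^n2 c^n3 d^n4 with n1 = n2 and n3 = n4
    m = re.fullmatch(r'(a*)(b*)(c*)(d*)', s)
    return bool(m) and len(m.group(1)) == len(m.group(2)) \
                   and len(m.group(3)) == len(m.group(4))

def in_L(s):    # L ::= P & Q, forcing n1 = n2 = n3 = n4
    return in_P(s) and in_Q(s)

assert in_L('abcd') and in_L('aabbccdd') and in_L('')
assert not in_L('abcdd') and not in_L('aabcd')
```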

      Liu & Weiner presented, for all p>0, languages which belong to CFL^(p+1) and not CFL^p. This proves that for all p ( CFL^p=>>CFL^(p+1)) [LiuWeiner73, Hemendinger90].

      For example the language

    8. {a^n1 ! b^n2 ! c^n3 ! a^n1 ! b^n2 ! c^n3 || n1,n2,n3:0..} in CFL^3 ~ CFL^2. It is interesting to note that the other languages formed by permuting the terms are all CFL^1 or CFL^2:
    {a^n1 ! b^n2 ! c^n3 ! b^n2 ! a^n1 ! c^n3 || n1,n2,n3:0..} in CFL^2 ~ CFL^1,
    {a^n1 ! b^n2 ! c^n3 ! c^n3 ! b^n2 ! a^n1 || n1,n2,n3:0..} in CFL^1 ~ CFL^0, etc.

      The first language exhibits two examples of what Jackson calls an "interleaving clash", and the next a single clash, the last has no clashes [Jackson75]. I therefore conjecture that the number of interleaved groups is equal to one less than the context free dimension of the data.

      Liu and Weiner [LiuWeiner73] use a correspondence between the context free dimension of a language and the dimensionality of an associated linear space in their proof. They note that determining the context free dimension of an arbitrary language is often difficult. Finding G:CCFG with degree n merely establishes an upper bound on the CF dimension of the language.

      There are well known polynomial time algorithms which recognize or parse a CFL^1 language. Hence CFL^n languages will be recognized by n parallel polynomial time algorithms - and so by a polynomial time algorithm that simulates the n parallel processes. The simple parallel structure proposed to handle intersections has no interaction between the processes. Each process is isolated from all the others and conceptually has its own copy of the input. There is an implicit census taker waiting for all processes to accept their inputs before it reports success of the whole. This is an easy structure to implement in most high-level languages and well suited to modern programming styles with tasking, threads, coroutines, and/or objects.
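      The census-taker structure can be sketched with threads: each recognizer runs on its own copy of the input, and the whole accepts only if every process accepts (recognize, p1, p2 are illustrative names, not the monograph's):

```python
from concurrent.futures import ThreadPoolExecutor

def recognize(processes, text):
    with ThreadPoolExecutor(max_workers=max(1, len(processes))) as pool:
        verdicts = [pool.submit(p, text) for p in processes]
        return all(f.result() for f in verdicts)    # the census taker

# Two isolated recognizers whose intersection is {a^n b^n c^n}:
def p1(s):                       # a's, then b^m c^m
    t = s.lstrip('a'); m = len(t) // 2
    return t == 'b' * m + 'c' * m

def p2(s):                       # a^n b^n, then c's
    t = s.rstrip('c'); n = len(t) // 2
    return t == 'a' * n + 'b' * n

assert recognize([p1, p2], 'aaabbbccc')
assert not recognize([p1, p2], 'aabbbccc')
```

      The processes never communicate; the only synchronization point is the final all(), which plays the role of the and-gate.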

      There exist CSLs that are not CCFLs. An example is {a^(n^2) || n:0..}, which is a CSL. If it were a CFL^p for some p then it would be the intersection of p CFLs over a single terminal symbol. A CFL over a one-symbol alphabet is regular, and so the language would be the intersection of p regular languages. Therefore it would be regular. However it is obviously not regular. Therefore

    9. {a^(n^2) || n:0.. } in CSL~CCFL.

      A similar argument shows that if L in CFL^p for some p, and t some terminal then

    10. L&#t in CFL^0=REG.

      CFL^n is accepted by n parallel non-deterministic PDAs. Hence it follows that CFL^n is closed under:
      Net

      1. Star
      2. Concatenation
      3. Inverse finite state transductions
      4. Inverse homomorphism

      (End of Net)
      Hemendinger uses closure under inverse homomorphism to construct an elegant proof that a language is in CFL^3~CFL^2 (page 753 of [Hemendinger90]).

      CFL^p is not closed under union but CCFL is, because

    11. (A | B)&(C | D) = A&C | A&D | B&C | B&D

      so that the union of a CFL^p language and a CFL^q language is a CFL^(p*q) language when p,q>0.
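      The distribution law (A | B)&(C | D) = A&C | A&D | B&C | B&D that underlies the p*q bound can be checked on finite sets (a sketch of mine, reading | as union and & as intersection):

```python
import random

random.seed(0)
universe = range(20)
for _ in range(100):
    # four random subsets of a small universe
    A, B, C, D = ({x for x in universe if random.random() < 0.5}
                  for _ in range(4))
    lhs = (A | B) & (C | D)
    rhs = (A & C) | (A & D) | (B & C) | (B & D)
    assert lhs == rhs
```

      The union of two intersections of p and q sets is thus an intersection of p*q pairwise unions, which is where the p*q count of component languages comes from.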

      If we permit networks with a more general acceptance criterion - a boolean operation on the individual verdicts - then the number of PDAs required for the union of two such nets is the sum of the sizes of the two nets.

      Other results depend on an exemplary type of language - languages defined by linear constraints [LiuWeiner73]. Given an alphabet of N elements {a1,a2,a3,...,a[N]} and n::Nat0^N=(n1,n2,...,n[N]), define a**n=![i:1..N](a[i]^n[i]).

      Given alphabet of N elements {a1,a2,a3,...,a[N]} and an integer linear map M:Int^N -> Int^p and vector j[1..p] then

    12. LL(M,j)={a**n || for some n:Nat0^N(M(n)=j)}.
    13. LinL::=img(LL) are the Linear Languages.
    14. Linear Languages map some languages into linear spaces
    15. If M is of rank r then LL(M,j) in CFL^r.
    16. Given alphabet of N elements A={a1,a2,a3,...,a[N]} define
    17. Dup(A)::={a**n ! a**n || for some n:Nat0^N}
    18. |-Dup(A)==>{w ! w || w:#A}
    19. |-Dup(A) in CFL^N

      The language {x ! x || x:#@} is a countably infinite union of CFL^m sets (m=1..). For finite p, {x ! x || x:#@} is in CSL~CFL^p. However

    20. {x!x || x:#@}={x!x || n:0.., x:@^n}= | [n:0..]{x!x || x:@^n}. Now if x:@^n then x=0^p1 ! 1^p2 ! 0^p3 ! ... ! 0^p[m] where +(p)=n and m<=n.

      Therefore {x! x || x:@^n} is in CFL^m.

      Clearly then {x! x || x:#@(|x|<=n)} is also CFL^m , and so as m tends to infinity we get a closer and closer approximation to {x! x || x:#@ } which is a Context Sensitive Language(CSL).

      3.4 Embedded intersections

      The union of a CFL^p language and a CFL^q language is a CFL^(p*q) language when p,q>0. So if a dictionary contains an intersection which is not self embedding then there is an equivalent dictionary with that intersection at the top level.

      Philippe Schnoebelen{footnote2} circulated on Theory-Net on BITNET (comp.theory on Usenet) an interesting result equivalent to the observation that

    21. For A:Alphabet, L1,L2:CFL^0 & @#A,
    22. X::=L1 | (L2 & (A X)) defines the language
    23. X= | [n:0.. ] (L2 & (A L2) & (A^2 L2) & ( A^3 L2) & ... & ( A^n L1)) and shows that this is a regular set.
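      The unfolding can be computed as a least fixed point. The sketch below (my illustration, with hypothetical regular choices for L1 and L2) iterates the defining equation X ::= L1 | (L2 & (A X)) over all strings of length at most 4:

```python
from itertools import product

A = 'ab'
strings = [''.join(t) for k in range(5) for t in product(A, repeat=k)]

# Hypothetical regular constraints, chosen only for illustration:
L1 = {s for s in strings if s.count('a') % 2 == 0}   # even number of a's
L2 = {s for s in strings if s.endswith('b')}          # non-empty, ends in b

X = set()
while True:
    # A X = any single symbol of A followed by a member of X
    newX = {s for s in strings
            if s in L1 or (s in L2 and len(s) > 0 and s[1:] in X)}
    if newX == X:
        break            # fixed point reached
    X = newX

assert '' in X and 'ab' in X and 'a' not in X
```

      The iteration is monotone and the restriction to bounded length makes it terminate; Schnoebelen's result is that the full fixed point is a regular set whenever L1 and L2 are.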

      It also appears there is no additional power added to language definitions by allowing intersections to be used arbitrarily - it is enough that they be permitted at the top level.

      Lower level intersections in a grammar (which may be useful and convenient) are equivalent to allowing the network of PDAs to spawn new PDAs. We can get the same effect by keeping PDAs in a do-nothing loop until needed [cf pp81-88 of [Baeten90] by J A Bergstra]. It is clear that there is no limit to the number of new PDAs. So allowing lower level intersections implies using either a growing network or a fixed network of finite but unbounded size.

      3.5 Multiple Grammars

      Unlimited networks come from a simple extension to CCFLs which allows definitions to have parameters in them. Details will be explored later, but there are two extant examples of systems based on this model - Sakakibara's Extended Simple Formal Systems --(ESFS) and JSD.

      Sakakibara's Extended Simple Formal Systems (ESFS) define a family of languages that lie between the CCFLs and the CSLs [Sakakibara90], p91. Sakakibara also presents a Simple Formal System (SFS) which defines the language `{ a^(2^n) || n:Nat0}` [Sakakibara91]. This is clearly not definable in terms of a finite number of intersecting grammars. Using more advanced MATHS, one of Sakakibara's SFSs would be written as Net{ P::#{a}->@, P(a), for all x:#{a}, if P(x) then P(x!x). }. Moreover, let

    24. sqr::#{a}->#{a}=(_)!(_) then by definition
    25. |-For A:@#{a}, sqr(A)= {x!x || x:A } and so this non-Chomskian grammar
    26. L::= a | sqr(L) defines L={ a^(2^n) || n:Nat0} by the usual iterative/topological model.
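      The iterative model of L ::= a | sqr(L) is easy to animate (a sketch of mine): start from the empty set and apply the defining equation until no new strings of bounded length appear.

```python
def iterate(max_len=64):
    # least fixed point of L = {'a'} | sqr(L), restricted to length <= max_len
    L = set()
    while True:
        newL = L | {'a'} | {x + x for x in L if 2 * len(x) <= max_len}
        if newL == L:
            return L
        L = newL

L = iterate()
# The members are exactly the strings a^(2^n) up to the length bound.
assert sorted(len(x) for x in L) == [1, 2, 4, 8, 16, 32, 64]
```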

      The Simple Formal Systems [defined in Sakakibara 91, p84] can define a language that is not a CCFL, and yet cannot define {a^n ! b^n ! c^n || n:Nat} which is a CCFL [evidence in proof of proposition 7 on p90 of Sakakibara 91].

       	REG=CFL^0=>>CFL=CFL^1=>>CFL^N=>>CCFL=>>ESFS=>>CSL
       	            CFL       =>>  SFS      =>>ESFS

      JSD -- -- --(Jackson System Development)
      designs systems as a large network of objects that belong to a small number of "classes" [Jackson78]. Each class has many similar but out-of-phase objects. In these designs, each dynamic object keeps an occurrence of an entity up to date [See the bibliography of JSP/JSD, [ JSP in newsubjects ] [ JSD in newsubjects ] Chapter 9]. It appears likely that these correspond to a subset of the CSLs.

    . . . . . . . . . ( end of section 3 Grammar Theory) <<Contents | Index>>

    Missing stuff

  7. ...

    Footnotes


    (footnote1): Psychologists use a network of "daemons" like this for modelling perceptual processes [ Human Information Processing, Lindsay & Norman, Pub Academic Press, 1970].


    (footnote2): [ EMail <mcsun!inria!imag!lifia!phs@UUNET.UU.NET>, 21 Oct 89 18:01:47 GMT]

. . . . . . . . . ( end of section Part of Chapter 3) <<Contents | Index>>

