[CSUSB] >> [CNS] >> [Comp Sci] >> [R J Botting] >> 03
[Index] || [Blog] || [Research] || [Teaching] || [Search] Mon Apr 12 06:44:30 PDT 2004 >> [CS620 Course Materials] >> 03
[Index] || [Contents] Mon Apr 12 06:44:30 PDT 2004

# CSCI620 Session 3

## Previous

[ 02.html ]

## Preparation

Study Chapter 2 of the text and prepare some notes and at least one question you need to ask on it.

## Syntax

### Set Notation

XBNF combines BNF with an ASCII form of Set Notation

Brief summary
XBNF\TeXSet Operation
|\cupunion
&\capintersection
~-complement
><?Cartesian product
in\epsilonmembership
<==?containment
==>\subseteqsubset or equal
<<=?containment
=>>\subsetsubset and not equal
|S||S|cardinality, number of elements in S
{}?the null set
S^nS^na string of n elements each in S
#SS^*a string of any number of elements each in S including none
For example
Net

1. "ab"^0 = "".
2. "ab"^1 = "ab".
3. for n:2.., "ab"^n = "ab" >< ("ab"^(n-1)).
4. #"ab" = "ab"^0 | "ab"^1 | ... | "ab"^n | ... ,
5. ={ "", "ab", "abab", "ababab", "abababab", ...}.

(End of Net)
[ cs320xbnf.htm ]

### Backus-Naur Form

An example: defining the set of binary numbers in terms of the Bits={"0", "1"}.
1. Binary_Number::= "0" | "1" | Binary_number "0" | Binary_number "1".

Notice that I put terminals in quotations and link non-terminals to their definition.

### Extended BNF

When preparing notes I don't usually use the {}[] notation. I like to use {} and [] for other purposes. I tend to write
2. #(T) = zero or more T = {T}.
3. O(T) = optional T = [T].
4. N(T) = one or more T.

### The WHILE Language

Here is my personal definition of the WHILE language compare it with our text book's. Is it more powerful or not?
5. grammar_of_WHILE::=following
Net
1. Num::syntax=Numbers, here binary, but could in decimal..
2. Var::syntax=A set of variables, here letters.
3. Aexp::syntax=The arithmetic expressions giving numbers as values.
4. Bexp::syntax=Boolean expressions, giving true or false as values
5. Stm::syntax=Statements.
6. Num::= "0" | "1" | Num "0" | Num "1".
7. Aexp::= Num | Var | Aexp "*" Aexp | Aexp "+" Aexp | Aexp "-" Aexp.
8. Bexp::= "true" | "false" | Aexp"="Aexp | Aexp"<="Aexp | "not" Bexp | Bexp "and" Bexp.
9. Stm::= Var"="Aexp | "skip" | Stm";" Stm | "if" Bexp "then" Stm "else" Stm | "while" Bexp "do" Stm.

(End of Net grammar_of_WHILE)

### Chomsky Hierarchy

This defines the fundamental limits on what grammars can define and what programs (automata) can recognize. It also tends to appear in MS Oral examinations.... Complete the following table:
TypeLHSRHSLanguages definedLanguage accepted by
0string stringRecursively EnumerableNondeterministic Turing Machine
1shorter_string stringcontext-sensitiveLinear Bounded Turing Machine
2nonterminal string??
3nonterminal term nonterminal | termregular, right linearfinite state machine
3nonterminal nonterminal term | termregular, left linearfinite state machine

### Regular Expressions

By the way Regular Languages (definable by Type 3 grammars and accepted by Finite State Acceptors) are also those that can be expressed by simple Regular Expressions:
Net
1. regular_expression::= constant | regular_expression "|" regular_expression | regular_expression regular_expression | regular_expression "*".

(End of Net)
These turn up in many programming and scripting languages including Linux and UNIX, Java, PHP, etc. etc. See [ egrep.html ] and [ regular_expressions.html ] for more details on a useful spin off from the theory.

### Ambiguity

Theoretical books at the graduate level often have ambiguous grammars because they define:
Net
1. S::= A | B | C | S ";" S.
2. A::= a |...
3. B::= b | ...
4. C::=c | ...

(End of Net)
This leads to two ways of parsing "a;b;a".

We can remove this easily by adding an extra production/definition:
Net

1. S::= S1 | S1";" S.
2. S1::= A | B | C.
3. A::= ...

(End of Net)

Because it is easy to remove..... most texts do not worry about it.

### Terminators and Separators

Notice that the semicolon in programming languages is used in two very different ways. In the C/C++/Java family the semicolon is an essential terminating part of a statement that contains an expression:
6. expression_statement::= expression ";".
7. sequence::= statement | statement sequence.
 		x = ++x + x++;

However in the Algol/Pascal/Ada tradition the semicolon is not part of a statement but is put between two statements:

8. sequence::= statement | statement ";" sequence.

Both approaches have advantages. It is worth learning to switch between the modes.

. . . . . . . . . ( end of section Syntax) <<Contents | Index>>

## Questions

### Can you give examples of the two fundamental approaches to syntax analysis page 26

They are called bottom-up and top-down. In bottom-up syntax analysis you try to match the input string with terminals in the grammar and work back to the non-terminals upward to the definition of the whole. For binary numbers one might reason:
1. 1011
2. B001
3. B01
4. B1
5. B

where B is short for Binary_number.

In top down we assume that the whole fits the Binary_number and try to match that against the alternative forms:

1. B
2. B1
3. B11
4. B011
5. 1011

There will be more on this in chapter 4.

### Explain Indirect Recursion

Here is an example
Net
1. A::= C "c".
2. C::= "b" B.
3. B::= "a" | "z" A "z".

(End of Net)
A is defined in terms of C, C in terms of B, and B uses A. So the definition of A in terms of A via C and B. An example string is: "bzbaczc".

### What is the difference between a Lexical Unit and a Vocabulary

A Vocabulary is a collection of lexical units. A lexical unit is a member of the vocabulary.

### Do the Chomsky Types form a hierarchy.

Yes. All type 3 grammars are Type 2, all Type 2's are Type 1's and all type 1's are also type 0.

Further, they form a strict hierarchy: there are languages that can be defined by a type 0 grammar but not by a type 1, and languages of type 1 that can not be defined using a type 2, and so on.

A key moment in Computer Science is the discovery by Floyd that although Algol 60 was defined using BNF it could not be completely be defined by a context free grammar like BNF. The problem is that we need to express type checking rules so that we are not allowed to write silly statements like multiplying a string by an amount of money. This started the search for an equally way of expressing the restrictions on the BNF rules. These came to be called static semantics.

### Why WHILE

We need a simple example of a programming language for use in text books and other publications. WHILE is the simplest structured language that has a loop in it. Simpler means easier examples. The ability to express loops is needed to get a language that is powerful enough to be interesting.

NO! The expression "(a+c)/d$" contain two parentheses "()" and the parentheses are not in the grammar. So, the given grammar can not fit the given string. I think that there are missing definitions: 1. T::= F T1 | "(" B. 2. B::= E ")". ### Why is the Bohm Proof that Gotos are not needed important To be precise it is the Bohm & Jaccopini Proof. It is important because at about the time that it was published people had observed that programs that had lots of Goto statements where hard to understand. Edsger Dijkstra's letter "Gotos considered harmful" notes that he found student work harder to understand when his students used Gotos. In other words: just when we we getting the feeling that the Goto statement was not a good idea, Bohm and Jaccopini proved that we didn't need it. And so the structured paradigm arrived.... with about 200 papers on removing Gotos being published in the next year or so! There is no truth to Donald Knuth's joke that Professor Goto of Tokyo University was very angry when he saw all the papers on this topic. ### Can you explain the rules about negation on page 21 The book has the equivalent of: 3. Integers::=following, Net 1. D::= "0" .. "9". 2. I::= I D | "-" I D | D | "-" D. (End of Net) The more I think about it the less I like it. Consider this derivation: 4. I => - I D => - - I D => - - D D => - - 1 2. I'm not sure that this is what the author wanted. In practice (not theory) it is worth adding extra definitions and given good names to the nonterminals in a grammar: 5. My_Integers::=following Net 1. integer::= "-" unsigned_integer | unsigned_integer. 2. unsigned_integer::= digit | unsigned_integer digit. 3. digit::= "0" .. "9". (End of Net) ### Why does the program in question 6 (page 32) reverse the assignment of t and a The code is 1. ... 2. t:=a; 3. .... 4. a:=t; 5. .... Here variable "t" is short for "temporary" and holds the value of "a" while other things are done to it. After wards the second assignment resets "a" back to it's old value. This is what happens when you simulate an "if...then...else...fi" using "while". ### What does "od" mean in the WHILE language "od" is the reverse of "do". This is a convention invented by the Algol 68 team that terminated each "do" with an "od" and each "if" by a "fi". The alternatives include the Ada convention of "do"...."end do" and "if"...."end if", or the C/C++/Java use of "{" and "}". Also compare with the use of start and end tags in XML/HTML/SGML:  <body> ... </body>  <ol> .... </ol> and so on. ### Does the Java code in question 5 on page 32 work No. There are semicolons missing. See [ Terminators and Separators ] above. I'm also a suspicious of the "write" function! I think this piece of code started out in Pascal.... . . . . . . . . . ( end of section Questions) <<Contents | Index>> ## Exercises 1. If V has |V| elements, and n is a number, how many elements has V^0|V^1|V^2|...|V^n. 2. What does direct left recursion look like? 3. A grammar for signed integers I.... make it unambiguous. 4. Derivation tree for (a+b)/d$.
5. Translate Java to WHILE
6. Remove ambiguity from...
7. Construct a grammar to fit...

. . . . . . . . . ( end of section Exercises) <<Contents | Index>>

## Laboratory 3

I plan to point you at web sites that are full of the syntax (BNF) of languages. I think I will ask each person to explore alone, and then report to the group (one sentence, one URL) at the end of the lab session what they have found.

Possible start: [ http://www.csci.csusb.edu/dick/samples/ ]

## Next

Study chapter 3 on semantics pages 34 to 47 inclusive. [ 04.html ]

. . . . . . . . . ( end of section CSCI620 Session 3) <<Contents | Index>>

# Glossary

1. BNF::="Backus-Naur Form", for syntax and grammar, developed by Backus and Naur.
2. EBNF::="Extended " BNF.
3. HTML::= "HyperText Markup Language", used on the WWW.
4. HTML_page::syntax= "<HTML>" head body.
5. Java::="An " OO " Language from Sun".
6. LISP::= "LISt Processing Language".
7. LRM::="Language Reference Manual".
8. OO::="Object-Oriented".
9. Prolog::="Programming in Logic".
10. TBA::="To Be Announced".
11. UML::="Unified Modeling Language".
12. URL::=Universal_Resource_Locator,
13. Universal_Resource_Locator::syntax= protocol ":" location, where
Net
1. protocol::= "http" | "ftp" | "mailto" | ... ,
2. location::= O( "//" host) O(pathname).

(End of Net)
14. WWW::= See http://www.csci.csusb.edu/dick/cs620/, index to web site for this class.
15. XBNF::="eXtreme" BNF, developed by the teacher from EBNF, designed to ASCII input of suntax, semantics, and other formal specifications.