[CSUSB] >> [CompSci] >> /u/faculty/dick/cs320/sebesta/03

CS320 Notes on Syntax and Semantics


Definitions

	Syntax
	   Defines what is legal and so meaningful in a language.
		HTML: An HTML document starts with an optional header and 
			then a body.

		LISP:  A list is an atom, nil, or a sequence of lists inside
			parentheses.
		       An expression is an atom or list that has an operator
				in the first item.

		C++:   1+2 is a valid expression.

          Syntax also

	      Describes the parsing of a string of symbols into a 
			data structure.
			Normally a tree: An X has a Y, a Z and a W.
		HTML:  The document has a body part and a header part.
			Document
			/	\
		     Head      Body

		LISP:  Each list becomes a binary tree.
		       Each expression is a variable or a constant or it 
			has an operator and some operands.

		C++:   1+2 is a valid expression. It has an operator +
			and two operands: 1 and 2.

	The structure uncovered by parsing input can be shown in a diagram
		using the UML.

	Semantics
		Associates a meaning with structures in the language
		HTML:
		<IMG SRC="me.gif"> means "let the user see this graphic here"

		LISP:  
		(+ 1 2) means `the result of adding 1 and 2`.
		(A 1 2) means `call function A with arguments 1 and 2`.
		(+ 1 A) means `the result of adding 1 and the value of A`.

		C++:
		1+2 means `the result of adding constant 1 to constant 2`

Syntax

	Lexemes
			A language is a set of strings of lexemes.
			LISP: atoms, "(", ")", ","
			HTML: tags "<"...">", symbols ("&"...";") and text.
			C++: 123, alph, const, =, >>, ++, <=, ...
	Token
			Internal struct-ure encoding a lexeme
			LISP: struct atom{ int length, char chars[],...}

	Recognition vs Generation
			Equal and Opposite!
		Recognition
			string->(recognize)->{yes, no}

			Given "main(){exit(0);}"

			Goal: Produces YES when valid C else NO.

		LISP:  Is this a valid list: (A B C ( X () ?
		HTML:  Is this valid: <HTML><head></head><body></body>

	Generation  
			(generator)->Language
			Output all valid C programs!
			A theory of languages.  
			Describes what might be written.
	Parsing
			string->(parser)->Tree Structure
			Part of a compiler...
		LISP: (A B C)->(parse)->  .
					 / \
					A   .
					   / \
					  B   .
					     / \
					    C   NIL

FORMAL SYNTAX


	Chomski	- had thought out a theory of REAL languages:
		Context Free Grammars.

    ALGOL 60, Zurich.
	Backus - "Lets use CFGs..." (Having suffered problems with FORTRAN!)
	Naur   - "Here is how to use it..."
	Committee- "Excellent...."

	Algol60 report
		http://www.csci.csusb.edu/dick/samples/algol60.syntax.html

	Metalanguage
		BNF used ::=   where Sebesta uses ->
		We will use ::= NOT an arrow

	Deriving = generating and noticing how we did it.
	Parsing = recognize and output structure.

	Later language definitions have abandoned the <...>
	and allow (....), [....], and {....} as metacharacters.
	Each of these is called an EBNF (Extended BNF).

Our Notation XBNF

Why not use the books notation?
   Can you figure out the following? (I can't)
     	<assign>-><var>:=<expression>
	<program>->{<stmt_list>}
	<comparison>-><simple_expression><relation><simple_expression>
	<relation>-><|>|<=|>=|==|!=
	<logical_op>->|| | &&
Sebesta's notation is AMBIGUOUS in ASCII.  Which is why our "XBNF"
does not use "<" and ">".

Now we don't have this problem in C++:
	cout << "statement";
	cout << statement;
	cout << "<<";
	cout << 1+1;
XBNF copies C++ and use C++ strings to indicate lexemes/terminal symbols.
XBNF puts lexemes in quotes.

XBNF versions of Sebesta's examples:
     	if_statement::="if" logic_expression "then" statement
		| "if" logic_expression "then" statement "else" statement.

	identifier_list::= identifier | identifier "," identifier_list.

	http://www.csci.csusb.edu/dick/maths/intro_ebnf.html

XBNF has some neat extras inspired by EBNF (see later):
	O(X)::=`X is optional`.
	N(X)::=`one or more X's`.
	#(X)::=`zero or more X's`.
	L(X)::=`lists of X's`.
	L(X,Y)::=`lists of X's separated by Y's`.
   Axioms:
	O(X)= X|empty.
	N(X)= X | X N(X).
	#(X)=O(N(X)).
	L(X)= X | X "," L(X).
	L(X,Y)= X | X Y L(X, Y).

   Example 3.1 in XBNF
	program ::= "begin" statement_list "end",
	statement_list ::= L(statement, ";"),
	statement::=variable":="expression,
	variable::="A" | "B" | "C",
	expression::= variable "+" variable | variable "-" variable | variable.

   Notice - no <>, ->
   Notice - defined terms are spelled out in full.
   Notice - Short hand for lists
   	statement_list = statement #(";" statement),
   	because in XBNF
	  For all x,y, L(x,y) = x #(y x).

  On the WWW we use the HyperText Markup Language or HTML:
	HTML:  HTML_file::= header body.
		header::= "<head><title>"document_description"</title></head>".
		body::="<body>"HTML_text"</body>"
		...
		preformated_text::="<pre>"HTML_text"</pre>.

  In the LISP lab you'll use this info:
	LISP:  list::= atom | null | "(" #list ")".
		atom::= atom_char #(atom_char).
		atom_char::= letter | digit.

Ambiguity

	Example 3.2  - Translate...into XBNF

	Parse Tree
	Recognizers
	Ambiguity

	Operator Precedence
	   Associativity

EBNF

	[]	Option
	{}	Zero or more of
	::=	is or can be

OUR XBNF - Uses {}[]<> as in math...

	Ada EBNF	Our XBNF
	[ x ]		O( x )		optional x = (x |)
	{ x }		#( x )		zero or more x
	 		{x,y,z,...}   set of x,y,z,...

	XBNF
	[ x ]		subscript x
	char..char	range of characters,...
	A & B		Satisfy BOTH A and B at once
	A ~ B		Satisfy A BUT NOT B

   Example - Lisp Lists:
	list ::=atom | "NIL"  | "(" #list ")",
	atom ::= number | non_numeric | string,
	number::= digit #digit,
	 digit ::="0".."9",
	...
	
   Example - HTML document
	html_document::= "<HTML>" head body.
	head::= O( "<head>" O( "<title>" text "</title>" ) "</head>" ).
	body::= "<body>" # piece_of_html_text "</body>".
	piece_of_html_text ::= heading | image | separator | anchor | ... .
	heading ::= "<h1>" text "</h1>" | "<h2>" text "</h2>" | ... .
	image ::= "<img src="filename" alt="text" ">".
	separator::= "<br>" | "<p>" | "<hr>".
	anchor ::= "<a " (hypertext_ref | named_link) ">" text "</a>".
	hypertext_ref::= "href=" URL.
	named_link::="name="text.

  C--:
	program::="main(){" #statement "}",
	statement::=assignment | declaration,
	declaration::= "int" identifier";"
	assignment::=identifier"="expression";",
	identifier::="a".."z",
	expressions::=term #("+" term),
	term::=factor #( "*" factor),
	factor::="(" expression ")" | identifier | integer,
	integer::= N(digit).

   Example - LISP expressions
	expression::= variable | constant | "(" function # argument ")",
	argument::=expression,
	variable::=atom,
	constant::=string | number | "NIL" | "(" "QUOTE" list ")".


Syntax Graphs

	Good for users and beginners.
	Designers need the structure of an EBNF meta-language.

	Syntax graphs are NOT USED in CS320
		Not in final, optional in assigned work and projects.

Attribute Grammars

	Skip in CS320

Dynamic Semantics

	We will be using UML in CS320.
	This material is covered extensively in CS620.
	Skip this section.

Revision Exercises

	Do not do exercises 10, 11, 12,.... they are on attribute grammars
		and semantics.
	Not on the final.


Note.
	x:=y	-- assignment in Algol 60, Algol 68, Pascal, Ada,
		and most computer science text books.
		C/C++ use 'x=y;' instead.
	x=y	-- equality relation Algols, Pascal, Ada, Mathematics,
		and most CSci text books.
		C/C++ uses 'x==y".

	x::=y	-- definition in BNF, EBNF, XBNF.

Silly Examples

	Train::=Loco #Wagon Caboose.
	Meal::=#Bite.
	Drink::=(Big|Small) Gulp.
	RAM::=#byte & #word

Silly Exercise

	In the old days, in the Cajon pass, there were two kinds of
	train. A train either had one or two engines. The two engined
	trains had to be switched to one line and the one engined
	trains switch to go over the pass.  A one engined train
	stated with a loco, then had zero or more wagons, and then
	a caboose.  A two engined train started with a loco, had
	zero or more wagons, and ended with a caboose and another
	loco.

	Define a train. Use phrases in the above to make nonterminal
	The terminals are loco, wagon, caboose.

Serious Exercise

	Work out the XBNF syntax of a C function declaration.
	Hint: It is quite complex and there are several possible
		answers.  THINK!

	Answer... search in
	http://www.csci.csusb.edu/dick/samples/c.syntax.html

Between Syntax and Semantics comes Structure.

	When describing the meaning (semantics) of a piece of a language it
	helps in a real language to distinguish between the actual concrete
	syntax (do you write "begin" or "{"...?) form the abstract syntax 
	or structures that are involved (a block...)

Semantics

	Informal semantics
		Incomplete
		Inconsistent
		incomprehensible

		Example - Algol 68, LISP, ...

		But popular.

	Three classical approaches to Formal Semantics
		Operational
		Axiomatic
		Denotational

	We will use the UML to express semantic structures.
	The operations that define operational semantics can be put
		onto these diagrams.


You don't need to go into the details unless you want to!

	See notes below if you want to.

Exercise

Chapter 3 of Sebesta has lots of new words - shown in bold case.  Several 
of these have been defined in a collection of files in 
	http://www.csci.csusb.edu/dick/samples/ 
On the WWW you can look up these definitions using the search commands
embedded in
	http://www.csci.csusb.edu/cs320/index.html#searches

Assigned work:

	Do not do review questions 9, 10, 11, and 12; or problems 16,17,18.
	Do not draw any syntax graphs!
	Where the book says C you can use C++
	If you don't know Pascal look in
		http://www.csci.csusb.edu/dick/samples/pascal.syntax.html

	Hints:
		http://www.csci.csusb.edu/cs320/sebesta/03.answers

This chapter introduces many new terms.  If you can not find their
meaning in the text then follow this link
	http://www.csci.csusb.edu/cgi-bin/dick/lookup320
and enter the word into the form to search my repository of terms
for CS320.

Semantics (Optional)


	Operational Semantics
		Language parts are mapped into OPERATIONS on a virtual machine.
			If well chosen it makes implementation trouble free.
				JAVA
				BCPL

		VDL
			Became Vienna Design Method(VDM)
				for ALL Software engineering!

	Axiomatic
		Define what can be proved to be true as statements are obeyed.

		For example if you know that i is 1 before i++; then
		you know what it is afterwards!

		Hoare Triples
			Given P is true before S then we must have Q afterwards.

			Specifies what the statements do
			without stating a particular implementation.

			The "P" implies a disclaimer that if P is not true
			this triple doesn't say what happens.

		Dijkstra's Calculus
			Given that you want Q to be true after this statement
			then the language merely requires you to set up P first.

		Precondition: P
		Postcondition: Q

		To see how assertions can be imbedded in code
		see examples of Quick sort in Java at:
			http://csci.csusb.edu/public/faculty/dick/qsort0.java

		Later version:  Do both Pre and Post as a single predicate.
			Done By Eric Hehner
			Use:  x' for new value of x.
			 "x:=e" is x'=value(e) and for all other y, y'=y

	Denotational Semantics
		Associates parts of a language with objects in an
		abstract mathematical 'domain'.
			Actually a "function space" that is also a 
				"complete Partially Ordered Set"
	BOOK
		N[[ Part ]] = Denotation of Part.
	This Course.
		XBNF includes the ability to define functions.
		Symbol for set of functions mapping Givens to Goals:
			Givens->Goals
		Example:  cos: Real->Real.
		So we can define of functions
			 that map syntax into semantics.
			For x:syntactic_form, meaning(x)::=expression.

	For real languages we do it as a series of steps:
	(lexical): lexer::=strings->tokens.
	(syntax): parser::=tokens->structures.
	(semantics): denotations::= structures->meanings.

Binary numbers -syntax and semantics

	Translated into Our XBNF:
		binary_digit::="0"|"1". -- note these are characters.
		binary_number::= binary_digit | binary_number binary_digit.

		meanings::=`0,1,2,3,...`::=Positive&Integer.

		We now describe a mapping or function from character
		strings defined above into the positive integers
			0,1,2,3,...
		M::denotation=`Meaning of (_)`.

		A denotation maps valid syntactic strings into meanings
			denotation::= #char->0.. .
		M maps each #char into a unique integer,
	
                M must obey the following carefully chosen axioms:
			M("0")=0,
			M("1")=1.
			For b:binary_number, M(b "0")=2*M(b),
			For b:binary_number, M(b "1")=1+2*M(b).

	Notice: a binary_number is either like this: number digit or
		it is a digit.  So one of the above rules will apply
		to all valid strings.

		Thus
			M("1101")=1+2*M("110")
				 =1+2*2*M("11")
				 =1+2*2*(1+2*M("1"))
				 =1+4*(1+2*1)=13.

	Doing this for a full scale language is not easy...
	You need a notation for handling mathematical functions.
	For instance 
		http://www.csci.csusb.edu/dick/maths/intro_function.html

Attribute Grammars -- Not in this course

	Idea - each piece of input has some values (attributes) attached to it.
		Some types of value go down the parse tree (inherited)
		Some values are computed and passed up the tree
				(synthesized)