[Skip Navigation] [CSUSB] / [CNS] / [CSE] / [R J Botting] /[CS320 Course Materials] /04.html [04.txt(Text)] [Search ]
Tue Apr 9 16:34:37 PDT 2013
[Schedule] [Syllabi] [Text] [Labs] [Projects] [Resources] [Grading] [Contact]
Notes: [01] [02] [03] [04] [05] [06] [07] [08] [09] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20]

Contents


    CS320/04 Syntax & Parsing -- Parts of Chapters 3 and 4


      Table
      Previous 03 Evolution of Main LanguagesChapter 2lab03 HTML lab pages
      04 Syntax: grammars, EBNF, parsingChapter 3 sections 1 to 3 + Chapter 4 sections 1 to 3 + XBNF & LRM Handoutslab04 BNF on the web
      Next 05 Semantics: UML UML handoutlab05 UML + Graphics

      (Close Table)

      What is important in chapters 3 and 4

      In chapter 3 we are most interested in the simplest methods of syntax description -- the Context Free Grammars and the BNFs. You can use syntax description when developing any program with complex input or output! In chapter 4 you need know something about how to input and parse data. All programs have to do this! However, the details are a part of the CSUSB CSCI Compilers classes.

      Notice that the lexical analyser (lexer) and the parser together convert the input stream of data into a tree structure -- a collection of objects linked together by pointers. When we draw UML diagrams of the semantics of a language we also describing these objects and how they will be associated.

      State transition diagrams (figure 4.1) are an important part of computer science, but very easy to understand and use. These diagrams express the simplest kind of theoretical computer: the finite state machine or FSA. These are machines with limited memory (finite number of states) and star in several upper division electives in our degrees. The Unified Modeling Language (UML) has a diagram called a UML state machine. Here [ lexer.gif ] is an older version of Figure 4.1 drawn using an UML state machine. Can you figure out what is missing in my UML version when compared to Figure 4.1? More on the UML later.

      By the way, XML documents have their detailed syntax described in a form of Extended BNF.

      Assigned work due

      (1) Study chapter 3, sections 3.1(intro), 3.2(problem), and 3.3(solution). Study chapter 4, sections 4.1(intro), 4.2(lexers), and 4.3(parsers). However I do not expect you to memorize or duplicate the code in chapter 4.

      (2) You will use XBNF(eXtreme BNF) in your project work. Read the XBNF handout. You will be using a subset of HyperText Markup Language(HTML) in the Labs. Read the handout describing a subset of the HTML. Your project will be a longer document mixing XBNF, English, and diagrams, like these two handouts. Bring copies of today's handouts on XBNF and HTML to all future classes, and laboratories. Add your own notes and bring them to the final.

      (4) Study the first 8 Review Questions at the end of chapter 3 and the first 8 questions at the end of chapter 4.

      (4) Hand in written/typed/emailed copies of 2 Review Questions (any mix of chapters) at start of next class ( 2 points total max). Plus 1 point for being prompt and 1 point for full participation.

      Your Project

      In your project you will be developing a LRM (Language Reference Manual) for a programming language that you will be inventing. What makes a good LRM? Four things: Examples show individual points in the set of possible languages and help a person interpolate (guess) other examples. Here is an example of a C expression
       		a+1
      Comments help the user understand by being informal:
       		an expression like `foo+bar` adds `foo` to `bar`.
      A grammar defines the syntax. Syntax expresses precisely the possible forms of something. Grammars are written in a mathematical metalanguage:
       		additive_expression::= term #(("+" | "-") term).
      The Semantics eliminates meaningless possibilities and provides meaning to the rest. These are normally expressed informally but in this class we will use the Unified Modeling Language. -- -- --(UML)

      More on the UML later [ 05.html ] and here are some sample LRMs for well known languages [ ../samples/algol60.syntax.html ] [ ../samples/php_intro.html ] [ ../samples/python.html ] plus the syntax from other LRMs: [ ../samples/cobol.syntax.html ] [ ../samples/pascal.syntax.html ] etc.... Also see [ ../samples/ ]

      Further, these four things (comments, examples, syntax, and semantics) need to be intermingled: you should be able to see the reason for a feature(comment), plus an example or two of the feature, and the rules (syntax and semantics) that apply to that feature all on the same page. We will use the UML to also provide a visual summary of the semantics of the language.

      So for each part of the language that you are defining you will need:

      1. Some commentary
      2. Some examples
      3. Some syntax definitions using some kind of BNF
      4. Some semantics....mainly English but sometimes a piece of the UML and/or math helps.

      Here is the kind of thing I mean.

      Mini-Example: LRM for Decimal Numbers


        (comment): Numbers are expressed using the familiar decimal notation. A string of decimal digits represents a number. The value of the number is calculated in the normal way.
        (syntax):
      1. number::= digit | digit number.
        (examples):
                       1
                       512
                       8192
      2. digit::= "0".."9".
         			1
        The meaning of a digit is an interger in range 0..9 using the normal encoding.


        (semantics): The meaning of a string of n digits d = (d[n] d[n-1] ... d[3] d[2] d[1]) is called it's value. We can model value as a map from syntactically correct numbers (see number above) into the natural numbers, including 0,

      3. Nat0={0,1,2,3,4,...}. The function value is defined by:

        Formula value(d)=sum[i:1..n](10^(i-1) * d[i])

        So for example,

      4. value("8192") = 8*1000 + 1*100 + 9*10 + 2
      5. = 8192.

      Examples vs Syntax vs Semantics

      Note. In the final, in your project, and in class, I will often ask you to give me an example of something. I mean a piece of code that is in the language that shows that you know what it means. If I ask for
    1. an example of a C++ for loop then
                     for(i=1; i<=n; i ++)
                                   v+= d [i-1];
      is good.

      If I ask for the syntax of a C++ for loop I expect something like this

    2. for_loop::= "for" "(" initialization ";" condition ";" increment ")" statement.

      I'm unlikely to ask for formal semantics.... BUT instead I may ask to express a general for loop in simpler terms. This is a form of "Operational Semantics". A good answer would give the general form of for-loop:

                     for (A; B; C) S
      and then a simpler equivalent form:
                     { A; while(B) {
                                   S
                                   C;
                        }
                     }
      Notice that the translation works whatever A,B,C, and S are, as long as the original form is syntactically correct.

    . . . . . . . . . ( end of section CS320/04 Syntax & Parsing -- Parts of Chapters 3 and 4) <<Contents | End>>

    Class Work

    [ 04q.html ]

    Lab Work

    [ lab/04.html ]

    Next

    [ 05.html ]

    Project

  1. LRM::= See http://cse.csusb.edu/dick/cs320/A.html

( End of document ) <<Contents | Top