.Open Languages This set of notes are focussed on developing a theory of languages that can be used to define realisticformal languages and develop tools to manipulate them. As a result they can, by Michael A Jackson's pioneering methods they can be used to develop programs that process data (JSP) and Management Information Systems (JSD). I have written a monograph .See ../monograph/ to explore this. .RoadWorksAhead .Hole Languages and Grammars The mathematical model of a language is a set of strings. The strings must be sequences of zero or more symbols taken from a finite set often called an alphabet. LANGUAGE::=following .Net |- $STRINGS Alphabet::Sets, |- finite Alphabet. Strings::=#Alphabet. Languages::=@Strings. See also STRINGS::=http://www/dick/maths/math_62_Strings.html. .Open Properties of Languages The set of languages for a given alphabet forms a Boolean Algebra with repsect to union, intersection, and complement, of course. More they form a Monoid with respect to concatenation: For A,B:Languages, A B ::Languages= {a ! b . a:A, b:B}. 1 ::Languages= {}. ()|-MONOID( Languages, (), 1 ). Indeed, we have a Ring with Union and Concatenation, but not with intersection and concatenation. .Hole codes, equivalence, etc. .Close Properties of Languages .Close.Net . Grammar A Grammar defines a language and imposes a stucture on the strings in that language. Chomski invented a model for how grammars work. A grammar describes (explicitly) rules for generating a correct strings in a given language. If well chosen each string in the language can be generated in essentially only one way. This way of generating the string defines the structure of the string - how it is parsed. Grammars -- especially written in various extensions of Bachus-Naur-Form (BNF) -- are commonly used to define computer languages. Several examples can be found in my .See ../samples/ directory. GRAMMAR::=Net{ Alphabet::Sets=given, |- finite Alphabet. productions::Sets. P::=@productions, terminals::@#Alphabet=given, T::= terminals, non_terminals::Finite_sets Name=given, N::= non_terminals=given, vocabulary::=T|N. V::=vocabulary. source::non_terminals=given, S::=source, meta_language::=#(production | comment), M::=meta_language, comment::=#character~production. |- P = img(M)&production. }=::GRAMMAR. . Hierarchy of Meta_languages Different restrictions on the form of productions define different classes of languages. Many of these have been studied in detail. Here is a table of some classic grammatical types. The form of Productions is shown with capital letters (A,B,C) being used for Nonterminals and lower case letters for terminal symbols. Greek letters ( \alpha, \beta, ...) are for strings of terminals and nonterminals. .Table Name Form of Productions Notes .Row Regular A->a, A->a B Chomski .Row Context Free A->\alpha Chomski .Row Conjunctive Grammars A->\alpha & \beta & ... .Item See .See [Okhotin03] and my own .See ../monograph/03-intersections.html .Row Boolean Grammars A->\alpha & \beta & ... & \not \gamma & \not \delta .Item See .See [Okhotin04b] .Row Context_sensitive \alpha A \beta->\alpha \gamma \beta Chomski .Row Context_dependent \beta->\alpha Chomski .Close.Table In the above table the notation x->y indicates a substitution, .See ./math_62_Strings.html#Substitution for a formal definition. (theory)|-$regular(T)=>>$context_free_language(T) =>>$para_grammar_language=>> $extended_grammar_language. No useful languages are context free - so more general form of grammars are needed. However CFGs are the commonest form of grammar used in practice. They provide useful information in Language Reference Manuals and the easiest way of designing compilers and interpreters. They are also the basis of Jackson Structured Programming. Context_Free_Grammars::=$CFG, below: CFG::=Net{ Extended Backus-Naur Form is equivalent to a context free grammar(CFG). There exist standard tools for handling context free grammars. |- GRAMMAR(production=>context_free_production). context_free_production::syntax=non_terminal ":"":""=" $regular_expression, regular_expression::=Regular_Expressions(terminal|non_terminal). Note this is more general than Chomski's definition of a CFG but describes the same family of languages. }=::CFG. . Lexicons and Lexical Analysis Most languages have a basic vocabulary of strings that are treated as complete system (a lexeme). The basic symbols of a language are always defined by a simple context free grammar. Indeed most lexicons are a regular language. These are best defined using simple descriptions: .As_is word :: lexeme, purpose. or definitions, .As_is name :: lexeme = string, purpose. .As_is name :: lexeme = regular_expression, purpose. .Net typedef::lexeme, use to introduce type definitions in C/C++/Java/etc. double_plus::lexeme="++", used to add 1 to a variable in C/C++/Java etc. all::lexeme="all", universal quantifier. some::lexeme="some", existential quantifier. Ada_end::lexeme=ignore_case("end"). .Close.Net . Para_Grammars Para_Grammar::=following, .Net |-$GRAMMAR(production=>para_production). para_production::= $non_terminal ":"":""=" para_regular_expression, para_regular_expression::=(("&" | "|")$parameters|) $item #(("&"|"~")$item), item::= (|hash)($terminal |$non_terminal | "(" $selection ")" |$parameter), selection::= $sequence #("|"$sequence), sequence ::= #($item |$parameterized), parameterized::= $parameters$item, parameters::=l_bracket $binding #(comma$binding) r_bracket, binding::=$parameter #(comma$parameter) ":"\$item, parameter::=variable. Parameters are used to indicate correspondences across sequences, etc. Example .Let dup::=|[w:#Bits ](w w). defines a context sensitive language in a very simple form. .Close.Let .Close.Net Para_Grammar . Semantics The meaning of a grammar depends on how it is used - lexical or syntactical, .... Specialized languages can be constructed by using a grammar with different sets of terminal symbols and by interpretting the structures and tags differently. By default you can assume that a sequence is parsed as an array or list (of type #X for some X). Parameterized items become mappings from parameters to content(of type X<>->Y). For more, see .See http://www/dick/maths/notn_2_Structure.html .See http://www/dick/maths/notn_14_Docn_Semantics.html#Definitions .See http://www/dick/maths/notn_13_Docn_Syntax.html, for an example of a how the formal grammar of MATHS maps into a structured objet. . Translation A translation can be defined by giving two corresponding grammars. One defines the input, and the other the output. Tags indicate corresponding strings in the two grammars. More sophisticated is to simultaneously define both the input and the output at the same time. Here each definition doesn't define a set of strings, but does define a set of pairs of strings (equivalent therefore to a relationship between two languages). A simple version of this idea will be found in Aho and Ullman Volume 1. . Parsers A parser constructs a data structure that encodes the input sequence. Each non_terminal in the grammar will have its own associated type of object in this data structure. A particular input is encoded as a set of objects which in generalwill not be a tree but a directed acyclic graph(dag). In MATHS a simple formal model might be to simultanously define the abstract and the concrete syntax like this: (expression, expressions) ::@#Alphabet>