[CSUSB] / [CNS] / [CSE] / [R J Botting] / [ MATHS ] / math_65_Meta_Linguistics
Sun Oct 14 12:18:44 PDT 2012


    MATHS as a MetaLanguage

      MATHS can be used as a powerful meta-language to define formal and computer languages.



      The MATHS approach to language definition rests on the observation that every practical language of any size has been defined by several sets of rules that define and interpret it. Each set of rules is typically a constraint on the possible strings. Typically there is a lexical description, a set of syntax rules, and some informal semantics. Often the basic language is extended by definitions that add new strings to the language but provide translation and transformation rules into the basic language. In MATHS we formalise this by assuming that a formal language can be defined by the following sets of rules:

      Sets of Rules

      Name        Property      Description
      Lexicon     Lexical       Described by a lexical grammar.
      Glossary    Informal      Descriptions of elements in the language.
      Grammar     Grammatical   A set of syntactic grammars.
      Semantics   Meaningful    A set of partial maps from grammatical strings into meanings.
      Defined     Definitions   Map generalised strings into meaningful strings.

      (Close Table)


      In essence the language definition task is to create a relation that maps certain recognisable strings of symbols into a set of meanings. The standard approach is to separate the description of the set of strings (syntax) from the determination of meaning (semantics). Typically there are two other techniques - a map from strings of characters into strings of 'tokens' representing 'lexemes', and a mapping of certain defined patterns into a standard language by a collection of 'definitions' or 'macros'.
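The lexical phase described above can be sketched in a few lines. This is a minimal illustration, not part of the MATHS system itself: the token names and patterns below are invented for the example.

```python
import re

# A sketch of the lexical map: characters -> string of (token, lexeme) pairs.
# The token kinds and their patterns are illustrative assumptions.
TOKEN_PATTERNS = [
    ("NUMBER", r"\d+"),
    ("NAME",   r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]

def lex(source):
    """Map a string of characters into a list of (kind, lexeme) tokens."""
    tokens, pos = [], 0
    while pos < len(source):
        for kind, pattern in TOKEN_PATTERNS:
            m = re.match(pattern, source[pos:])
            if m:
                if kind != "SKIP":          # whitespace is dropped, not tokenised
                    tokens.append((kind, m.group()))
                pos += m.end()
                break
        else:
            raise ValueError(f"no lexeme at position {pos}")
    return tokens
```

For example, `lex("x = 12 + y")` yields the token string `NAME OP NUMBER OP NAME`, which is what the syntactic phase then works on.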

    1. LANGUAGE::=
      1. Alphabet::FiniteSet,
      2. Lexical::GRAMMAR and Net{terminal:=character_string | name_of_character, source, tokens, lexeme in non_terminal}.
      3. Syntax::@GRAMMAR(Alphabet=>lexeme) and Net{ terminal:= lexeme_name|character_string, tokens in non_terminal}.
      4. Meanings::Sets.
      5. Meaningful::=/Denotations(Meanings),
      6. Denotation::Grammatical<>->Meanings,
      7. Definitions::@#Alphabet->@#Alphabet,
      8. Language::= Standard./do(|(Definitions)),
      9. Standard::= (source(Lexical) & Grammatical & Meaningful),
      10. Grammatical::= &{G.source||G:Syntax},

        Since there is no effective procedure for deciding whether an arbitrary language defined by this technique is empty or not, it is necessary for the designer to always provide examples of strings in the language.

      11. Examples::@Language~{}.



      Grammars are a specialized form of formal documentation. A grammar is a set of definitions of names of sets of strings, where strings are (finite) sequences of terminals. A typical terminal might be a character, a data type, a name of an event, an object in a data base or data structure, a message type, etc. Each term defined in a grammar represents a set of sequences. Typically a non_terminal can define a language, a set of valid or possible files, a set of interesting patterns of events, a strategy for getting particular data from a data base or data structure, a trace of a program, a communications protocol, etc.

      A context free grammar has definitions solely in terms of regular expressions (made of #, |, and . alone). These and other Chomskian grammars are defined in Math.Strings.


      MATHS is not limited to Chomskian grammars. Formally, any set of simultaneous equations expressed in terms of set theory etc. might define a unique language by using the smallest language that is a fixed point under the map formed by replacing defined terms by their definitions. As long as this substitution tends only to add longer strings to the sets, there is a unique fixed point and so a unique language defined. However there is no reason to suppose that a general set of definitions like this is in any sense efficient, and a wise engineer will select definitions that lead to efficient checkers/parsers.
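The least-fixed-point construction can be made concrete: start from empty sets and repeatedly substitute the definitions until the sets stop growing. The sketch below restricts attention to strings of bounded length so the iteration terminates; the grammar (balanced parentheses, S ::= "" | "(" S ")" | S S) is an illustrative example, not from the source.

```python
# A sketch of the least fixed point of a grammar's defining equations,
# restricted to strings of length <= MAX_LEN so iteration terminates.
MAX_LEN = 6

def step(S):
    """One substitution pass for S ::= "" | "(" S ")" | S S."""
    new = {""}
    new |= {"(" + s + ")" for s in S if len(s) + 2 <= MAX_LEN}
    new |= {a + b for a in S for b in S if len(a) + len(b) <= MAX_LEN}
    return S | new

def least_fixed_point():
    S = set()
    while True:
        T = step(S)
        if T == S:          # no longer growing: the fixed point is reached
            return S
        S = T
```

Because each pass only adds strings (the map is monotone) and the bounded universe is finite, the iteration must stop, and it stops at the smallest solution of the equations.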

      A useful technique is to first define a simple context free grammar and then develop from it (by the insertion of tags in sequences and/or pairing the non-terminals with another grammar) a definition of a lexer, translator, parser, or other program.
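The pairing of non-terminals with another structure can be illustrated by turning a grammar directly into a recursive-descent parser: each non-terminal of the (invented, illustrative) grammar expr ::= term #("+" term), term ::= number becomes one function, and each rule is paired with an action that computes a value.

```python
# A sketch of developing a parser from a context free grammar by pairing
# each non-terminal with a function and each rule with an action.
# Grammar (illustrative):  expr ::= term #("+" term),  term ::= number.
def parse_expr(tokens, i=0):
    value, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] == "+":
        rhs, i = parse_term(tokens, i + 1)
        value += rhs                 # the action paired with the "+" rule
    return value, i

def parse_term(tokens, i):
    if i >= len(tokens) or not tokens[i].isdigit():
        raise SyntaxError(f"number expected at position {i}")
    return int(tokens[i]), i + 1
```

The same skeleton, with different actions, yields a recogniser, a translator, or an interpreter, which is the point of developing them all from one grammar.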


      Complexity - The canonical grammar defines exactly the set of terms that the user understands, no more, no less.

      Efficiency - the designer may be permitted to change the language to make recognition, translation and interpretation efficient, but should not search for the most efficient expression of the actual language until a working, correct, and complete version is already running.

      Inconsistency - the designer must exhibit a string defined by the system of definitions.

      Ambiguity - the designer should not add to or diminish the ambiguity already present in the language (if any).

      [ Math.64 Meta Macros ]


      The definition of syntax has been understood and used with success throughout computing since 1964. 'Semantics' is about what well formed strings mean. There is less agreement about how meaning should be defined. Initially natural language was used (COBOL) but even the acknowledged best designers failed to produce clean definitions this way (ALGOL 60's Blemishes). Later attempts map the language strings into a formal subset of a natural language (ALGOL68, thence to Pascal, Ada, Edison,...).

      Another method is to define an abstract machine and map statements in the language into changes in the state of the machine (VDM tested on defining ALGOL 60 & PL/1). A variation of this is to define a virtual machine that can interpret a simplified version of the language plus a translator from the language into the simplified form. The interpreter can then be implemented in software or hardware (BCPL and OCODE, Pascal and PCODE, Modula-2 and LILITH). The LISP language is unique in being defined solely by a very simple LISP program that interprets LISP.
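The virtual-machine variation can be sketched as a translator into stack code plus an interpreter that gives that code meaning as changes to a machine state. The instruction names below are invented for illustration, not taken from OCODE or PCODE.

```python
# A sketch of the virtual-machine approach: translate a small expression
# form into stack code, then interpret that code as state changes
# (the state here is just a stack). Instruction names are invented.
def translate(expr):
    """expr is an int or a nested tuple like ("+", 1, ("-", 10, 4))."""
    if isinstance(expr, int):
        return [("PUSH", expr)]
    op, left, right = expr
    code = translate(left) + translate(right)
    return code + [("ADD" if op == "+" else "SUB", None)]

def interpret(code):
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "ADD" else a - b)
    return stack[-1]
```

The meaning of a source phrase is then defined as the effect of its translation on the machine, exactly the two-stage scheme (language, simplified form, interpreter) described above.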

      The axiomatic approach of Hoare associates statements in a programming language with inference rules allowing properties of the whole to be derived from the parts. Dijkstra's form of this method is to specify, for each elementary step and structure, the weakest pre-condition necessary to guarantee a particular post-condition. Theorists have demonstrated that it is possible to define most programming languages this way, and axiomatic specifications have been published for Pascal - but no well known language definitions have used this technique.
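Dijkstra's rule for assignment is the simplest case: wp(x := E, R) is R with every free occurrence of x replaced by E. A rough sketch, treating conditions as plain strings purely for illustration (a real tool would substitute on parse trees, not text):

```python
import re

# A sketch of Dijkstra's weakest-precondition rule for assignment:
#     wp(x := E, R)  =  R with x replaced by E.
# Conditions are plain strings here, which is an illustrative shortcut.
def wp_assign(var, expr, post):
    """Substitute (expr) for whole-word occurrences of var in post."""
    return re.sub(rf"\b{re.escape(var)}\b", f"({expr})", post)
```

For example, the weakest pre-condition of `x := x+1` guaranteeing `x > 0` comes out as `(x+1) > 0`, i.e. the assignment establishes the post-condition exactly when x was at least 0 beforehand.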

      Growing from early attempts to map the language into a completely formal language (a logic), the approach created by Strachey, Stoy and Scott is to provide 'denotations' that map parts of a programming language into a set of meanings. The meanings have some kind of algebraic and logical structure. The programming language elements and syntactic structures are mapped into elements and expressions in the abstract mathematical space of meanings. It is then necessary to show that there is one and only one meaning for each string in the language. Again this has developed into a feasible approach but has not been applied in any well known language. Scott has recommended using complete partially ordered sets and function spaces to model the meanings of languages with some success; more recently Manes and Arbib propose using categories instead.
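The denotational idea can be sketched for a tiny expression language: a denotation function maps each phrase into a mathematical meaning, here a function from environments (variable valuations) to numbers. The expression forms below are invented for the example.

```python
# A sketch of denotational semantics for a tiny expression language.
# Each phrase denotes a function from environments to numbers, and the
# meaning of a compound is built from the meanings of its parts.
def denote(expr):
    if isinstance(expr, int):
        return lambda env: expr              # constants denote constant functions
    if isinstance(expr, str):
        return lambda env: env[expr]         # names denote projections of the environment
    op, left, right = expr
    dl, dr = denote(left), denote(right)
    if op == "+":
        return lambda env: dl(env) + dr(env)
    return lambda env: dl(env) * dr(env)     # the only other form here is "*"
```

Because the definition is by structural recursion on the syntax, each phrase gets exactly one meaning, which is the uniqueness property the text says must be shown.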

      MATHS can be used as a notation with any of these methods - the symbolic logic is suitable for specifying axioms for example, and the mathematics provides several models of abstract machines.


      An example definition is:

    2. For Q:@States, S:@#Input, Q S::= {q':States || for some q:Q, s:S ( q |--[s] q')}.

      This indicates that any string that has two parts, the first representing a set of states and the second a set of strings of inputs, will be replaced by the more complex intensional set on the right of the "::=". A definition indicates a process of analysing the input, matching bound symbols, and forming a replacement string out of the strings that match the bound symbols and the remaining characters as they stand. Syntax:

    3. definition::= "For" bindings "," #(bound_symbol | lexeme~(comma | ":"":=")) ":"":= " expression&(#(bound_symbol | lexeme)), where bound_symbols are those bound in the bindings.
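The matching-and-replacement process a definition describes is essentially string rewriting: bound symbols in the left-hand pattern capture substrings, and the replacement is assembled from the captured parts. A rough sketch, using the "Q S" definition above as the rewrite rule (the regular-expression encoding is an illustrative shortcut):

```python
import re

# A sketch of a definition as a rewrite rule: the bound symbols Q and S
# capture substrings, and the replacement is built from what they matched.
# Encoding the rule as a regular expression is an illustrative shortcut.
RULE = (
    re.compile(r"(\w+)\s+(\w+)"),
    r"{q':States || for some q:\1, s:\2 ( q |--[s] q')}",
)

def apply_definition(text):
    """Rewrite the first match of the pattern, leaving the rest as it stands."""
    pattern, replacement = RULE
    return pattern.sub(replacement, text, count=1)
```

Applying the rule to the two-part string "Q S" produces the intensional set on the right of the definition, with Q and S inserted where the bound symbols matched.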

      .Road_works_ahead Semantics ??{

    4. Definition:#Alphabet<>->#Alphabet,
    5. Definition=rel[s,... }?

    . . . . . . . . . ( end of section MATHS as a MetaLanguage) <<Contents | End>>

    Notes on MATHS Notation

    Special characters are defined in [ intro_characters.html ] that also outlines the syntax of expressions and a document.

    Proofs follow a natural deduction style that start with assumptions ("Let") and continue to a consequence ("Close Let") and then discard the assumptions and deduce a conclusion. Look here [ Block Structure in logic_25_Proofs ] for more on the structure and rules.

    The notation also allows you to create a new network of variables and constraints. A "Net" has a number of variables (including none) and a number of properties (including none) that connect variables. You can give them a name and then reuse them. The schema, formal system, or an elementary piece of documentation starts with "Net" and finishes "End of Net". For more, see [ notn_13_Docn_Syntax.html ] for these ways of defining and reusing pieces of logic and algebra in your documents. A quick example: a circle might be described by Net{radius:Positive Real, center:Point, area:=π*radius^2, ...}.

    For a complete listing of pages in this part of my site by topic see [ home.html ]

    Notes on the Underlying Logic of MATHS

    The notation used here is a formal language with syntax and a semantics described using traditional formal logic [ logic_0_Intro.html ] plus sets, functions, relations, and other mathematical extensions.

    For a more rigorous description of the standard notations see

  1. STANDARD::= See http://www.csci.csusb.edu/dick/maths/math_11_STANDARD.html


  2. above::reason="I'm too lazy to work out which of the above statements I need here", often the last 3 or 4 statements. The previous and previous-but-one statements are shown as (-1) and (-2).
  3. given::reason="I've been told that...", used to describe a problem.
  4. given::variable="I'll be given a value or object like this...", used to describe a problem.
  5. goal::theorem="The result I'm trying to prove right now".
  6. goal::variable="The value or object I'm trying to find or construct".
  7. let::reason="For the sake of argument let...", introduces a temporary hypothesis that survives until the end of the surrounding "Let...Close.Let" block or Case.
  8. hyp::reason="I assumed this in my last Let/Case/Po/...".
  9. QED::conclusion="Quite Easily Done" or "Quod Erat Demonstrandum", indicates that you have proved what you wanted to prove.
  10. QEF::conclusion="Quite Easily Faked", -- indicates that you have proved that the object you constructed fitted the goal you were given.
  11. RAA::conclusion="Reductio Ad Absurdum". This allows you to discard the last assumption (let) that you introduced.