.Open Small Awk

. Status
This is a first draft for a new language. The symbol $TBD indicates a known area of incompleteness: "To Be Done".

. Purpose
Small Awk (not $smalltalk!) is designed as a sample language to be studied and worked on in Computer Science classes that study high-level computer languages. It is a subset of the $awk language of Aho, Weinberger, and Kernighan, created by deleting a lot of useful and convenient options and features from $awk. Even so, it is a language that has control structures, variables, input, and output. For simple, everyday programs it is rather like a small and safe C or C++.

Smallawk programs can be tested by running them through any $awk interpreter on a UNIX system. $awk has also been ported to other platforms, and those ports can also be used to test smallawk programs. Suppose your smallawk program is in a file called
.As_is hello.awk
then the command
.As_is awk -f hello.awk
will test the program.

Like $awk, smallawk is designed for processing data files. It works with simpler files than $awk, however. Smallawk assumes that it will read a single file and that each line in the file is a list of "fields" of data separated by one or more spaces or tabs. Smallawk reads the input file and produces a single stream of output by processing the input. Notice that a smallawk program does not start until it is given some data, and does not stop until it gets the usual end-of-file.

Here is a traditional simple program in smallawk:
.As_is END{ print "hello, world"; }
It reads the input and, at the end of the input data, outputs the "hello, world" message.
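The test run described above can be sketched as follows. This is only an illustration, assuming a POSIX shell and an `awk` on the path; the two input lines are made-up data, since this program ignores their content.

```shell
# Put the smallawk program in hello.awk, the file name used in the text above.
cat > hello.awk <<'EOF'
END{ print "hello, world"; }
EOF

# Feed it two made-up lines of input; the END action runs once at end-of-file.
output=$(printf 'first line\nsecond line\n' | awk -f hello.awk)
echo "$output"
```

This should print the single line hello, world. If nothing is piped in, the command waits for input until end-of-file is typed at the terminal.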
The following program reads the input file and numbers the lines:
.As_is { print NR, $0; }
The next program prints out lines that contain the string "AWK":
.As_is /AWK/{ print ;}
The next program assumes that each line has one number, and at the end of the file outputs the total of these numbers:
.As_is {sum = sum + $0;}
.As_is END{print sum;}
This one checks the input and only adds a line if it is a valid integer, with one or more decimal digits:
.As_is /^[0-9][0-9]*$/{sum = sum + $0;}
.As_is END{print sum;}
This reads in a file of names, student_ids, and scores and calculates the mean score. It assumes input with data separated by spaces like this:
.As_is ShortName 9999 3.2
.As_is AnotherName 1234 17
The program is
.As_is {sum = sum + $3; count=count+1;}
.As_is END{print sum/count;}

.Open Syntax

. Lexemes
Smallawk has the normal lexical scan, separating variables, constants, and strings from some reserved words:
reserved_words::= "END" | "NF" | "NR" | "BEGIN" | "print" | "if" | "else".
comma::=",".

. Programs
A program is a sequence of pieces. Each piece has two parts: a pattern and an action. The pattern states when the action is to be applied. When smallawk is running, it takes each line, tries each piece of the program in turn, and carries out the actions whose patterns match the line.
program::= # $piece.
piece::= $O( $pattern ) "{" $action "}".
The pattern is used to recognise the lines of data that the action applies to.

. Patterns
disjunction::= $conjunction #( "||" $conjunction).
.As_is /Botting/||/Dick/
conjunction::= $possible_complement #( "&&" $possible_complement).
.As_is /Botting/ && /Richard/
possible_complement::= "!" $elementary_pattern | $elementary_pattern.
.As_is !/Blotting/
elementary_pattern ::= "END" | "BEGIN" | "/" $regular_expression "/".
.As_is /Banana/
.As_is /[bB]anana/
.As_is /^\.As_is/
.As_is /^[0-9]*$/
regular_expression::= $leftmost | $rightmost | $exact | $contained.
leftmost::= "^" $contained.
.As_is ^.As_is
rightmost::= $contained "$".
.As_is two part:$
exact::= "^" $contained "$".
.As_is ^Dick.*Botting$
contained::= #possibly_repeated_set_of_chars.
possibly_repeated_set_of_chars::= $possible_set "*" | $possible_set.
.As_is pos*ibil*ity
The "*" means "zero or more of". The above matches "posssibiity", for example.
possible_set::=$normal_character | $escaped_character | $set_of_characters.
normal_character::= letter | digit | symbol ~ $special_character.
escaped_character::= "\" $special_character.
special_character::= "[" | "]" | "\" | "." | "*".
set_of_characters::= "[" chars "]" | "[^" chars "]".
chars ::= # character | character "-" character.
character::= normal_character | escaped_character.

. Actions
An action is a series of at least one statement. These are executed one after another.
action::= $statement #$statement.

. Statements
statement::= $assignment | $print | $selection | $do_nothing.
assignment::= $variable "=" $expression ";".
.As_is sum = sum + $0;
print::= "print" ";"| "print" $expression ";".
.As_is print sum;
selection::= "if(" $condition ")" $body "else" $else.
.As_is if ( sum > 0 ) print "greater"; else ;
condition::=$expression.
body::=$action | $do_nothing.
else::=$action | $do_nothing.
do_nothing::= ";".

. Expressions
operation::= "+" | "-" | "*" | "/" | "%" | "&&" | "||" | "!".
function::= "sin" | "cos" | "log" | "exp" | "sqrt".
expression::= $concatenation.
concatenation::= $arithmetic_expression #$arithmetic_expression.
arithmetic_expression::= simple_expression #(#$operation simple_expression).
simple_expression::= function_call | $variable | $constant | parenthesized_expression.
parenthesized_expression::= "(" $expression ")".
function_call ::= $function "(" expression #( $comma expression ) ")".
$TBD : Operations, variables, functions....

. Variables
variable::= $whole_line | $field | $global_variable.
whole_line::= $dollar "0".
.As_is $0
field::= $dollar expression.
.As_is $5
global_variable::=letter #(letter|digit).
.As_is sum

. Constants
$TBD

.Close Syntax.

. Semantics
Here are the informal operational semantics of a smallawk program `P`. `P` will consist of a sequence of `n` pieces `p`[1]..`p`[`n`]. Each piece `p`[`i`] has two parts: a pattern `p`[`i`].pattern and an action `p`[`i`].action. Here is a diagram: $TBD

Here is a C++-like description of what the program `P` does.
.As_is
.As_is  NR=0;
.As_is
.As_is  while( get next line until end of input )
.As_is  {
.As_is      NR++;   //NR is the number of lines read so far
.As_is
.As_is      for(i = 1; i<=n; i++)
.As_is          if( line matches p[i].pattern )
.As_is              apply p[i].action to line;
.As_is  }
.As_is
.As_is  //after end of file
.As_is  for(i = 1; i<=n; i++)
.As_is      if( p[i].pattern is "END" )
.As_is          apply p[i].action;
.As_is

A line matches a pattern according to the rules $TBD ...
.See http://www.csci.csusb.edu/dick/samples/regular_expressions.html

Applying an action to the line starts by assigning the whole line to variable `$0`. Then each field in the line (separated by one or more spaces or tabs) is assigned to `$1` through to `$NF`, where `NF` is set to the number of fields.

An action is a sequence of one or more instructions, and these are executed in turn.

If an instruction is an assignment, then the expression on the right-hand side is evaluated and the resulting value is placed in the variable on the left-hand side of the '=' sign. This may change the whole line or any field in the line if the variable is `$0` or `$i` for some `i`.

If the instruction is a print command with an expression, then the expression is evaluated and its value is output, followed by a new line. If it is a print with no expression, then the whole line (with any changes) is printed.

If the instruction is a selection with condition `c`, body `b`, and else part `e`, then the condition is evaluated and, if it is not zero, the body `b` is executed. Otherwise, if `c` evaluates to zero, then `e` is executed.
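The rules for fields, assignment, print, and selection can be tried out with a short experiment. This is a sketch only, assuming a POSIX shell and an ordinary `awk`; the file name scores.awk is hypothetical, and the input is the two sample lines used earlier for the mean-score program.

```shell
# scores.awk is a hypothetical file name chosen for this illustration.
# If the third field is greater than 10, the first field is overwritten,
# which also changes the whole line; then the (possibly changed) line is printed.
cat > scores.awk <<'EOF'
{ if ( $3 > 10 ) $1 = "HIGH"; else ; print ; }
EOF

output=$(printf 'ShortName 9999 3.2\nAnotherName 1234 17\n' | awk -f scores.awk)
echo "$output"
```

The first line has a third field of 3.2, so the else part (which does nothing) runs and the line is printed unchanged. The second line has a third field of 17, so `$1` is replaced and the print with no expression outputs the changed line, HIGH 1234 17.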

Expressions are evaluated in the usual way: constants become their values, variables return their current value, and operations are applied in order of precedence to give a value.

. Authors
Richard J Botting
.See mailto:rbotting@csusb.edu

. See Also
awk::=http://www/dick/cs360/notes/awk.doc.html, and some notes
.See http://www/dick/cs360/notes/awk.html.
smalltalk::=http://www/dick/samples/smalltalk.html

. Glossary
TBA::="To Be Announced".
TBD::="To Be Done".
For X, O(X) ::= (X | ), Optional X.

.Close Small Awk