cs360/notes/26.patterns
Regular expressions were a theoretical idea that preceded UNIX by
about 20 years. UNIX was created by people who knew their
computer science, so they borrowed them to describe lines in text files
within editors and other tools.
Why
You write a paper but type dates in two different forms 'Jan 12 1996' and
'12/Jan/1996'. You want to see how many of each you have. Then you
want to change one to the other. Then you want to list them.... Regular
expressions are a simple way to describe what each type of date looks like.
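As a rough sketch of the date example, the two forms might be described and counted with grep like this. The sample text, the file name paper.txt, and the exact patterns are assumptions made for illustration; real papers may need looser patterns.

```shell
# Sketch only: the sample text and the file name paper.txt are made up.
LC_ALL=C; export LC_ALL          # make ranges like [A-Z] mean plain ASCII
dir=$(mktemp -d)
cat > "$dir/paper.txt" <<'EOF'
The first draft was due Jan 12 1996.
It was returned on 15/Feb/1996 with comments.
The final version is due Mar 3 1996.
EOF

# 'Jan 12 1996': capital, two lowercase letters, space, digits, space, four digits.
month_first='[A-Z][a-z][a-z] [0-9][0-9]* [0-9][0-9][0-9][0-9]'
# '12/Jan/1996': digits, slash, month name, slash, four digits.
day_first='[0-9][0-9]*/[A-Z][a-z][a-z]/[0-9][0-9][0-9][0-9]'

# -c counts matching lines (not individual matches).
month_first_count=$(grep -c "$month_first" "$dir/paper.txt")
day_first_count=$(grep -c "$day_first" "$dir/paper.txt")
echo "month-first lines: $month_first_count"
echo "day-first lines:   $day_first_count"

# List the lines that use the day-first form.
day_first_lines=$(grep "$day_first" "$dir/paper.txt")
echo "$day_first_lines"

rm -r "$dir"
```

Changing one form into the other is a job for an editor or sed, using the same patterns.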
Sections
(PostIt_note): 26.09, 26.10
(exam): 26.02, 26.03, 26.04, 26.05, 26.07, 26.08
(useful): 26.01, 26.02, 26.03, 26.04, 26.05, 26.07, 26.08
(Skip): 26.06
Facts
Regular expressions, invented in the early fifties by Kleene to
model the behavior of brains, turn out to be very useful in a
dozen different tasks. They are used to identify lines in a file
(or other stream of data) that you want to see, process, edit,
remove, change, print, ...
These are used in: vi, ed, ex, grep, egrep, awk, sed, more, less,...
The idea is simple: some characters are special and others stand
for themselves. But there are two complications: the shell thinks the
characters are special too (but with different meanings :-( ), and different
programs have subtle differences.
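A small sketch of the first complication: the shell gets to interpret special characters before the program does, so it is usually safest to put patterns in single quotes. The directory and the file names data.txt and abacus.txt are invented for this example.

```shell
# Sketch only: the directory and file names are made up.
old=$PWD
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'ac\nabc\nabbc\n' > data.txt
: > abacus.txt                    # an empty file whose name starts with "ab"

# Unquoted, the shell treats * as a file-name wildcard and rewrites the word
# before any program sees it.
unquoted=$(echo ab*)              # the shell expands this to: abacus.txt

# Inside single quotes the shell passes the characters through untouched.
quoted=$(echo 'ab*')              # the program receives: ab*

# So quote patterns: here grep itself gets ab*c and reads * as "any number of b".
matches=$(grep 'ab*c' data.txt)   # matches ac, abc and abbc

echo "$unquoted"; echo "$quoted"; echo "$matches"
cd "$old" && rm -r "$dir"
```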
Be careful to notice that "^" and "$" are anchors only when at the beginning
(^) or the end ($) of a pattern.
Notice that "^" has two different meanings: at the start of a pattern
it anchors the pattern to the start of the line, but as the first character
inside square brackets it means "not".
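The different roles of "^" and "$" can be tried out with grep. This is only a sketch; the word list is invented, and other programs may treat a "^" in the middle of a pattern differently from grep.

```shell
# Sketch only: the word list is made up.
dir=$(mktemp -d)
printf 'cat\ncot\nconcat\na^b\n' > "$dir/words"

# ^ as the first character anchors the match to the start of the line.
at_start=$(grep '^cat' "$dir/words")     # cat (but not concat)

# $ as the last character anchors the match to the end of the line.
at_end=$(grep 'cat$' "$dir/words")       # cat and concat

# ^ as the first character inside [ ] means "any character except these".
negated=$(grep 'c[^a]t' "$dir/words")    # cot

# ^ in the middle of a grep pattern is an ordinary character.
literal=$(grep 'a^b' "$dir/words")       # a^b

echo "$at_start"; echo "$at_end"; echo "$negated"; echo "$literal"
rm -r "$dir"
```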
26.05: Study this advice well. It will help.
Be careful: "*" means "any number of the previous item, including NONE".
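A quick sketch of what "including NONE" means in practice, using an invented four-line file:

```shell
# Sketch only: the sample lines are made up.
dir=$(mktemp -d)
printf 'ac\nabc\nabbbc\nadc\n' > "$dir/f"

# b* allows zero b's, so "ac" matches as well.
zero_or_more=$(grep 'ab*c' "$dir/f")    # ac, abc, abbbc

# To insist on at least one b, write the b once and then b*.
one_or_more=$(grep 'abb*c' "$dir/f")    # abc, abbbc

# A pattern that is only x* matches every line, because every line
# contains "no x's at all" somewhere.
every_line=$(grep -c 'x*' "$dir/f")     # 4

echo "$zero_or_more"; echo "$one_or_more"; echo "$every_line"
rm -r "$dir"
```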
26.07: You'll need this before you know it.
REs in awk
I use awk in my "lookup" scripts. For an introduction see
[ lookup.html ]
Exceptions
Be careful to separate wildcards in file names
from regular expressions. The syntax, semantics, and reasons for using
them are all different.
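To see the difference, here is a sketch that applies a file-name wildcard and two regular expressions to the same list of names. The names are invented, and the shell's case statement is used here only as a convenient way to test a wildcard without creating files.

```shell
# Sketch only: the names are made up.
dir=$(mktemp -d)
printf 'notes.txt\nnotes.text\nmytxt\n' > "$dir/names"

# Wildcard: * means "any string" and . is an ordinary character.
glob_hits=$(while read -r name; do
    case $name in
        (*.txt) echo "$name" ;;
    esac
done < "$dir/names")                          # notes.txt

# Regular expression with the same meaning: the dot must be escaped and the
# end of the name anchored. (The wildcard * corresponds to .* in an RE.)
re_hits=$(grep '\.txt$' "$dir/names")         # notes.txt

# Copying the wildcard habit gives a different answer: an unescaped . in an
# RE means "any character", so "mytxt" matches too.
loose_hits=$(grep '.txt' "$dir/names")        # notes.txt, mytxt

echo "$glob_hits"; echo "$re_hits"; echo "$loose_hits"
rm -r "$dir"
```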
For Programmers
The C libraries contain two functions. The first compiles a regular
expression held in a string, and the other runs the compiled
expression against a string to test for a match. In POSIX these are
regcomp() and regexec(), declared in <regex.h>.
Syntax
UNIX Commands
- ed::command, the original editor.
- ex::command, the extended editor. Also the command-line mode in vi.
- vi::command, visual editor.
- grep::command. search_program, short for g/RE/p in ed,
see do_it_to_all_lines_that_match
- egrep::command. An extended grep -- more options and often faster. search_program.
- fgrep::command. Fixed grep - search for strings with no magic. search_program.
- expr::command. Can be used to test to see if an argument matches a pattern in a shell script:
[ expr in 45.scripts ]
- find_lines_in_file::= search_program options pattern files.
- search_program::= fgrep | grep | egrep | agrep. agrep is non-standard.
- find_lines_with_strings_in_files::= fgrep options string files.
- find_lines_matching_patterns::= grep grep_options pattern files.
- Start_edit_at_line_with_string::= vi +/"pattern" file | ex +/"pattern" file.
- grep_options::= - any combination of grep_option.
- grep_option::= v | w | i | l | c ....
- magic::=having RE in searches in vi. Found in grep but not fgrep.
- editor::= vi | ed | ex | edit.
- call_an_editor_to_create_or_change_a_file::= editor filename.
- call_vi_to_view_a_file_only::= view filename.
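A sketch of a few of the grep options listed above, plus the difference between searching with and without "magic". The file names and contents are invented. `grep -F` is used here as the POSIX spelling of fgrep, on the assumption that your system has it.

```shell
# Sketch only: the files a.txt and b.txt are made up.
dir=$(mktemp -d)
printf 'Vi is an editor\ned is an editor\ngrep is a search program\n' > "$dir/a.txt"
printf 'a.c\nabc\n' > "$dir/b.txt"

count=$(grep -c 'editor' "$dir/a.txt")                  # c: count matching lines
others=$(grep -v 'editor' "$dir/a.txt")                 # v: lines that do NOT match
any_case=$(grep -i '^vi' "$dir/a.txt")                  # i: ignore upper/lower case
files=$(grep -l 'editor' "$dir/a.txt" "$dir/b.txt")     # l: list file names only

# With magic, . matches any character; without it, . is just a dot.
magic=$(grep 'a.c' "$dir/b.txt")                        # a.c and abc
no_magic=$(grep -F 'a.c' "$dir/b.txt")                  # a.c only

echo "$count"; echo "$others"; echo "$any_case"; echo "$files"
echo "$magic"; echo "$no_magic"
rm -r "$dir"
```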
Searching in editors
- search_forward::= /pattern/
- search_backwards::=?pattern?
Global commands in ed/ex
- do_it_to_all_lines_that_match::= g/pattern/ed_command.
- ed_command::=delete | insert | move | substitute | join | read | write | ...
- ex_command::=just like ed only more of them.
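The global command is the origin of grep's name: g/RE/p prints every line that matches RE. The sketch below uses sed as a stand-in for ed so that it can run without an interactive editor session; that substitution, and the sample file, are assumptions made for this example.

```shell
# Sketch only: the sample file is made up, and sed stands in for ed.
dir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$dir/fruit"

# In ed you would type:  g/an/p
with_grep=$(grep 'an' "$dir/fruit")          # banana
with_sed=$(sed -n '/an/p' "$dir/fruit")      # banana

# In ed you would type:  g/an/d  (delete every matching line)
after_delete=$(sed '/an/d' "$dir/fruit")     # apple, cherry

echo "$with_grep"; echo "$with_sed"; echo "$after_delete"
rm -r "$dir"
```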
Global commands in vi
- do_it_to_all_lines_that_match_in_vi::= :g/pattern/ex_command.
- Symbols_used_in_patterns::=following
Patterns
- pattern::= optional(start_of_line_anchor) main_pattern optional(end_of_line_anchor).
- main_pattern::=any_number( (any_character | set_of_characters | normal_char | escaped_char) optional(any_number_of_the_previous) ).
- start_of_line_anchor::= ^
- end_of_line_anchor::= $
- set_of_characters::= [ #( character | character - character ) ].
- any_character::= .
- any_number_of_the_previous::= *
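The pieces of this grammar can be combined into whole patterns. The sketch below uses an invented five-line file and sets LC_ALL=C so that ranges such as [A-Z] mean plain ASCII letters.

```shell
# Sketch only: the sample lines are made up.
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)
printf 'A1\nB22x\nab3\nC\nD-9\n' > "$dir/f"

# anchor, set, set followed by *, anchor:
# a capital letter and then nothing but digits (possibly none) to end of line.
capital_digits=$(grep '^[A-Z][0-9]*$' "$dir/f")    # A1 and C

# anchor, set, any_character, set, anchor: exactly three characters.
three_chars=$(grep '^[A-Z].[0-9]$' "$dir/f")       # D-9

# No anchors: two digits side by side anywhere on the line.
two_digits=$(grep '[0-9][0-9]' "$dir/f")           # B22x

echo "$capital_digits"; echo "$three_chars"; echo "$two_digits"
rm -r "$dir"
```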
Regular Expressions
- RE::=regular_expression.
- regular_expression::=a regular expression uses sequence+selection+iteration to express a set of possibilities.
- REs::=plural of RE
Extended REs for egrep
- or::= |
- option::= ?
- one_or_more_of_the_previous::= +
- n_thru_m_repetitions::= \{ n, m \} in grep | { n, m } in egrep.
- n_repetitions::= \{ n \} in grep | { n } in egrep.
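A sketch of the extended operators, run with `grep -E` (the POSIX spelling of egrep, assumed to be available). The word list is invented. Note that in POSIX tools the backslashed braces belong to plain grep, while `grep -E` writes the braces bare; older systems may differ.

```shell
# Sketch only: the word list is made up.
dir=$(mktemp -d)
printf 'color\ncolour\ncolr\ngray\ngrey\ngoood\n' > "$dir/f"

either=$(grep -E 'gray|grey' "$dir/f")        # or: gray, grey
optional_u=$(grep -E 'colou?r' "$dir/f")      # option: color, colour
one_or_more=$(grep -E 'go+d' "$dir/f")        # one or more o's: goood

# Repetition counts: bare braces with -E, backslashed braces in plain grep.
interval_ere=$(grep -E 'o{2,3}' "$dir/f")     # goood
interval_bre=$(grep 'o\{3\}' "$dir/f")        # goood

echo "$either"; echo "$optional_u"; echo "$one_or_more"
echo "$interval_ere"; echo "$interval_bre"
rm -r "$dir"
```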
REs in ex/vi
- start_of_a_word::= \<
- end_of_a_word::= \>
. . . . . . . . . ( end of section Syntax)
Exercises
Use the command
grep pattern file
to find all the lines in /etc/termcap that refer to a vt100 display.
Create a file with your own personal notes on regular
expressions. Max size: 78 chars wide and 23 lines long.
Questions
See the uses of regular expressions...
See Also
[ 27.searches.html ]
[ 33.ex+ed.html ]
[ awk.html ]
[ 34.sed.html ]
, and other uses of regular expressions.
Submit Your Notes Here
To earn credit for completing this part of the course
you need to send me a short list of things you have learned.
A simple way to do this is to follow this link and fill in the form
using copy and paste.