Diffstat (limited to 'libslang/doc/tm/regexp.tm')
-rw-r--r-- | libslang/doc/tm/regexp.tm | 98 |
1 files changed, 98 insertions, 0 deletions
diff --git a/libslang/doc/tm/regexp.tm b/libslang/doc/tm/regexp.tm
new file mode 100644
index 0000000..7874b49
--- /dev/null
+++ b/libslang/doc/tm/regexp.tm
@@ -0,0 +1,98 @@
+\chapter{Regular Expressions}
+
+ The S-Lang library includes a regular expression (RE) package that
+ may be used by an application embedding the library. The RE syntax
+ should be familiar to anyone acquainted with regular expressions. In
+ this section, the syntax of \slang regular expressions is
+ discussed.
+
+\sect{\slang RE Syntax}
+
+ A regular expression specifies a pattern to be matched against a
+ string, and has the property that the concatenation of two REs is
+ also an RE.
+
+ The \slang library supports the following standard regular
+ expression operators:
+#v+
+ .                match any character except newline
+ *                match zero or more occurrences of the previous RE
+ +                match one or more occurrences of the previous RE
+ ?                match zero or one occurrence of the previous RE
+ ^                match the beginning of a line
+ $                match the end of a line
+ [ ... ]          match any single character between the brackets.
+                  For example, [-02468] matches `-' or any even digit,
+                  and [-0-9a-z] matches `-', any digit from 0 to 9,
+                  or any letter from a to z.
+ \<               match the beginning of a word
+ \>               match the end of a word
+ \( ... \)        define a group for use with \1 through \9
+ \1, \2, ..., \9  match the text matched by the nth \( ... \)
+                  expression
+#v-
+ In addition, the following extensions are supported:
+#v+
+ \c               turn on case-sensitivity (default)
+ \C               turn off case-sensitivity
+ \d               match any digit
+ \e               match the ESC character
+#v-
+ Here are some simple examples:
+
+ \exmp{"^int "} matches \exmp{"int "} at the beginning of a line.
+
+ \exmp{"\\<money\\>"} matches \exmp{"money"}, but only if it appears
+ as a separate word.
+
+ \exmp{"^$"} matches an empty line.
+
+ A more complex pattern is
+#v+
+ "\(\<[a-zA-Z]+\>\)[ ]+\1\>"
+#v-
+ which matches any word repeated consecutively. Note how the grouping
+ operators \exmp{\\(} and \exmp{\\)} are used to define the text
+ matched by the enclosed regular expression, which is subsequently
+ referred to by \exmp{\\1}.
+
+ Finally, remember that when a pattern is written as a string literal,
+ either in the \slang language or in the C language, each
+ \exmp{'\\'} character must be doubled, since both languages treat it
+ as an escape character.
+
+\sect{Differences between \slang and egrep REs}
+
+ There are several differences between \slang regular expressions and,
+ e.g., \bf{egrep} regular expressions.
+
+ The most notable difference is that \slang regular expressions do
+ not support the \bf{OR} operator \exmp{|}. This means
+ that \exmp{"a|b"} and \exmp{"a\\|b"} do not have the meaning that they
+ have in regular expression packages that support egrep-style
+ expressions.
+
+ The other main difference is that while \slang regular expressions
+ support the grouping operators \exmp{\\(} and \exmp{\\)}, they are
+ used only as a means of specifying the text that is matched. That
+ is, the expression
+#v+
+ "@\([a-z]*\)@.*@\1@"
+#v-
+ matches \exmp{"xxx@abc@silly@abc@yyy"}, where \exmp{\\1} must match
+ the same text (here \exmp{"abc"}) that the subexpression between
+ \exmp{\\(} and \exmp{\\)} matched. However, in the current
+ implementation, the grouping operators do not combine regular
+ expressions into a single unit. Thus an expression such as
+ \exmp{"\\(hello\\)*"} is \em{not} a pattern that matches zero or more
+ occurrences of \exmp{"hello"}, as it is in, e.g., \bf{egrep}.
+
+ One question that comes up from time to time is why \slang does not
+ simply employ an existing POSIX-compatible regular expression library.
+ The simple answer is that, at the time of this writing, none exists
+ that is available across all the platforms that the \slang library
+ supports (Unix, VMS, OS/2, win32, win16, BEOS, MSDOS, and QNX) and
+ can be distributed under both the GNU and Artistic licenses. It is
+ particularly important that the library and the interpreter support a
+ common set of regular expressions in a platform-independent manner.
+