Diffstat (limited to 'libslang/doc/tm/regexp.tm')
-rw-r--r-- libslang/doc/tm/regexp.tm | 98
1 files changed, 98 insertions, 0 deletions
diff --git a/libslang/doc/tm/regexp.tm b/libslang/doc/tm/regexp.tm
new file mode 100644
index 0000000..7874b49
--- /dev/null
+++ b/libslang/doc/tm/regexp.tm
@@ -0,0 +1,98 @@
+\chapter{Regular Expressions}
+
+ The S-Lang library includes a regular expression (RE) package that
+ may be used by an application embedding the library. The RE syntax
+ should be familiar to anyone acquainted with regular expressions.
+ This chapter describes the syntax of the \slang regular
+ expressions.
+
+\sect{\slang RE Syntax}
+
+ A regular expression specifies a pattern to be matched against a
+ string, and has the property that the concatenation of two REs is
+ also a RE.
+
+ The \slang library supports the following standard regular
+ expressions:
+#v+
+   .                 matches any character except newline
+   *                 matches zero or more occurrences of previous RE
+   +                 matches one or more occurrences of previous RE
+   ?                 matches zero or one occurrence of previous RE
+   ^                 matches beginning of a line
+   $                 matches end of line
+   [ ... ]           matches any single character between brackets.
+                     For example, [-02468] matches `-' or any even digit,
+                     and [-0-9a-z] matches `-', any digit between 0 and 9,
+                     and any letter a through z.
+   \<                matches the beginning of a word
+   \>                matches the end of a word
+   \( ... \)         marks the text matched by the enclosed RE
+   \1, \2, ..., \9   matches the text matched by the nth \( ... \)
+                     expression
+#v-
+ In addition, the following extensions are supported:
+#v+
+   \c                turns on case-sensitivity (default)
+   \C                turns off case-sensitivity
+   \d                matches any digit
+   \e                matches ESC char
+#v-
+Here are some simple examples:
+
+ \exmp{"^int "} matches the \exmp{"int "} at the beginning of a line.
+
+ \exmp{"\\<money\\>"} matches \exmp{"money"} but only if it appears
+ as a separate word.
+
+ \exmp{"^$"} matches an empty line.
+
+ A more complex pattern is
+#v+
+ "\(\<[a-zA-Z]+\>\)[ ]+\1\>"
+#v-
+ which matches any word repeated consecutively. Note how the grouping
+ operators \exmp{\\(} and \exmp{\\)} are used to define the text
+ matched by the enclosed regular expression, which is subsequently
+ referred to by \exmp{\\1}.
+
+ Finally, remember that when used in string literals either in the
+ \slang language or in the C language, care must be taken to
+ "double-up" the \exmp{'\\'} character since both languages treat it
+ as an escape character.
+
+\sect{Differences between \slang and egrep REs}
+
+ There are several differences between \slang regular expressions and,
+ e.g., \bf{egrep} regular expressions.
+
+ The most notable difference is that the \slang regular expressions do
+ not support the \bf{OR} operator \exmp{|} in expressions. This means
+ that neither \exmp{"a|b"} nor \exmp{"a\\|b"} has the meaning that it
+ has in regular expression packages that support egrep-style
+ expressions.
+
+ The other main difference is that while \slang regular expressions
+ support the grouping operators \exmp{\\(} and \exmp{\\)}, they are
+ only used as a means of specifying the text that is matched. That
+ is, the expression
+#v+
+ "@\([a-z]*\)@.*@\1@"
+#v-
+ matches \exmp{"xxx@abc@silly@abc@yyy"}, where the pattern \exmp{\\1}
+ matches the text enclosed by the \exmp{\\(} and \exmp{\\)}
+ expressions. However, in the current implementation, the grouping
+ operators are not used to group regular expressions to form a single
+ regular expression. Thus an expression such as \exmp{"\\(hello\\)*"} is
+ \em{not} a pattern to match zero or more occurrences of \exmp{"hello"}
+ as it is in, e.g., \bf{egrep}.
+
+ One question that comes up from time to time is why \slang does not
+ simply employ some POSIX-compatible regular expression library. The
+ simple answer is that, at the time of this writing, none exists that
+ is available across all the platforms that the \slang library
+ supports (Unix, VMS, OS/2, win32, win16, BEOS, MSDOS, and QNX) and
+ can be distributed under both the GNU and Artistic licenses. It is
+ particularly important that the library and the interpreter support a
+ common set of regular expressions in a platform independent manner.
+
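The added text reminds the reader to double each backslash when an RE is written as a string literal in C. A minimal sketch of that rule follows, using two patterns taken from the documentation. The function names `word_pattern` and `repeat_pattern` are hypothetical helpers introduced only for illustration and are not part of the S-Lang API. No S-Lang library call is made here; the sketch only shows what text the C compiler produces from the literal.

```c
#include <stdio.h>

/* Hypothetical helper (not an S-Lang function): returns the RE
 * \<money\> written as a C literal. Every backslash that the RE
 * package should see is doubled in the source text. */
const char *word_pattern(void)
{
    return "\\<money\\>";
}

/* Hypothetical helper: the "repeated word" RE from the documentation,
 * \(\<[a-zA-Z]+\>\)[ ]+\1\> , with each backslash doubled. */
const char *repeat_pattern(void)
{
    return "\\(\\<[a-zA-Z]+\\>\\)[ ]+\\1\\>";
}

/* Prints both patterns as the RE package would receive them, that is,
 * with the doubled backslashes already collapsed by the compiler. */
void print_patterns(void)
{
    printf("%s\n", word_pattern());
    printf("%s\n", repeat_pattern());
}
```

Calling `print_patterns()` writes each pattern with single backslashes, which is the form shown in the verbatim blocks of the documentation.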