From fad15f79b1230b3076be515d6894c8919562809b Mon Sep 17 00:00:00 2001
From: mitchell <unknown>
Date: Sat, 25 Apr 2020 16:26:31 -0400
Subject: Reformatted Lua LPeg lexers and added new convenience functions and
 pattern. `lexer.range()` replaces `lexer.delimited_range()` and
 `lexer.nested_pair()`. `lexer.to_eol()` replaces `patt * lexer.nonnewline^0`
 constructs. `lexer.number` replaces `lexer.float + lexer.integer`. Also added
 unit tests for lexer functions.

---
 doc/LPegLexer.html | 233 ++++++++++++++++++++++++++---------------------------
 1 file changed, 113 insertions(+), 120 deletions(-)

(limited to 'doc/LPegLexer.html')
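
Reviewer note: the removed lines in this patch show the hand-written LPeg constructs that `lexer.to_eol()` and `lexer.range()` replace. The sketch below rebuilds rough equivalents on plain LPeg, to illustrate what the helpers save and why escape handling matters for strings. It assumes the LPeg library is installed. The names `to_eol_sketch`, `range_sketch`, and `escaped_range_sketch` are hypothetical stand-ins, not Scintillua's implementation, and the newline definition is an assumption.

```lua
-- Sketch only: plain-LPeg approximations, NOT Scintillua's implementation.
-- Assumes the LPeg library is installed (e.g. via LuaRocks).
local lpeg = require('lpeg')
local P, S = lpeg.P, lpeg.S

local any = P(1)                   -- any single character
local nonnewline = any - S('\r\n') -- assumption: a newline is '\r' or '\n'

-- Hypothetical stand-in for lexer.to_eol(prefix), without escape handling.
local function to_eol_sketch(prefix)
  return P(prefix) * nonnewline^0
end

-- Hypothetical stand-in for lexer.range(s, e) with no escapes and no
-- balancing. The closing delimiter is optional, so an unterminated range
-- still matches up to the end of input.
local function range_sketch(s, e)
  e = e or s
  return P(s) * (any - P(e))^0 * P(e)^-1
end

-- Hypothetical escape-aware variant for a single quote character q:
-- a backslash consumes the character that follows it.
local function escaped_range_sketch(q)
  return P(q) * (P('\\') * any + (any - P(q)))^0 * P(q)^-1
end

-- Each match returns the position just past the matched text.
print(to_eol_sketch('#'):match('# hi\nnext'))
print(range_sketch('/*', '*/'):match('/* c */ x'))
print(range_sketch('"'):match('"a\\"b" c'))         -- stops at the escaped quote
print(escaped_range_sketch('"'):match('"a\\"b" c')) -- spans the whole string
```

With Scintillua's `lexer` module loaded, the corresponding calls introduced by this patch are `lexer.to_eol('#')`, `lexer.range('/*', '*/')`, and `lexer.range('"')`.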
diff --git a/doc/LPegLexer.html b/doc/LPegLexer.html
index e31a091b1..3f553e9f9 100644
--- a/doc/LPegLexer.html
+++ b/doc/LPegLexer.html
@@ -226,6 +226,13 @@
           as fold points. For example, the C line <code>} else {</code> would be
           marked as a fold point. The default is <code>0</code>.</td>
         </tr>
+
+        <tr>
+          <td><code>fold.compact</code></td>
+
+          <td>If <code>fold.compact</code> is set to <code>1</code>, blank lines
+          after an ending fold point are included in that fold.</td>
+        </tr>
       </tbody>
     </table>
 
@@ -672,7 +679,7 @@ operator    30
     <a href="#lexer.punct"><code>lexer.punct</code></a>, <a href="#lexer.space"><code>lexer.space</code></a>, <a href="#lexer.newline"><code>lexer.newline</code></a>,
     <a href="#lexer.nonnewline"><code>lexer.nonnewline</code></a>, <a href="#lexer.nonnewline_esc"><code>lexer.nonnewline_esc</code></a>, <a href="#lexer.dec_num"><code>lexer.dec_num</code></a>,
     <a href="#lexer.hex_num"><code>lexer.hex_num</code></a>, <a href="#lexer.oct_num"><code>lexer.oct_num</code></a>, <a href="#lexer.integer"><code>lexer.integer</code></a>,
-    <a href="#lexer.float"><code>lexer.float</code></a>, and <a href="#lexer.word"><code>lexer.word</code></a>. You may use your own token names if
+    <a href="#lexer.float"><code>lexer.float</code></a>, <a href="#lexer.number"><code>lexer.number</code></a>, and <a href="#lexer.word"><code>lexer.word</code></a>. You may use your own token names if
     none of the above fit your language, but an advantage to using predefined
     token names is that your lexer's tokens will inherit the universal syntax
     highlighting color theme used by your text editor.</p>
@@ -725,9 +732,8 @@ operator    30
     <p>Line-style comments with a prefix character(s) are easy to express with LPeg:</p>
 
     <pre><code>
-    local shell_comment = token(lexer.COMMENT, '#' * lexer.nonnewline^0)
-    local c_line_comment = token(lexer.COMMENT,
-                                 '//' * lexer.nonnewline_esc^0)
+    local shell_comment = token(lexer.COMMENT, lexer.to_eol('#'))
+    local c_line_comment = token(lexer.COMMENT, lexer.to_eol('//', true))
     </code></pre>
 
     <p>The comments above start with a '#' or "//" and go to the end of the line.
@@ -738,8 +744,7 @@ operator    30
     express:</p>
 
     <pre><code>
-    local c_comment = token(lexer.COMMENT,
-                            '/*' * (lexer.any - '*/')^0 * P('*/')^-1)
+    local c_comment = token(lexer.COMMENT, lexer.range('/*', '*/'))
     </code></pre>
 
     <p>This comment starts with a "/*" sequence and contains anything up to and
@@ -748,24 +753,14 @@ operator    30
 
     <p><strong>Strings</strong></p>
 
-    <p>It is tempting to think that a string is not much different from the block
-    comment shown above in that both have start and end delimiters:</p>
-
-    <pre><code>
-    local dq_str = '"' * (lexer.any - '"')^0 * P('"')^-1
-    local sq_str = "'" * (lexer.any - "'")^0 * P("'")^-1
-    local simple_string = token(lexer.STRING, dq_str + sq_str)
-    </code></pre>
-
-    <p>However, most programming languages allow escape sequences in strings such
-    that a sequence like "\&quot;" in a double-quoted string indicates that the
-    '&quot;' is not the end of the string. The above token incorrectly matches
-    such a string. Instead, use the <a href="#lexer.delimited_range"><code>lexer.delimited_range()</code></a> convenience
-    function.</p>
+    <p>Most programming languages allow escape sequences in strings such that a
+    sequence like &ldquo;\&quot;&rdquo; in a double-quoted string indicates that the
+    &lsquo;&quot;&rsquo; is not the end of the string. <a href="#lexer.range"><code>lexer.range()</code></a> handles such
+    escapes by default when its delimiters are identical, single-character strings.</p>
 
     <pre><code>
-    local dq_str = lexer.delimited_range('"')
-    local sq_str = lexer.delimited_range("'")
+    local dq_str = lexer.range('"')
+    local sq_str = lexer.range("'")
     local string = token(lexer.STRING, dq_str + sq_str)
     </code></pre>
 
@@ -775,10 +770,10 @@ operator    30
     <p><strong>Numbers</strong></p>
 
     <p>Most programming languages have the same format for integer and float tokens,
-    so it might be as simple as using a couple of predefined LPeg patterns:</p>
+    so it might be as simple as using a predefined LPeg pattern:</p>
 
     <pre><code>
-    local number = token(lexer.NUMBER, lexer.float + lexer.integer)
+    local number = token(lexer.NUMBER, lexer.number)
     </code></pre>
 
     <p>However, some languages allow postfix characters on integers.</p>
@@ -1391,11 +1386,11 @@ operator    30
     lex:add_rule('whitespace', token(lexer.WHITESPACE, lexer.space^1))
     lex:add_rule('keyword', token(lexer.KEYWORD, word_match[[foo bar baz]]))
     lex:add_rule('custom', token('custom', P('quux')))
-    lex:add_style('custom', lexer.STYLE_KEYWORD..',bold')
+    lex:add_style('custom', lexer.STYLE_KEYWORD .. ',bold')
     lex:add_rule('identifier', token(lexer.IDENTIFIER, lexer.word))
-    lex:add_rule('string', token(lexer.STRING, lexer.delimited_range('"')))
-    lex:add_rule('comment', token(lexer.COMMENT, '#' * lexer.nonnewline^0))
-    lex:add_rule('number', token(lexer.NUMBER, lexer.float + lexer.integer))
+    lex:add_rule('string', token(lexer.STRING, lexer.range('"')))
+    lex:add_rule('comment', token(lexer.COMMENT, lexer.to_eol('#')))
+    lex:add_rule('number', token(lexer.NUMBER, lexer.number))
     lex:add_rule('operator', token(lexer.OPERATOR, S('+-*/%^=&lt;&gt;,.()[]{}')))
 
     lex:add_fold_point(lexer.OPERATOR, '{', '}')
@@ -1463,7 +1458,7 @@ operator    30
     <h4>Acknowledgements</h4>
 
     <p>Thanks to Peter Odding for his <a href="http://lua-users.org/lists/lua-l/2007-04/msg00116.html">lexer post</a> on the Lua mailing list
-    that inspired me, and thanks to Roberto Ierusalimschy for LPeg.</p>
+    that provided inspiration, and thanks to Roberto Ierusalimschy for LPeg.</p>
 
     <h2>Lua <code>lexer</code> module API fields</h2>
 
@@ -1869,6 +1864,13 @@ operator    30
     <p>A pattern that matches any single, non-newline character or any set of end
       of line characters escaped with '\'.</p>
 
+    <p><a id="lexer.number"></a></p>
+
+    <h3><code>lexer.number</code> (pattern)</h3>
+
+    <p>A pattern that matches a typical number: a floating point, decimal,
+    hexadecimal, or octal number.</p>
+
     <p><a id="lexer.oct_num"></a></p>
 
     <h3><code>lexer.oct_num</code> (pattern)</h3>
@@ -2071,58 +2073,6 @@ operator    30
     </ul>
 
 
-    <p><a id="lexer.delimited_range"></a></p>
-
-    <h3><code>lexer.delimited_range</code> (chars, single_line, no_escape, balanced)</h3>
-
-    <p>Creates and returns a pattern that matches a range of text bounded by
-    <em>chars</em> characters.
-    This is a convenience function for matching more complicated delimited ranges
-    like strings with escape characters and balanced parentheses. <em>single_line</em>
-    indicates whether or not the range must be on a single line, <em>no_escape</em>
-    indicates whether or not to ignore '\' as an escape character, and <em>balanced</em>
-    indicates whether or not to handle balanced ranges like parentheses and
-    requires <em>chars</em> to be composed of two characters.</p>
-
-    <p>Fields:</p>
-
-    <ul>
-    <li><code>chars</code>: The character(s) that bound the matched range.</li>
-    <li><code>single_line</code>: Optional flag indicating whether or not the range must be
-    on a single line.</li>
-    <li><code>no_escape</code>: Optional flag indicating whether or not the range end
-    character may be escaped by a '\' character.</li>
-    <li><code>balanced</code>: Optional flag indicating whether or not to match a balanced
-    range, like the "%b" Lua pattern. This flag only applies if <em>chars</em>
-    consists of two different characters (e.g. "()").</li>
-    </ul>
-
-
-    <p>Usage:</p>
-
-    <ul>
-    <li><code>local dq_str_escapes = lexer.delimited_range('"')</code></li>
-    <li><code>local dq_str_noescapes = lexer.delimited_range('"', false, true)</code></li>
-    <li><code>local unbalanced_parens = lexer.delimited_range('()')</code></li>
-    <li><code>local balanced_parens = lexer.delimited_range('()', false, false,
-    true)</code></li>
-    </ul>
-
-
-    <p>Return:</p>
-
-    <ul>
-    <li>pattern</li>
-    </ul>
-
-
-    <p>See also:</p>
-
-    <ul>
-    <li><a href="#lexer.nested_pair"><code>lexer.nested_pair</code></a></li>
-    </ul>
-
-
     <p><a id="lexer.embed"></a></p>
 
     <h3><code>lexer.embed</code> (lexer, child, start_rule, end_rule)</h3>
@@ -2241,7 +2191,7 @@ operator    30
 
     <ul>
     <li><code>local regex = lexer.last_char_includes('+-*!%^&amp;|=,([{') *
-    lexer.delimited_range('/')</code></li>
+    lexer.range('/')</code></li>
     </ul>
 
 
@@ -2344,44 +2294,6 @@ operator    30
     </ul>
 
 
-    <p><a id="lexer.nested_pair"></a></p>
-
-    <h3><code>lexer.nested_pair</code> (start_chars, end_chars)</h3>
-
-    <p>Returns a pattern that matches a balanced range of text that starts with
-    string <em>start_chars</em> and ends with string <em>end_chars</em>.
-    With single-character delimiters, this function is identical to
-    <code>delimited_range(start_chars..end_chars, false, true, true)</code>.</p>
-
-    <p>Fields:</p>
-
-    <ul>
-    <li><code>start_chars</code>: The string starting a nested sequence.</li>
-    <li><code>end_chars</code>: The string ending a nested sequence.</li>
-    </ul>
-
-
-    <p>Usage:</p>
-
-    <ul>
-    <li><code>local nested_comment = lexer.nested_pair('/*', '*/')</code></li>
-    </ul>
-
-
-    <p>Return:</p>
-
-    <ul>
-    <li>pattern</li>
-    </ul>
-
-
-    <p>See also:</p>
-
-    <ul>
-    <li><a href="#lexer.delimited_range"><code>lexer.delimited_range</code></a></li>
-    </ul>
-
-
     <p><a id="lexer.new"></a></p>
 
     <h3><code>lexer.new</code> (name, opts)</h3>
@@ -2420,6 +2332,54 @@ operator    30
     </ul>
 
 
+    <p><a id="lexer.range"></a></p>
+
+    <h3><code>lexer.range</code>(<em>s, e, single_line, escapes, balanced</em>)</h3>
+
+    <p>Creates and returns a pattern that matches a range of text bounded by strings
+    or patterns <em>s</em> and <em>e</em>.
+    This is a convenience function for matching more complicated ranges like
+    strings with escape characters, balanced parentheses, and block comments
+    (nested or not). <em>e</em> is optional and defaults to <em>s</em>. <em>single_line</em> indicates
+    whether or not the range must be on a single line; <em>escapes</em> indicates
+    whether or not to allow &lsquo;\&rsquo; as an escape character; and <em>balanced</em> indicates
+    whether or not to handle balanced ranges like parentheses, and requires <em>s</em>
+    and <em>e</em> to be different.</p>
+
+    <p>Parameters:</p>
+
+    <ul>
+    <li><em><code>s</code></em>: String or pattern start of a range.</li>
+    <li><em><code>e</code></em>: Optional string or pattern end of a range. The default value is <em>s</em>.</li>
+    <li><em><code>single_line</code></em>: Optional flag indicating whether or not the range must be
+    on a single line.</li>
+    <li><em><code>escapes</code></em>: Optional flag indicating whether or not the range end may
+    be escaped by a &lsquo;\&rsquo; character.
+    The default value is <code>false</code> unless <em>s</em> and <em>e</em> are identical, single-character strings.
+    In that case, the default value is <code>true</code>.</li>
+    <li><em><code>balanced</code></em>: Optional flag indicating whether or not to match a balanced
+    range, like the &ldquo;%b&rdquo; Lua pattern. This flag only applies if <em>s</em> and <em>e</em> are
+    different.</li>
+    </ul>
+
+
+    <p>Usage:</p>
+
+    <ul>
+    <li><code>local dq_str_escapes = lexer.range('"')</code></li>
+    <li><code>local dq_str_noescapes = lexer.range('"', false, false)</code></li>
+    <li><code>local unbalanced_parens = lexer.range('(', ')')</code></li>
+    <li><code>local balanced_parens = lexer.range('(', ')', false, false, true)</code></li>
+    </ul>
+
+
+    <p>Return:</p>
+
+    <ul>
+    <li>pattern</li>
+    </ul>
+
+
     <p><a id="lexer.starts_line"></a></p>
 
     <h3><code>lexer.starts_line</code> (patt)</h3>
@@ -2449,6 +2409,39 @@ operator    30
     </ul>
 
 
+    <p><a id="lexer.to_eol"></a></p>
+
+    <h3><code>lexer.to_eol</code>(<em>prefix, escape</em>)</h3>
+
+    <p>Creates and returns a pattern that matches from string or pattern <em>prefix</em>
+    until the end of the line.
+    <em>escape</em> indicates whether the end of the line can be escaped with a &lsquo;\&rsquo;
+    character.</p>
+
+    <p>Parameters:</p>
+
+    <ul>
+    <li><em><code>prefix</code></em>: String or pattern prefix to start matching at.</li>
+    <li><em><code>escape</code></em>: Optional flag indicating whether or not newlines can be escaped
+    by a &lsquo;\&rsquo; character. The default value is <code>false</code>.</li>
+    </ul>
+
+
+    <p>Usage:</p>
+
+    <ul>
+    <li><code>local line_comment = lexer.to_eol('//')</code></li>
+    <li><code>local line_comment = lexer.to_eol(P('#') + ';')</code></li>
+    </ul>
+
+
+    <p>Return:</p>
+
+    <ul>
+    <li>pattern</li>
+    </ul>
+
+
     <p><a id="lexer.token"></a></p>
 
     <h3><code>lexer.token</code> (name, patt)</h3>
-- 
cgit v1.2.3