From 519b7328b66c4c84f03893a31e4be5ba6b1395f2 Mon Sep 17 00:00:00 2001
From: mitchell
Date: Sun, 11 Mar 2018 23:04:41 -0400
Subject: Added optional Lua lexer support. This support is disabled by default
 and must be enabled via compile-time option.
---
 doc/LPegLexer.html | 2608 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 2608 insertions(+)
 create mode 100644 doc/LPegLexer.html

diff --git a/doc/LPegLexer.html b/doc/LPegLexer.html
new file mode 100644
index 000000000..1a0049799
--- /dev/null
+++ b/doc/LPegLexer.html
@@ -0,0 +1,2608 @@
+ +

Lua LPeg Lexers

+ +

Scintilla's LPeg lexer adds dynamic Lua + LPeg lexers to + Scintilla. It is the quickest way to add new or customized syntax + highlighting and code folding for programming languages to any + Scintilla-based text editor or IDE.

+ +

Features

+ + + +

Enabling and Configuring the LPeg Lexer

+ +

Scintilla is not compiled with the LPeg lexer enabled by
   default (it is present, but empty). You need to manually enable it with the
   LPEG_LEXER flag when building Scintilla and its lexers. You
   also need to build the Lua source files contained in Scintilla's
   lua/src/ directory and link them together with
   lexers/LexLPeg.cxx. If your application has its own copy of Lua,
   you can ignore Scintilla's copy and link to yours.

At this time, only the GTK, curses, and MinGW32 (for win32) platform
   makefiles facilitate enabling the LPeg lexer. For example, when building
   Scintilla, run make LPEG_LEXER=1. User contributions to
   facilitate this for the other platforms are encouraged.

+ +

When Scintilla is compiled with the LPeg lexer enabled, and after + selecting it as the lexer to use via + SCI_SETLEXER or + SCI_SETLEXERLANGUAGE, + the following property must be set via + SCI_SETPROPERTY:

+ + + + + + + + + +
lexer.lpeg.home    The directory containing the Lua lexers. This is the path
   where you included Scintilla's lexlua/ directory in
   your application's installation location.
+ +

The following properties are optional and may or may not be set:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
lexer.lpeg.color.theme    The color theme to use. Color themes are located in
   the lexlua/themes/ directory. Currently supported themes
   are light, dark, scite, and
   curses. Your application can define colors and styles
   manually through Scintilla properties. The theme files have
   examples.
fold    For Lua lexers that have a folder, folding is turned on if
   fold is set to 1. The default is
   0.
fold.by.indentation    For Lua lexers that do not have a folder, if
   fold.by.indentation is set to 1, folding is
   done based on indentation level (like Python). The default is
   0.
fold.line.comments    If fold.line.comments is set to 1,
   multiple, consecutive line comments are folded, and only the top-level
   comment is shown. There is a small performance penalty for large
   source files when this option and folding are enabled. The default is
   0.
fold.on.zero.sum.lines    If fold.on.zero.sum.lines is set to 1,
   lines that contain both an ending and starting fold point are marked
   as fold points. For example, the C line } else { would be
   marked as a fold point. The default is 0.
+ +

Using the LPeg Lexer

+ +

Your application communicates with the LPeg lexer using Scintilla's + SCI_PRIVATELEXERCALL + API. The operation constants recognized by the LPeg lexer are based on + Scintilla's existing named constants. Note that some of the names of the + operations do not make perfect sense. This is a tradeoff in order to reuse + Scintilla's existing constants.

+ +

In the descriptions that follow, + SCI_PRIVATELEXERCALL(int operation, void *pointer) means you + would call Scintilla like + SendScintilla(sci, SCI_PRIVATELEXERCALL, operation, pointer);

+ +

Usage Example

+ +

The curses platform demo, jinx, contains a C source example of using the
   LPeg lexer. Additionally, here is a pseudo-code example:

+ +

+    init_app() {
+      sci = scintilla_new()
+    }
+
+    create_doc() {
+      doc = SendScintilla(sci, SCI_CREATEDOCUMENT, 0, 0)
+      SendScintilla(sci, SCI_SETDOCPOINTER, 0, doc)
+      SendScintilla(sci, SCI_SETLEXERLANGUAGE, 0, "lpeg")
+      home = "/home/mitchell/app/lua_lexers"
+      SendScintilla(sci, SCI_SETPROPERTY, "lexer.lpeg.home", home)
+      SendScintilla(sci, SCI_SETPROPERTY, "lexer.lpeg.color.theme", "light")
+      fn = SendScintilla(sci, SCI_GETDIRECTFUNCTION, 0, 0)
+      SendScintilla(sci, SCI_PRIVATELEXERCALL, SCI_GETDIRECTFUNCTION, fn)
+      psci = SendScintilla(sci, SCI_GETDIRECTPOINTER, 0, 0)
+      SendScintilla(sci, SCI_PRIVATELEXERCALL, SCI_SETDOCPOINTER, psci)
+      SendScintilla(sci, SCI_PRIVATELEXERCALL, SCI_SETLEXERLANGUAGE, "lua")
+    }
+
+    set_lexer(lang) {
+      psci = SendScintilla(sci, SCI_GETDIRECTPOINTER, 0, 0)
+      SendScintilla(sci, SCI_PRIVATELEXERCALL, SCI_SETDOCPOINTER, psci)
+      SendScintilla(sci, SCI_PRIVATELEXERCALL, SCI_SETLEXERLANGUAGE, lang)
+    }
+    
+ + SCI_PRIVATELEXERCALL(SCI_CHANGELEXERSTATE, lua_State *L)
+ SCI_PRIVATELEXERCALL(SCI_GETDIRECTFUNCTION, int SciFnDirect)
+ SCI_PRIVATELEXERCALL(SCI_GETLEXERLANGUAGE, char *languageName) → int
+ SCI_PRIVATELEXERCALL(SCI_GETSTATUS, char *errorMessage) → int
+ SCI_PRIVATELEXERCALL(int styleNum, char *styleName) → int
+ SCI_PRIVATELEXERCALL(SCI_SETDOCPOINTER, int sci)
+ SCI_PRIVATELEXERCALL(SCI_SETLEXERLANGUAGE, languageName)
+
+ +

SCI_PRIVATELEXERCALL(SCI_CHANGELEXERSTATE, lua_State *L)
+ Tells the LPeg lexer to use L as its Lua state instead of + creating a separate state.

+ +

L must have already opened the "base", "string", "table", + "package", and "lpeg" libraries. If L is a Lua 5.1 state, it + must have also opened the "io" library.

+ +

The LPeg lexer will create a single lexer package (that can + be used with Lua's require function), as well as a number of + other variables in the LUA_REGISTRYINDEX table with the "sci_" + prefix.

+ +
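For example, Lua code running in that state may then load the package in the
   usual way:

+    local lexer = require('lexer')
+    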

Rather than including the path to Scintilla's Lua lexers in the + package.path of the given Lua state, set the "lexer.lpeg.home" + property instead. The LPeg lexer uses that property to find and load + lexers.

+ +

Usage:

+ +

+    lua = luaL_newstate()
+    SendScintilla(sci, SCI_PRIVATELEXERCALL, SCI_CHANGELEXERSTATE, lua)
+    
+ +

SCI_PRIVATELEXERCALL(SCI_GETDIRECTFUNCTION, SciFnDirect)
+ Tells the LPeg lexer the address of SciFnDirect, the function + that handles Scintilla messages.

+ +

Despite the name SCI_GETDIRECTFUNCTION, this call does not
   return anything; it merely tells the LPeg lexer the value of
   SciFnDirect obtained from
   SCI_GETDIRECTFUNCTION.
   Use this if you would like to have the LPeg
   lexer set all Lua lexer styles automatically. This is useful for maintaining
   a consistent color theme. Do not use this if your application maintains its
   own color theme.

+ +

If you use this call, it must be made once for each + Scintilla document that was created using Scintilla's + SCI_CREATEDOCUMENT. + You must also use the + SCI_SETDOCPOINTER LPeg lexer + API call.

+ +

Usage:

+ +

+    fn = SendScintilla(sci, SCI_GETDIRECTFUNCTION, 0, 0)
+    SendScintilla(sci, SCI_PRIVATELEXERCALL, SCI_GETDIRECTFUNCTION, fn)
+    
+ +

See also: SCI_SETDOCPOINTER

+ +

SCI_PRIVATELEXERCALL(SCI_GETLEXERLANGUAGE, char *languageName) → int
+ Returns the length of the string name of the current Lua lexer or stores the + name into the given buffer. If the buffer is long enough, the name is + terminated by a 0 character.

+ +

For parent lexers with embedded children or child lexers embedded into + parents, the name is in "lexer/current" format, where "lexer" is the actual + lexer's name and "current" is the parent or child lexer at the current caret + position. In order for this to work, you must have called + SCI_GETDIRECTFUNCTION + and + SCI_SETDOCPOINTER.

+ +

SCI_PRIVATELEXERCALL(SCI_GETSTATUS, char *errorMessage) → int
+ Returns the length of the error message of the LPeg lexer or Lua lexer error + that occurred (if any), or stores the error message into the given buffer.

+ +

If no error occurred, the returned message will be empty.

+ +

Since the LPeg lexer does not throw errors as they occur, errors can only + be handled passively. Note that the LPeg lexer does print all errors to + stderr.

+ +

Usage:

+ +

+    SendScintilla(sci, SCI_PRIVATELEXERCALL, SCI_GETSTATUS, errmsg)
+    if (strlen(errmsg) > 0) { /* handle error */ }
+    
+ +

SCI_PRIVATELEXERCALL(int styleNum, char *styleName) → int
+ Returns the length of the token name associated with the given style number + or stores the style name into the given buffer. If the buffer is long + enough, the string is terminated by a 0 character.

+ +

Usage:

+ +

+    style = SendScintilla(sci, SCI_GETSTYLEAT, pos, 0)
+    SendScintilla(sci, SCI_PRIVATELEXERCALL, style, token)
+    // token now contains the name of the style at pos
+    
+ +

SCI_PRIVATELEXERCALL(SCI_SETDOCPOINTER, int sci)
+ Tells the LPeg lexer the address of the Scintilla window (obtained via + Scintilla's + SCI_GETDIRECTPOINTER) + currently in use.

+ +

Despite the name SCI_SETDOCPOINTER, it has no relationship + to Scintilla documents.

+ +

Use this call only if you are using the + SCI_GETDIRECTFUNCTION + LPeg lexer API call. It must be made before each call to + the SCI_SETLEXERLANGUAGE + LPeg lexer API call.

+ +

Usage:

+ +

+    SendScintilla(sci, SCI_PRIVATELEXERCALL, SCI_SETDOCPOINTER, sci)
+    
+ +

See also: SCI_GETDIRECTFUNCTION, + SCI_SETLEXERLANGUAGE

+ +

SCI_PRIVATELEXERCALL(SCI_SETLEXERLANGUAGE, const char *languageName)
+ Sets the current Lua lexer to languageName.

+ +

If you are having the LPeg lexer set the Lua lexer styles automatically, + make sure you call the + SCI_SETDOCPOINTER + LPeg lexer API first.

+ +

Usage:

+ +

+    SendScintilla(sci, SCI_PRIVATELEXERCALL, SCI_SETLEXERLANGUAGE, "lua")
+    
+ +

See also: SCI_SETDOCPOINTER

+ +

Writing Lua Lexers

+ +

Lexers highlight the syntax of source code. Scintilla (the editing component
   behind Textadept) traditionally uses static, compiled C++
   lexers which are notoriously difficult to create and/or extend. On the other
   hand, Lua makes it easy to rapidly create new lexers, extend existing
   ones, and embed lexers within one another. Lua lexers tend to be more
   readable than C++ lexers too.

+ +

Lexers are Parsing Expression Grammars, or PEGs, composed with the Lua
   LPeg library. The following table comes from the LPeg documentation and
   summarizes all you need to know about constructing basic LPeg patterns. This
   module provides convenience functions for creating and working with more
   advanced patterns and concepts.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Operator Description
lpeg.P(string) Matches string literally.
lpeg.P(n) Matches exactly n characters.
lpeg.S(string) Matches any character in set string.
lpeg.R("xy") Matches any character between range x and y.
patt^n Matches at least n repetitions of patt.
patt^-n Matches at most n repetitions of patt.
patt1 * patt2 Matches patt1 followed by patt2.
patt1 + patt2 Matches patt1 or patt2 (ordered choice).
patt1 - patt2 Matches patt1 if patt2 does not match.
-patt Equivalent to ("" - patt).
#patt Matches patt but consumes no input.
+ + +
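To get a feel for how these operators compose, here is a small, illustrative
   pattern (not taken from any particular lexer) that matches a typical
   identifier using only the operators in the table above:

+    local lpeg = require('lpeg')
+    -- A letter or underscore followed by any number of letters, digits, or
+    -- underscores; similar in spirit to the lexer.word pattern described
+    -- later in this document.
+    local word = (lpeg.R('az', 'AZ') + lpeg.P('_')) *
+                 (lpeg.R('az', 'AZ', '09') + lpeg.P('_'))^0
+    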

The first part of this document deals with rapidly constructing a simple + lexer. The next part deals with more advanced techniques, such as custom + coloring and embedding lexers within one another. Following that is a + discussion about code folding, or being able to tell Scintilla which code + blocks are "foldable" (temporarily hideable from view). After that are + instructions on how to use Lua lexers with the aforementioned Textadept + editor. Finally there are comments on lexer performance and limitations.

+ +

+ +

Lexer Basics

+ +

The lexlua/ directory contains all lexers, including your new one. Before + attempting to write one from scratch though, first determine if your + programming language is similar to any of the 100+ languages supported. If + so, you may be able to copy and modify that lexer, saving some time and + effort. The filename of your lexer should be the name of your programming + language in lower case followed by a .lua extension. For example, a new Lua + lexer has the name lua.lua.

+ +

Note: Try to refrain from using one-character language names like "c", "d", + or "r". For example, Lua lexers for those languages are named "ansi_c", "dmd", and "rstats", + respectively.

+ +

+ +

New Lexer Template

+ +

There is a lexlua/template.txt file that contains a simple template for a + new lexer. Feel free to use it, replacing the '?'s with the name of your + lexer. Consider this snippet from the template:

+ +

+    -- ? LPeg lexer.
+
+    local lexer = require('lexer')
+    local token, word_match = lexer.token, lexer.word_match
+    local P, R, S = lpeg.P, lpeg.R, lpeg.S
+
+    local lex = lexer.new('?')
+
+    -- Whitespace.
+    local ws = token(lexer.WHITESPACE, lexer.space^1)
+    lex:add_rule('whitespace', ws)
+
+    [...]
+
+    return lex
+    
+ +

The first three lines of code simply define often-used convenience variables.
   The fourth and last lines define and return the lexer object
   Scintilla uses; they are very important and must be part of every lexer. The
   fifth line defines something called a "token", an essential building block of
   lexers. You will learn about tokens shortly. The sixth line defines a lexer
   grammar rule, which you will learn about later, as well as token styles. (Be
   aware that it is common practice to combine these two lines for short rules.)
   Note, however, the local prefix in front of variables, which is needed
   so as not to affect Lua's global environment. All in all, this is a minimal,
   working lexer that you can build on.

+ +

+ +

Tokens

+ +

Take a moment to think about your programming language's structure. What kind + of key elements does it have? In the template shown earlier, one predefined + element all languages have is whitespace. Your language probably also has + elements like comments, strings, and keywords. Lexers refer to these elements + as "tokens". Tokens are the fundamental "building blocks" of lexers. Lexers + break down source code into tokens for coloring, which results in the syntax + highlighting familiar to you. It is up to you how specific your lexer is when + it comes to tokens. Perhaps only distinguishing between keywords and + identifiers is necessary, or maybe recognizing constants and built-in + functions, methods, or libraries is desirable. The Lua lexer, for example, + defines 11 tokens: whitespace, keywords, built-in functions, constants, + built-in libraries, identifiers, strings, comments, numbers, labels, and + operators. Even though constants, built-in functions, and built-in libraries + are subsets of identifiers, Lua programmers find it helpful for the lexer to + distinguish between them all. It is perfectly acceptable to just recognize + keywords and identifiers.

+ +

In a lexer, tokens consist of a token name and an LPeg pattern that matches a + sequence of characters recognized as an instance of that token. Create tokens + using the lexer.token() function. Let us examine the "whitespace" token + defined in the template shown earlier:

+ +

+    local ws = token(lexer.WHITESPACE, lexer.space^1)
+    
+ +

At first glance, the first argument does not appear to be a string name and + the second argument does not appear to be an LPeg pattern. Perhaps you + expected something like:

+ +

+    local ws = token('whitespace', S('\t\v\f\n\r ')^1)
+    
+ +

The lexer module actually provides a convenient list of common token names + and common LPeg patterns for you to use. Token names include + lexer.DEFAULT, lexer.WHITESPACE, lexer.COMMENT, + lexer.STRING, lexer.NUMBER, lexer.KEYWORD, + lexer.IDENTIFIER, lexer.OPERATOR, lexer.ERROR, + lexer.PREPROCESSOR, lexer.CONSTANT, lexer.VARIABLE, + lexer.FUNCTION, lexer.CLASS, lexer.TYPE, lexer.LABEL, + lexer.REGEX, and lexer.EMBEDDED. Patterns include + lexer.any, lexer.ascii, lexer.extend, lexer.alpha, + lexer.digit, lexer.alnum, lexer.lower, lexer.upper, + lexer.xdigit, lexer.cntrl, lexer.graph, lexer.print, + lexer.punct, lexer.space, lexer.newline, + lexer.nonnewline, lexer.nonnewline_esc, lexer.dec_num, + lexer.hex_num, lexer.oct_num, lexer.integer, + lexer.float, and lexer.word. You may use your own token names if + none of the above fit your language, but an advantage to using predefined + token names is that your lexer's tokens will inherit the universal syntax + highlighting color theme used by your text editor.

+ +

+ +
Example Tokens
+ +

So, how might you define other tokens like keywords, comments, and strings? + Here are some examples.

+ +

Keywords

+ +

Instead of matching n keywords with n P('keyword_n') ordered + choices, use another convenience function: lexer.word_match(). It is + much easier and more efficient to write word matches like:

+ +

+    local keyword = token(lexer.KEYWORD, lexer.word_match[[
+      keyword_1 keyword_2 ... keyword_n
+    ]])
+
+    local case_insensitive_keyword = token(lexer.KEYWORD, lexer.word_match([[
+      KEYWORD_1 keyword_2 ... KEYword_n
+    ]], true))
+
+    local hyphened_keyword = token(lexer.KEYWORD, lexer.word_match[[
+      keyword-1 keyword-2 ... keyword-n
+    ]])
+    
+ +

In order to more easily separate or categorize keyword sets, you can use Lua + line comments within keyword strings. Such comments will be ignored. For + example:

+ +

+    local keyword = token(lexer.KEYWORD, lexer.word_match[[
+      -- Version 1 keywords.
+      keyword_11, keyword_12 ... keyword_1n
+      -- Version 2 keywords.
+      keyword_21, keyword_22 ... keyword_2n
+      ...
+      -- Version N keywords.
+      keyword_m1, keyword_m2 ... keyword_mn
+    ]])
+    
+ +

Comments

+ +

Line-style comments with one or more prefix characters are easy to express
   with LPeg:

+ +

+    local shell_comment = token(lexer.COMMENT, '#' * lexer.nonnewline^0)
+    local c_line_comment = token(lexer.COMMENT,
+                                 '//' * lexer.nonnewline_esc^0)
+    
+ +

The comments above start with a '#' or "//" and go to the end of the line. + The second comment recognizes the next line also as a comment if the current + line ends with a '\' escape character.

+ +

C-style "block" comments with a start and end delimiter are also easy to + express:

+ +

+    local c_comment = token(lexer.COMMENT,
+                            '/*' * (lexer.any - '*/')^0 * P('*/')^-1)
+    
+ +

This comment starts with a "/*" sequence and contains anything up to and + including an ending "*/" sequence. The ending "*/" is optional so the lexer + can recognize unfinished comments as comments and highlight them properly.

+ +

Strings

+ +

It is tempting to think that a string is not much different from the block + comment shown above in that both have start and end delimiters:

+ +

+    local dq_str = '"' * (lexer.any - '"')^0 * P('"')^-1
+    local sq_str = "'" * (lexer.any - "'")^0 * P("'")^-1
+    local simple_string = token(lexer.STRING, dq_str + sq_str)
+    
+ +

However, most programming languages allow escape sequences in strings such + that a sequence like "\"" in a double-quoted string indicates that the + '"' is not the end of the string. The above token incorrectly matches + such a string. Instead, use the lexer.delimited_range() convenience + function.

+ +

+    local dq_str = lexer.delimited_range('"')
+    local sq_str = lexer.delimited_range("'")
+    local string = token(lexer.STRING, dq_str + sq_str)
+    
+ +

In this case, the lexer treats '\' as an escape character in a string + sequence.

+ +
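If a language treats '\' literally inside strings, you can disable escape
   handling via the function's no_escape parameter (see the
   lexer.delimited_range() API reference later in this document). For
   example:

+    local raw_str = lexer.delimited_range('"', false, true) -- no '\' escapes
+    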

Numbers

+ +

Most programming languages have the same format for integer and float tokens, + so it might be as simple as using a couple of predefined LPeg patterns:

+ +

+    local number = token(lexer.NUMBER, lexer.float + lexer.integer)
+    
+ +

However, some languages allow postfix characters on integers.

+ +

+    local integer = P('-')^-1 * (lexer.dec_num * S('lL')^-1)
+    local number = token(lexer.NUMBER, lexer.float + lexer.hex_num + integer)
+    
+ +

Your language may need other tweaks, but it is up to you how fine-grained you + want your highlighting to be. After all, you are not writing a compiler or + interpreter!

+ +

+ +

Rules

+ +

Programming languages have grammars, which specify valid token structure. For + example, comments usually cannot appear within a string. Grammars consist of + rules, which are simply combinations of tokens. Recall from the lexer + template the lexer.add_rule() call, which adds a rule to the lexer's + grammar:

+ +

+    lex:add_rule('whitespace', ws)
+    
+ +

Each rule has an associated name, but rule names are completely arbitrary and + serve only to identify and distinguish between different rules. Rule order is + important: if text does not match the first rule added to the grammar, the + lexer tries to match the second rule added, and so on. Right now this lexer + simply matches whitespace tokens under a rule named "whitespace".

+ +

To illustrate the importance of rule order, here is an example of a + simplified Lua lexer:

+ +

+    lex:add_rule('whitespace', token(lexer.WHITESPACE, ...))
+    lex:add_rule('keyword', token(lexer.KEYWORD, ...))
+    lex:add_rule('identifier', token(lexer.IDENTIFIER, ...))
+    lex:add_rule('string', token(lexer.STRING, ...))
+    lex:add_rule('comment', token(lexer.COMMENT, ...))
+    lex:add_rule('number', token(lexer.NUMBER, ...))
+    lex:add_rule('label', token(lexer.LABEL, ...))
+    lex:add_rule('operator', token(lexer.OPERATOR, ...))
+    
+ +

Note how identifiers come after keywords. In Lua, as with most programming + languages, the characters allowed in keywords and identifiers are in the same + set (alphanumerics plus underscores). If the lexer added the "identifier" + rule before the "keyword" rule, all keywords would match identifiers and thus + incorrectly highlight as identifiers instead of keywords. The same idea + applies to function, constant, etc. tokens that you may want to distinguish + between: their rules should come before identifiers.

+ +

So what about text that does not match any rules? For example in Lua, the '!' + character is meaningless outside a string or comment. Normally the lexer + skips over such text. If instead you want to highlight these "syntax errors", + add an additional end rule:

+ +

+    lex:add_rule('whitespace', ws)
+    ...
+    lex:add_rule('error', token(lexer.ERROR, lexer.any))
+    
+ +

This identifies and highlights any character not matched by an existing + rule as a lexer.ERROR token.

+ +

Even though the rules defined in the examples above contain a single token, + rules may consist of multiple tokens. For example, a rule for an HTML tag + could consist of a tag token followed by an arbitrary number of attribute + tokens, allowing the lexer to highlight all tokens separately. That rule + might look something like this:

+ +

+    lex:add_rule('tag', tag_start * (ws * attributes)^0 * tag_end^-1)
+    
+ +

Note however that lexers with complex rules like these are more prone to lose + track of their state, especially if they span multiple lines.

+ +

+ +

Summary

+ +

Lexers primarily consist of tokens and grammar rules. At your disposal are a + number of convenience patterns and functions for rapidly creating a lexer. If + you choose to use predefined token names for your tokens, you do not have to + define how the lexer highlights them. The tokens will inherit the default + syntax highlighting color theme your editor uses.

+ +

+ +

Advanced Techniques

+ +

+ +

Styles and Styling

+ +

The most basic form of syntax highlighting is assigning different colors to
   different tokens. Instead of highlighting with just colors, Scintilla allows
   for richer highlighting, or "styling", with different fonts, font sizes,
   font attributes, and foreground and background colors, just to name a few.
   The unit of this rich highlighting is called a "style". Styles are simply
   strings of comma-separated property settings. By default, lexers associate
   predefined token names like lexer.WHITESPACE, lexer.COMMENT,
   lexer.STRING, etc. with particular styles as part of a universal color
   theme. These predefined styles include lexer.STYLE_CLASS,
   lexer.STYLE_COMMENT, lexer.STYLE_CONSTANT,
   lexer.STYLE_ERROR, lexer.STYLE_EMBEDDED,
   lexer.STYLE_FUNCTION, lexer.STYLE_IDENTIFIER,
   lexer.STYLE_KEYWORD, lexer.STYLE_LABEL, lexer.STYLE_NUMBER,
   lexer.STYLE_OPERATOR, lexer.STYLE_PREPROCESSOR,
   lexer.STYLE_REGEX, lexer.STYLE_STRING, lexer.STYLE_TYPE,
   lexer.STYLE_VARIABLE, and lexer.STYLE_WHITESPACE. Like with
   predefined token names and LPeg patterns, you may define your own styles. At
   their core, styles are just strings, so you may create new ones and/or modify
   existing ones. Each style consists of the following comma-separated settings:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Setting Description
font:name The name of the font the style uses.
size:int The size of the font the style uses.
[not]bold Whether or not the font face is bold.
weight:int The weight or boldness of a font, between 1 and 999.
[not]italics Whether or not the font face is italic.
[not]underlined Whether or not the font face is underlined.
fore:color The foreground color of the font face.
back:color The background color of the font face.
[not]eolfilled Does the background color extend to the end of the line?
case:char The case of the font ('u': upper, 'l': lower, 'm': normal).
[not]visible Whether or not the text is visible.
[not]changeable Whether the text is changeable or read-only.
+ + +

Specify font colors in either "#RRGGBB" format, "0xBBGGRR" format, or the + decimal equivalent of the latter. As with token names, LPeg patterns, and + styles, there is a set of predefined color names, but they vary depending on + the current color theme in use. Therefore, it is generally not a good idea to + manually define colors within styles in your lexer since they might not fit + into a user's chosen color theme. Try to refrain from even using predefined + colors in a style because that color may be theme-specific. Instead, the best + practice is to either use predefined styles or derive new color-agnostic + styles from predefined ones. For example, Lua "longstring" tokens use the + existing lexer.STYLE_STRING style instead of defining a new one.

+ +
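For illustration only (again, hard-coding colors is discouraged), the same red
   foreground expressed in each format would look like:

+    local style_red_rgb = 'fore:#FF0000'   -- "#RRGGBB" format
+    local style_red_bgr = 'fore:0x0000FF'  -- "0xBBGGRR" format (the same red)
+    local style_red_dec = 'fore:255'       -- decimal equivalent of 0x0000FF
+    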

+ +
Example Styles
+ +

Defining styles is pretty straightforward. An empty style that inherits the + default theme settings is simply an empty string:

+ +

+    local style_nothing = ''
+    
+ +

A similar style but with a bold font face looks like this:

+ +

+    local style_bold = 'bold'
+    
+ +

If you want the same style, but also with an italic font face, define the new + style in terms of the old one:

+ +

+    local style_bold_italic = style_bold..',italics'
+    
+ +

This allows you to derive new styles from predefined ones without having to + rewrite them. This operation leaves the old style unchanged. Thus if you + had a "static variable" token whose style you wanted to base off of + lexer.STYLE_VARIABLE, it would probably look like:

+ +

+    local style_static_var = lexer.STYLE_VARIABLE..',italics'
+    
+ +

The color theme files in the lexlua/themes/ folder give more examples of + style definitions.

+ +

+ +

Token Styles

+ +

Lexers use the lexer.add_style() function to assign styles to
   particular tokens. Recall the token definition and rule from the lexer
   template:

+ +

+    local ws = token(lexer.WHITESPACE, lexer.space^1)
+    lex:add_rule('whitespace', ws)
+    
+ +

Why is a style not assigned to the lexer.WHITESPACE token? As mentioned + earlier, lexers automatically associate tokens that use predefined token + names with a particular style. Only tokens with custom token names need + manual style associations. As an example, consider a custom whitespace token:

+ +

+    local ws = token('custom_whitespace', lexer.space^1)
+    
+ +

Assigning a style to this token looks like:

+ +

+    lex:add_style('custom_whitespace', lexer.STYLE_WHITESPACE)
+    
+ +

Do not confuse token names with rule names. They are completely different + entities. In the example above, the lexer associates the "custom_whitespace" + token with the existing style for lexer.WHITESPACE tokens. If instead you + prefer to color the background of whitespace a shade of grey, it might look + like:

+ +

+    local custom_style = lexer.STYLE_WHITESPACE..',back:$(color.grey)'
+    lex:add_style('custom_whitespace', custom_style)
+    
+ +

Notice that the lexer performs Scintilla-style "$()" property expansion.
   You may also use "%()". Remember to refrain from assigning specific colors in
   styles, but in this case, all user color themes probably define the
   "color.grey" property.

+ +

+ +

Line Lexers

+ +

By default, lexers match the arbitrary chunks of text passed to them by + Scintilla. These chunks may be a full document, only the visible part of a + document, or even just portions of lines. Some lexers need to match whole + lines. For example, a lexer for the output of a file "diff" needs to know if + the line started with a '+' or '-' and then style the entire line + accordingly. To indicate that your lexer matches by line, create the lexer + with an extra parameter:

+ +

+    local lex = lexer.new('?', {lex_by_line = true})
+    
+ +

Now the input text for the lexer is a single line at a time. Keep in mind + that line lexers do not have the ability to look ahead at subsequent lines.

+ +

+ +

Embedded Lexers

+ +

Lexers embed within one another very easily, requiring minimal effort. In the
   following sections, the lexer being embedded is called the "child" lexer and
   the lexer a child is being embedded in is called the "parent". For example,
   consider an HTML lexer and a CSS lexer. Each lexer stands alone for styling
   its respective HTML or CSS files. However, CSS can be embedded inside
   HTML. In this specific case, the CSS lexer is the "child" lexer with the HTML
   lexer being the "parent". Now consider an HTML lexer and a PHP lexer. This
   sounds a lot like the case with CSS, but there is a subtle difference: PHP
   embeds itself into HTML while CSS is embedded in HTML. This fundamental
   difference results in two types of embedded lexers: a parent lexer that
   embeds other child lexers in it (like HTML embedding CSS), and a child lexer
   that embeds itself into a parent lexer (like PHP embedding itself in HTML).

+ +

+ +
Parent Lexer
+ +

Before embedding a child lexer into a parent lexer, the parent lexer needs to + load the child lexer. This is done with the lexer.load() function. For + example, loading the CSS lexer within the HTML lexer looks like:

+ +

+    local css = lexer.load('css')
+    
+ +

The next part of the embedding process is telling the parent lexer when to
   switch over to the child lexer and when to switch back. The lexer refers to
   these indications as the "start rule" and "end rule", respectively, which are
   just LPeg patterns. Continuing with the HTML/CSS example, the transition from
   HTML to CSS occurs when the lexer encounters a "style" tag with a "type"
   attribute whose value is "text/css":

+ +

+    local css_tag = P('<style') * P(function(input, index)
+      if input:find('^[^>]+type="text/css"', index) then
+        return index
+      end
+    end)
+    
+ +

This pattern looks for the beginning of a "style" tag and searches its
   attribute list for the text "type="text/css"". (In this simplified example,
   the Lua pattern does not consider whitespace around the '=', nor does it
   consider that single-quoted attribute values are valid.) If there is a match,
   the functional pattern returns a value instead of nil. In this case, the
   value returned does not matter because we ultimately want to style the
   "style" tag as an HTML tag, so the actual start rule looks like this:

+ +

+    local css_start_rule = #css_tag * tag
+    
+ +

Now that the parent knows when to switch to the child, it needs to know when + to switch back. In the case of HTML/CSS, the switch back occurs when the + lexer encounters an ending "style" tag, though the lexer should still style + the tag as an HTML tag:

+ +

+    local css_end_rule = #P('</style>') * tag
+    
+ +

Once the parent loads the child lexer and defines the child's start and end + rules, it embeds the child with the lexer.embed() function:

+ +

+    lex:embed(css, css_start_rule, css_end_rule)
+    
+ +

+ +
Child Lexer
+ +

The process for instructing a child lexer to embed itself into a parent is + very similar to embedding a child into a parent: first, load the parent lexer + into the child lexer with the lexer.load() function and then create + start and end rules for the child lexer. However, in this case, call + lexer.embed() with switched arguments. For example, in the PHP lexer:

+ +

+    local html = lexer.load('html')
+    local php_start_rule = token('php_tag', '<?php ')
+    local php_end_rule = token('php_tag', '?>')
+    lex:add_style('php_tag', lexer.STYLE_EMBEDDED)
+    html:embed(lex, php_start_rule, php_end_rule)
+    
+ +

+ +

Lexers with Complex State

+ +

The vast majority of lexers are not stateful and can operate on any chunk of
   text in a document. However, there may be rare cases where a lexer does need
   to keep track of some sort of persistent state. Rather than using lpeg.P
   function patterns that set state variables, it is recommended to make use of
   Scintilla's built-in, per-line state integers via lexer.line_state. It
   was designed to accommodate up to 32 bit flags for tracking state.
   lexer.line_from_position() will return the line for any position given
   to an lpeg.P function pattern. (Any positions derived from that position
   argument will also work.)

+ +
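As a minimal, hypothetical sketch of the mechanism, a function pattern might
   flag the line containing the current match position so that a later lex of
   subsequent lines can consult that flag:

+    -- Hypothetical: record a 1-bit flag for the current line.
+    local mark_line = lpeg.P(function(input, index)
+      local line = lexer.line_from_position(index)
+      lexer.line_state[line] = 1 -- persistent, per-line state
+      return index -- match succeeds without consuming any input
+    end)
+    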

Writing stateful lexers is beyond the scope of this document.

+ +

+ +

Code Folding

+ +

When reading source code, it is occasionally helpful to temporarily hide + blocks of code like functions, classes, comments, etc. This is the concept of + "folding". In many Scintilla-based editors, such as Textadept, little indicators + in the editor margins appear next to code that can be folded at places called + "fold points". When the user clicks an indicator, the editor hides the code + associated with the indicator until the user clicks the indicator again. The + lexer specifies these fold points and what code exactly to fold.

+ +

The fold points for most languages occur on keywords or character sequences. + Examples of fold keywords are "if" and "end" in Lua and examples of fold + character sequences are '{', '}', "/*", and "*/" in C for code block and + comment delimiters, respectively. However, these fold points cannot occur + just anywhere. For example, lexers should not recognize fold keywords that + appear within strings or comments. The lexer.add_fold_point() function + allows you to conveniently define fold points with such granularity. For + example, consider C:

+ +

+    lex:add_fold_point(lexer.OPERATOR, '{', '}')
+    lex:add_fold_point(lexer.COMMENT, '/*', '*/')
+    
+ +

The first call states that any '{' or '}' that the lexer recognizes as
   a lexer.OPERATOR token is a fold point. Likewise, the second call
   states that any "/*" or "*/" that the lexer recognizes as part of a
   lexer.COMMENT token is a fold point. The lexer does not consider any
   occurrences of these characters outside their defined tokens (such as in a
   string) as fold points. How do you specify fold keywords? Here is an example
   for Lua:

+ +

+    lex:add_fold_point(lexer.KEYWORD, 'if', 'end')
+    lex:add_fold_point(lexer.KEYWORD, 'do', 'end')
+    lex:add_fold_point(lexer.KEYWORD, 'function', 'end')
+    lex:add_fold_point(lexer.KEYWORD, 'repeat', 'until')
+    
+ +

If your lexer has case-insensitive keywords as fold points, simply add a + case_insensitive_fold_points = true option to lexer.new(), and + specify keywords in lower case.

+ +
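For example, a hypothetical language with case-insensitive keywords might set
   up its fold points like this:

+    local lex = lexer.new('?', {case_insensitive_fold_points = true})
+    lex:add_fold_point(lexer.KEYWORD, 'if', 'end') -- also matches 'IF', 'End'
+    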

If your lexer needs to do some additional processing in order to determine if + a token is a fold point, pass a function that returns an integer to + lex:add_fold_point(). Returning 1 indicates the token is a beginning fold + point and returning -1 indicates the token is an ending fold point. + Returning 0 indicates the token is not a fold point. For example:

+ +

+    local function fold_strange_token(text, pos, line, s, symbol)
+      if ... then
+        return 1 -- beginning fold point
+      elseif ... then
+        return -1 -- ending fold point
+      end
+      return 0
+    end
+
+    lex:add_fold_point('strange_token', '|', fold_strange_token)
+    
+ +

Any time the lexer encounters a '|' that is a "strange_token", it calls the + fold_strange_token function to determine if '|' is a fold point. The lexer + calls these functions with the following arguments: the text to identify fold + points in, the beginning position of the current line in the text to fold, + the current line's text, the position in the current line the fold point text + starts at, and the fold point text itself.

+ +

+ +

Fold by Indentation

+ +

Some languages have significant whitespace and/or no delimiters that indicate + fold points. If your lexer falls into this category and you would like to + mark fold points based on changes in indentation, create the lexer with a + fold_by_indentation = true option:

+ +

+    local lex = lexer.new('?', {fold_by_indentation = true})
+    
+ +

+ +

Using Lexers

+ +

+ +

Textadept

+ +

Put your lexer in your ~/.textadept/lexers/ directory so that it is not
   overwritten when you upgrade Textadept. Also, lexers in this directory
   override default lexers. Thus, Textadept loads a user lua lexer instead of
   the default lua lexer. This is convenient for tweaking a default lexer to
   your liking. Then add a file type for your lexer if necessary.

+ +

+ +

Migrating Legacy Lexers

+ +

Legacy lexers are of the form:

+ +

+    local l = require('lexer')
+    local token, word_match = l.token, l.word_match
+    local P, R, S = lpeg.P, lpeg.R, lpeg.S
+
+    local M = {_NAME = '?'}
+
+    [... token and pattern definitions ...]
+
+    M._rules = {
+      {'rule', pattern},
+      [...]
+    }
+
+    M._tokenstyles = {
+      ['token'] = 'style',
+      [...]
+    }
+
+    M._foldsymbols = {
+      _patterns = {...},
+      ['token'] = {['start'] = 1, ['end'] = -1},
+      [...]
+    }
+
+    return M
+    
+ +

While such legacy lexers will be handled just fine without any + changes, it is recommended that you migrate yours. The migration process is + fairly straightforward:

+ +
    +
  1. Replace all instances of l with lexer, as it's better practice and + results in less confusion.
  2. +
  3. Replace local M = {_NAME = '?'} with local lex = lexer.new('?'), where + ? is the name of your legacy lexer. At the end of the lexer, change + return M to return lex.
  4. +
  5. Instead of defining rules towards the end of your lexer, define your rules + as you define your tokens and patterns using + lex:add_rule().
  6. +
  7. Similarly, any custom token names should have their styles immediately + defined using lex:add_style().
  8. +
  9. Convert any table arguments passed to lexer.word_match() to a + space-separated string of words.
  10. +
  11. Replace any calls to lexer.embed(M, child, ...) and + lexer.embed(parent, M, ...) with + lex:embed(child, ...) and parent:embed(lex, ...), + respectively.
  12. +
  13. Define fold points with simple calls to + lex:add_fold_point(). No need to mess with Lua + patterns anymore.
  14. +
  15. Any legacy lexer options such as M._FOLDBYINDENTATION, M._LEXBYLINE, + M._lexer, etc. should be added as table options to lexer.new().
  16. +
  17. Any external lexer rule fetching and/or modifications via lexer._RULES + should be changed to use lexer.get_rule() and + lexer.modify_rule().
  18. +
+ + +

As an example, consider the following sample legacy lexer:

+ +

+    local l = require('lexer')
+    local token, word_match = l.token, l.word_match
+    local P, R, S = lpeg.P, lpeg.R, lpeg.S
+
+    local M = {_NAME = 'legacy'}
+
+    local ws = token(l.WHITESPACE, l.space^1)
+    local comment = token(l.COMMENT, '#' * l.nonnewline^0)
+    local string = token(l.STRING, l.delimited_range('"'))
+    local number = token(l.NUMBER, l.float + l.integer)
+    local keyword = token(l.KEYWORD, word_match{'foo', 'bar', 'baz'})
+    local custom = token('custom', P('quux'))
+    local identifier = token(l.IDENTIFIER, l.word)
+    local operator = token(l.OPERATOR, S('+-*/%^=<>,.()[]{}'))
+
+    M._rules = {
+      {'whitespace', ws},
+      {'keyword', keyword},
+      {'custom', custom},
+      {'identifier', identifier},
+      {'string', string},
+      {'comment', comment},
+      {'number', number},
+      {'operator', operator}
+    }
+
+    M._tokenstyles = {
+      ['custom'] = l.STYLE_KEYWORD..',bold'
+    }
+
+    M._foldsymbols = {
+      _patterns = {'[{}]'},
+      [l.OPERATOR] = {['{'] = 1, ['}'] = -1}
+    }
+
+    return M
+    
+ +

Following the migration steps would yield:

+ +

+    local lexer = require('lexer')
+    local token, word_match = lexer.token, lexer.word_match
+    local P, R, S = lpeg.P, lpeg.R, lpeg.S
+
+    local lex = lexer.new('legacy')
+
+    lex:add_rule('whitespace', token(lexer.WHITESPACE, lexer.space^1))
+    lex:add_rule('keyword', token(lexer.KEYWORD, word_match[[foo bar baz]]))
+    lex:add_rule('custom', token('custom', P('quux')))
+    lex:add_style('custom', lexer.STYLE_KEYWORD..',bold')
+    lex:add_rule('identifier', token(lexer.IDENTIFIER, lexer.word))
+    lex:add_rule('string', token(lexer.STRING, lexer.delimited_range('"')))
+    lex:add_rule('comment', token(lexer.COMMENT, '#' * lexer.nonnewline^0))
+    lex:add_rule('number', token(lexer.NUMBER, lexer.float + lexer.integer))
+    lex:add_rule('operator', token(lexer.OPERATOR, S('+-*/%^=<>,.()[]{}')))
+
+    lex:add_fold_point(lexer.OPERATOR, '{', '}')
+
+    return lex
+    
+ +

+ +

Considerations

+ +

+ +

Performance

+ +

There might be some slight overhead when initializing a lexer, but loading a + file from disk into Scintilla is usually more expensive. On modern computer + systems, I see no difference in speed between Lua lexers and Scintilla's C++ + ones. Optimize lexers for speed by re-arranging lexer.add_rule() calls so + that the most common rules match first. Do keep in mind that order matters + for similar rules.

+ +

In some cases, folding may be far more expensive than lexing, particularly
   in lexers with a lot of potential fold points. If your lexer is exhibiting
   signs of slowness, try disabling folding in your text editor first. If that
   speeds things up, you can try reducing the number of fold points you added,
   overriding lexer.fold() with your own implementation, or simply eliminating
   folding support from your lexer.

+ +

+ +

Limitations

+ +

Embedded preprocessor languages like PHP cannot be embedded completely in
   their parent languages because the parent's tokens do not support start and
   end rules. This mostly goes unnoticed, but code like

+ +

+    <div id="<?php echo $id; ?>">
+    
+ +

will not style correctly.

+ +

+ +

Troubleshooting

+ +

Errors in lexers can be tricky to debug. Lexers print Lua errors to + io.stderr and _G.print() statements to io.stdout. Running your editor + from a terminal is the easiest way to see errors as they occur.

+ +
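For example, one quick (hypothetical) way to check whether a rule is even
   being tried is to wrap it with a function pattern that prints to io.stdout:

+    local traced_str = lpeg.P(function(input, index)
+      print('string rule tried at index ' .. index) -- written to io.stdout
+      return index -- always match here without consuming any input
+    end) * lexer.delimited_range('"')
+    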

+ +

Risks

+ +

Poorly written lexers have the ability to crash Scintilla (and thus its + containing application), so unsaved data might be lost. However, I have only + observed these crashes in early lexer development, when syntax errors or + pattern errors are present. Once the lexer actually starts styling text + (either correctly or incorrectly, it does not matter), I have not observed + any crashes.

+ +

+ +

Acknowledgements

+ +

Thanks to Peter Odding for his lexer post on the Lua mailing list + that inspired me, and thanks to Roberto Ierusalimschy for LPeg.

+ +

Lua lexer module API fields

+ +

+ +

lexer.CLASS (string)

+ +

The token name for class tokens.

+ +

+ +

lexer.COMMENT (string)

+ +

The token name for comment tokens.

+ +

+ +

lexer.CONSTANT (string)

+ +

The token name for constant tokens.

+ +

+ +

lexer.DEFAULT (string)

+ +

The token name for default tokens.

+ +

+ +

lexer.ERROR (string)

+ +

The token name for error tokens.

+ +

+ +

lexer.FOLD_BASE (number)

+ +

The initial (root) fold level.

+ +

+ +

lexer.FOLD_BLANK (number)

+ +

Flag indicating that the line is blank.

+ +

+ +

lexer.FOLD_HEADER (number)

+ +

Flag indicating the line is a fold point.

+ +

+ +

lexer.FUNCTION (string)

+ +

The token name for function tokens.

+ +

+ +

lexer.IDENTIFIER (string)

+ +

The token name for identifier tokens.

+ +

+ +

lexer.KEYWORD (string)

+ +

The token name for keyword tokens.

+ +

+ +

lexer.LABEL (string)

+ +

The token name for label tokens.

+ +

+ +

lexer.NUMBER (string)

+ +

The token name for number tokens.

+ +

+ +

lexer.OPERATOR (string)

+ +

The token name for operator tokens.

+ +

+ +

lexer.PREPROCESSOR (string)

+ +

The token name for preprocessor tokens.

+ +

+ +

lexer.REGEX (string)

+ +

The token name for regex tokens.

+ +

+ +

lexer.STRING (string)

+ +

The token name for string tokens.

+ +

+ +

lexer.STYLE_BRACEBAD (string)

+ +

The style used for unmatched brace characters.

+ +

+ +

lexer.STYLE_BRACELIGHT (string)

+ +

The style used for highlighted brace characters.

+ +

+ +

lexer.STYLE_CALLTIP (string)

+ +

The style used by call tips if buffer.call_tip_use_style is set. + Only the font name, size, and color attributes are used.

+ +

+ +

lexer.STYLE_CLASS (string)

+ +

The style typically used for class definitions.

+ +

+ +

lexer.STYLE_COMMENT (string)

+ +

The style typically used for code comments.

+ +

+ +

lexer.STYLE_CONSTANT (string)

+ +

The style typically used for constants.

+ +

+ +

lexer.STYLE_CONTROLCHAR (string)

+ +

The style used for control characters. + Color attributes are ignored.

+ +

+ +

lexer.STYLE_DEFAULT (string)

+ +

The style all styles are based off of.

+ +

+ +

lexer.STYLE_EMBEDDED (string)

+ +

The style typically used for embedded code.

+ +

+ +

lexer.STYLE_ERROR (string)

+ +

The style typically used for erroneous syntax.

+ +

+ +

lexer.STYLE_FOLDDISPLAYTEXT (string)

+ +

The style used for fold display text.

+ +

+ +

lexer.STYLE_FUNCTION (string)

+ +

The style typically used for function definitions.

+ +

+ +

lexer.STYLE_IDENTIFIER (string)

+ +

The style typically used for identifier words.

+ +

+ +

lexer.STYLE_INDENTGUIDE (string)

+ +

The style used for indentation guides.

+ +

+ +

lexer.STYLE_KEYWORD (string)

+ +

The style typically used for language keywords.

+ +

+ +

lexer.STYLE_LABEL (string)

+ +

The style typically used for labels.

+ +

+ +

lexer.STYLE_LINENUMBER (string)

+ +

The style used for all margins except fold margins.

+ +

+ +

lexer.STYLE_NUMBER (string)

+ +

The style typically used for numbers.

+ +

+ +

lexer.STYLE_OPERATOR (string)

+ +

The style typically used for operators.

+ +

+ +

lexer.STYLE_PREPROCESSOR (string)

+ +

The style typically used for preprocessor statements.

+ +

+ +

lexer.STYLE_REGEX (string)

+ +

The style typically used for regular expression strings.

+ +

+ +

lexer.STYLE_STRING (string)

+ +

The style typically used for strings.

+ +

+ +

lexer.STYLE_TYPE (string)

+ +

The style typically used for static types.

+ +

+ +

lexer.STYLE_VARIABLE (string)

+ +

The style typically used for variables.

+ +

+ +

lexer.STYLE_WHITESPACE (string)

+ +

The style typically used for whitespace.

+ +

+ +

lexer.TYPE (string)

+ +

The token name for type tokens.

+ +

+ +

lexer.VARIABLE (string)

+ +

The token name for variable tokens.

+ +

+ +

lexer.WHITESPACE (string)

+ +

The token name for whitespace tokens.

+ +

+ +

lexer.alnum (pattern)

+ +

A pattern that matches any alphanumeric character ('A'-'Z', 'a'-'z', + '0'-'9').

+ +

+ +

lexer.alpha (pattern)

+ +

A pattern that matches any alphabetic character ('A'-'Z', 'a'-'z').

+ +

+ +

lexer.any (pattern)

+ +

A pattern that matches any single character.

+ +

+ +

lexer.ascii (pattern)

+ +

A pattern that matches any ASCII character (codes 0 to 127).

+ +

+ +

lexer.cntrl (pattern)

+ +

A pattern that matches any control character (ASCII codes 0 to 31).

+ +

+ +

lexer.dec_num (pattern)

+ +

A pattern that matches a decimal number.

+ +

+ +

lexer.digit (pattern)

+ +

A pattern that matches any digit ('0'-'9').

+ +

+ +

lexer.extend (pattern)

+ +

A pattern that matches any extended ASCII character (codes 0 to 255).

+ +

+ +

lexer.float (pattern)

+ +

A pattern that matches a floating point number.

+ +

+ +

lexer.fold_level (table, Read-only)

+ +

Table of fold level bit-masks for line numbers starting from zero. + Fold level masks are composed of an integer level combined with any of the + following bits:

lexer.FOLD_BASE: The initial fold level.
lexer.FOLD_BLANK: The line is blank.
lexer.FOLD_HEADER: The line is a fold point.

+ +

lexer.graph (pattern)

+ +

A pattern that matches any graphical character ('!' to '~').

+ +

+ +

lexer.hex_num (pattern)

+ +

A pattern that matches a hexadecimal number.

+ +

+ +

lexer.indent_amount (table, Read-only)

+ +

Table of indentation amounts in character columns, for line numbers + starting from zero.

+ +

+ +

lexer.integer (pattern)

+ +

A pattern that matches either a decimal, hexadecimal, or octal number.

+ +

+ +

lexer.line_state (table)

+ +

Table of integer line states for line numbers starting from zero. + Line states can be used by lexers for keeping track of persistent states.

+ +

+ +

lexer.lower (pattern)

+ +

A pattern that matches any lower case character ('a'-'z').

+ +

+ +

lexer.newline (pattern)

+ +

A pattern that matches any set of end of line characters.

+ +

+ +

lexer.nonnewline (pattern)

+ +

A pattern that matches any single, non-newline character.

+ +

+ +

lexer.nonnewline_esc (pattern)

+ +

A pattern that matches any single, non-newline character or any set of end + of line characters escaped with '\'.

+ +

+ +

lexer.oct_num (pattern)

+ +

A pattern that matches an octal number.

+ +

+ +

lexer.path (string)

+ +

The path used to search for a lexer to load. + Identical in format to Lua's package.path string. + The default value is package.path.

+ +

+ +

lexer.print (pattern)

+ +

A pattern that matches any printable character (' ' to '~').

+ +

+ +

lexer.property (table)

+ +

Map of key-value string pairs.

+ +

+ +

lexer.property_expanded (table, Read-only)

+ +

Map of key-value string pairs with $() and %() variable replacement + performed in values.

+ +

+ +

lexer.property_int (table, Read-only)

+ +

Map of key-value pairs with values interpreted as numbers, or 0 if not + found.

+ +

+ +

lexer.punct (pattern)

+ +

A pattern that matches any punctuation character ('!' to '/', ':' to '@',
   '[' to '`', '{' to '~').

+ +

+ +

lexer.space (pattern)

+ +

A pattern that matches any whitespace character ('\t', '\v', '\f', '\n', + '\r', space).

+ +

+ +

lexer.style_at (table, Read-only)

+ +

Table of style names at positions in the buffer starting from 1.

+ +

+ +

lexer.upper (pattern)

+ +

A pattern that matches any upper case character ('A'-'Z').

+ +

+ +

lexer.word (pattern)

+ +

A pattern that matches a typical word. Words begin with a letter or + underscore and consist of alphanumeric and underscore characters.

+ +

+ +

lexer.xdigit (pattern)

+ +

A pattern that matches any hexadecimal digit ('0'-'9', 'A'-'F', 'a'-'f').

+ +

Lua lexer module API functions

+ +

+ +

lexer.add_fold_point (lexer, token_name, start_symbol, end_symbol)

+ +

Adds to lexer lexer a fold point whose beginning and end tokens are string + token_name tokens with string content start_symbol and end_symbol, + respectively. + In the event that start_symbol may or may not be a fold point depending on + context, and that additional processing is required, end_symbol may be a + function that ultimately returns 1 (indicating a beginning fold point), + -1 (indicating an ending fold point), or 0 (indicating no fold point). + That function is passed the following arguments:

text: The text in which to identify fold points.
pos: The beginning position of the current line in the text.
line: The current line's text.
s: The position in the current line at which the fold point text starts.
symbol: The fold point text itself.

Fields:

+ + + + +

Usage:

+ + + + +

+ +

lexer.add_rule (lexer, id, rule)

+ +

Adds pattern rule identified by string id to the ordered list of rules + for lexer lexer.

+ +

Fields:

+ + + + +

See also:

+ + + + +

+ +

lexer.add_style (lexer, token_name, style)

+ +

Associates string token_name in lexer lexer with Scintilla style string + style. + Style strings are comma-separated property settings. Available property + settings are:

+ + + + +

Property settings may also contain "$(property.name)" expansions for + properties defined in Scintilla, theme files, etc.

+ +

Fields:

+ + + + +

Usage:

+ + + + +

+ +

lexer.delimited_range (chars, single_line, no_escape, balanced)

+ +

Creates and returns a pattern that matches a range of text bounded by + chars characters. + This is a convenience function for matching more complicated delimited ranges + like strings with escape characters and balanced parentheses. single_line + indicates whether or not the range must be on a single line, no_escape + indicates whether or not to ignore '\' as an escape character, and balanced + indicates whether or not to handle balanced ranges like parentheses and + requires chars to be composed of two characters.

+ +

Fields:

+ + + + +

Usage:

+ + + + +

Return:

+ + + + +

See also:

+ + + + +

+ +

lexer.embed (lexer, child, start_rule, end_rule)

+ +

Embeds child lexer child in parent lexer lexer using patterns + start_rule and end_rule, which signal the beginning and end of the + embedded lexer, respectively.

+ +

Fields:

+ + + + +

Usage:

+ + + + +

+ +

lexer.fold (lexer, text, start_pos, start_line, start_level)

+ +

Determines fold points in a chunk of text text using lexer lexer, + returning a table of fold levels associated with line numbers. + text starts at position start_pos on line number start_line with a + beginning fold level of start_level in the buffer.

+ +

Fields:

+ + + + +

Return:

+ + + + +

+ +

lexer.fold_line_comments (prefix)

+ +

Returns a fold function (to be passed to lexer.add_fold_point()) that folds + consecutive line comments that start with string prefix.

+ +

Fields:

+ + + + +

Usage:

+ + + + +

+ +

lexer.get_rule (lexer, id)

+ +

Returns the rule identified by string id.

+ +

Fields:

+ + + + +

Return:

+ + + + +

+ +

lexer.last_char_includes (s)

+ +

Creates and returns a pattern that verifies that string set s contains the + first non-whitespace character behind the current match position.

+ +

Fields:

+ + + + +

Usage:

+ + + + +

Return:

+ + + + +

+ +

lexer.lex (lexer, text, init_style)

+ +

Lexes a chunk of text text (that has an initial style number of + init_style) using lexer lexer, returning a table of token names and + positions.

+ +

Fields:

+ + + + +

Return:

+ + + + +

+ +

lexer.line_from_position (pos)

+ +

Returns the line number of the line that contains position pos, which + starts from 1.

+ +

Fields:

+ + + + +

Return:

+ + + + +

+ +

lexer.load (name, alt_name, cache)

+ +

Initializes or loads and returns the lexer of string name name. + Scintilla calls this function in order to load a lexer. Parent lexers also + call this function in order to load child lexers and vice-versa. The user + calls this function in order to load a lexer when using this module as a Lua + library.

+ +

Fields:

+ + + + +

Return:

+ + + + +

+ +

lexer.modify_rule (lexer, id, rule)

+ +

Replaces in lexer lexer the existing rule identified by string id with + pattern rule.

+ +

Fields:

+ + + + +

+ +

lexer.nested_pair (start_chars, end_chars)

+ +

Returns a pattern that matches a balanced range of text that starts with + string start_chars and ends with string end_chars. + With single-character delimiters, this function is identical to + delimited_range(start_chars..end_chars, false, true, true).

+ +

Fields:

+ + + + +

Usage:

+ + + + +

Return:

+ + + + +

See also:

+ + + + +

+ +

lexer.new (name, opts)

+ +

Creates and returns a new lexer with the given name.

+ +

Fields:

+ + + + +

Usage:

+ + + + +

+ +

lexer.starts_line (patt)

+ +

Creates and returns a pattern that matches pattern patt only at the + beginning of a line.

+ +

Fields:

+ + + + +

Usage:

+ + + + +

Return:

+ + + + +

+ +

lexer.token (name, patt)

+ +

Creates and returns a token pattern with token name name and pattern + patt. + If name is not a predefined token name, its style must be defined via + lexer.add_style().

+ +

Fields:

+ + + + +

Usage:

+ + + + +

Return:

+ + + + +

+ +

lexer.word_match (words, case_insensitive, word_chars)

+ +

Creates and returns a pattern that matches any single word in string words. + case_insensitive indicates whether or not to ignore case when matching + words. + This is a convenience function for simplifying a set of ordered choice word + patterns. + If words is a multi-line string, it may contain Lua line comments (--) + that will ultimately be ignored.

+ +

Fields:

+ + + + +

Usage:

+ + + + +

Return:

+ + + +

Supported Languages

+ +

Scintilla has Lua lexers for all of the languages below. Languages + denoted by a * have native + folders. For languages without + native folding support, folding based on indentation can be used if + fold.by.indentation is enabled.

+ +
    +
  1. Actionscript*
  2. +
  3. Ada
  4. +
  5. ANTLR*
  6. +
  7. APDL*
  8. +
  9. APL
  10. +
  11. Applescript
  12. +
  13. ASM* (NASM)
  14. +
  15. ASP*
  16. +
  17. AutoIt
  18. +
  19. AWK*
  20. +
  21. Batch*
  22. +
  23. BibTeX*
  24. +
  25. Boo
  26. +
  27. C*
  28. +
  29. C++*
  30. +
  31. C#*
  32. +
  33. ChucK
  34. +
  35. CMake*
  36. +
  37. Coffeescript
  38. +
  39. ConTeXt*
  40. +
  41. CSS*
  42. +
  43. CUDA*
  44. +
  45. D*
  46. +
  47. Dart*
  48. +
  49. Desktop Entry
  50. +
  51. Diff
  52. +
  53. Django*
  54. +
  55. Dockerfile
  56. +
  57. Dot*
  58. +
  59. Eiffel*
  60. +
  61. Elixir
  62. +
  63. Erlang*
  64. +
  65. F#
  66. +
  67. Faust
  68. +
  69. Fish*
  70. +
  71. Forth
  72. +
  73. Fortran
  74. +
  75. GAP*
  76. +
  77. gettext
  78. +
  79. Gherkin
  80. +
  81. GLSL*
  82. +
  83. Gnuplot
  84. +
  85. Go*
  86. +
  87. Groovy*
  88. +
  89. Gtkrc*
  90. +
  91. Haskell
  92. +
  93. HTML*
  94. +
  95. Icon*
  96. +
  97. IDL
  98. +
  99. Inform
  100. +
  101. ini
  102. +
  103. Io*
  104. +
  105. Java*
  106. +
  107. Javascript*
  108. +
  109. JSON*
  110. +
  111. JSP*
  112. +
  113. LaTeX*
  114. +
  115. Ledger
  116. +
  117. LESS*
  118. +
  119. LilyPond
  120. +
  121. Lisp*
  122. +
  123. Literate Coffeescript
  124. +
  125. Logtalk
  126. +
  127. Lua*
  128. +
  129. Makefile
  130. +
  131. Man Page
  132. +
  133. Markdown
  134. +
  135. MATLAB*
  136. +
  137. MoonScript
  138. +
  139. Myrddin
  140. +
  141. Nemerle*
  142. +
  143. Nim
  144. +
  145. NSIS
  146. +
  147. Objective-C*
  148. +
  149. OCaml
  150. +
  151. Pascal
  152. +
  153. Perl*
  154. +
  155. PHP*
  156. +
  157. PICO-8*
  158. +
  159. Pike*
  160. +
  161. PKGBUILD*
  162. +
  163. Postscript
  164. +
  165. PowerShell*
  166. +
  167. Prolog
  168. +
  169. Properties
  170. +
  171. Pure
  172. +
  173. Python
  174. +
  175. R
  176. +
  177. rc*
  178. +
  179. REBOL*
  180. +
  181. Rexx*
  182. +
  183. ReStructuredText*
  184. +
  185. RHTML*
  186. +
  187. Ruby*
  188. +
  189. Ruby on Rails*
  190. +
  191. Rust*
  192. +
  193. Sass*
  194. +
  195. Scala*
  196. +
  197. Scheme*
  198. +
  199. Shell*
  200. +
  201. Smalltalk*
  202. +
  203. Standard ML
  204. +
  205. SNOBOL4
  206. +
  207. SQL
  208. +
  209. TaskPaper
  210. +
  211. Tcl*
  212. +
  213. TeX*
  214. +
  215. Texinfo*
  216. +
  217. TOML
  218. +
  219. Vala*
  220. +
  221. VBScript
  222. +
  223. vCard*
  224. +
  225. Verilog*
  226. +
  227. VHDL
  228. +
  229. Visual Basic
  230. +
  231. Windows Script File*
  232. +
  233. XML*
  234. +
  235. Xtend*
  236. +
  237. YAML
  238. +
+ +

Code Contributors
