How to write a Scintilla lexer

A lexer for a particular language determines how a specified range of
text shall be colored.  Writing a lexer is relatively straightforward
because the lexer need only color given text.  The harder job of
determining how much text actually needs to be colored is handled by
Scintilla itself, that is, the lexer's caller.


Parameters

The lexer for language LLL has the following prototype:

    static void ColouriseLLLDoc (
        unsigned int startPos, int length,
        int initStyle,
        WordList *keywordlists[],
        Accessor &styler);

The styler parameter is an Accessor object.  The lexer must use this
object to access the text to be colored.  The lexer gets the character
at position i using styler.SafeGetCharAt(i), which returns a default
character (a space, unless another default is passed as a second
argument) when i lies outside the document, so looking ahead with
styler.SafeGetCharAt(i + 1) is always safe.

The startPos and length parameters indicate the range of text to be
recolored; the lexer must determine the proper color for all characters
in positions startPos through startPos+length-1.

The initStyle parameter indicates the initial state, that is, the state
at the character before startPos.  States also indicate the coloring to
be used for a particular range of text.

Note:  the character at startPos is assumed to start a line, so if a
newline terminates the initStyle state the lexer should enter its
default state (or whatever state should follow initStyle).

The keywordlists parameter specifies the keywords that the lexer must
recognize.  A WordList class object contains methods that simplify the
recognition of keywords.  Existing lexers use a helper function called
classifyWordLLL to recognize keywords.  These functions show how to use
the keywordlists parameter to recognize keywords.  This documentation
will not discuss keywords further.
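
A classifyWordLLL helper usually copies the current word out of the
document and asks a WordList whether it contains that word.  The
following is a minimal, self-contained sketch under stated assumptions,
not code taken from Scintilla: KeywordSet is a hypothetical stand-in for
WordList that mimics only its InList(const char *) lookup, the word is
read from a std::string rather than through an Accessor, the LLL_* style
numbers are invented, and this classifyWordLLL returns the style instead
of calling styler.ColourTo itself.  A real lexer would typically obtain
its list with something like WordList &keywords = *keywordlists[0];.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical style numbers for language LLL (real lexers use SCE_* values).
enum { LLL_IDENTIFIER = 1, LLL_NUMBER = 2, LLL_KEYWORD = 3 };

// Hypothetical stand-in for Scintilla's WordList; only InList is mimicked.
struct KeywordSet {
    std::set<std::string> words;
    bool InList(const char *s) const { return words.count(s) != 0; }
};

// Sketch of a classifyWordLLL-style helper: returns the style for the word
// occupying positions start..end (inclusive) of text.
int classifyWordLLL(const std::string &text, std::size_t start, std::size_t end,
                    const KeywordSet &keywords) {
    assert(start <= end && end < text.size());
    const std::string word = text.substr(start, end - start + 1);
    // Words that begin with a digit or '.' are treated as numbers.
    if (std::isdigit(static_cast<unsigned char>(word[0])) || word[0] == '.')
        return LLL_NUMBER;
    return keywords.InList(word.c_str()) ? LLL_KEYWORD : LLL_IDENTIFIER;
}
```

With the keyword set {"int", "return"} and the text "int x = 42;",
positions 0..2 classify as LLL_KEYWORD, 4..4 as LLL_IDENTIFIER and 8..9
as LLL_NUMBER.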


The lexer code

The task of a lexer can be summarized briefly: for each range r of
characters that are to be colored the same, the lexer should call

    styler.ColourTo(i, state)

where i is the position of the last character of the range r.  The lexer
should set the state variable to the coloring state of the character at
position i and continue until the entire text has been colored.

Note 1:  the styler (Accessor) object remembers the i parameter of the
previous call to styler.ColourTo, so the single i parameter suffices to
indicate a range of characters.

Note 2:  as a side effect of calling styler.ColourTo(i, state), the
coloring states of all characters in the range are remembered so that
Scintilla may set the initStyle parameter correctly on future calls to
the lexer.


Lexer organization

There are at least two ways to organize the code of each lexer.  Existing
lexers use what might be called a "character-based" approach: the outer
loop iterates over characters, like this:

  unsigned int lengthDoc = startPos + length;
  char chPrev = ' ';
  char chNext = styler.SafeGetCharAt(startPos);
  for (unsigned int i = startPos; i < lengthDoc; i++) {
    char ch = chNext;
    chNext = styler.SafeGetCharAt(i + 1);
    << handle special cases >>
    switch (state) {
      // Handlers examine only ch and chNext.
      // When the state changes, a handler calls styler.ColourTo(i - 1, state)
      // if ch starts a new range, or styler.ColourTo(i, state) if ch ends
      // the current range, and then updates state.
      case state_1: << handle ch in state 1 >> break;
      case state_2: << handle ch in state 2 >> break;
      ...
      case state_n: << handle ch in state n >> break;
    }
    chPrev = ch;
  }
  styler.ColourTo(lengthDoc - 1, state);


An alternative would be to use a "state-based" approach.  The outer loop
would iterate over states, like this:

  unsigned int lengthDoc = startPos + length;
  for (unsigned int i = startPos; ; ) {
    char ch = styler.SafeGetCharAt(i);
    int new_state = 0;
    switch (state) {
      // Each scanner leaves i at the last character of the current range
      // and sets new_state if it knows which state comes next.
      case state_1: << scan to the end of state 1 >> break;
      case state_2: << scan to the end of state 2 >> break;
      case default_state:
        << scan to the next non-default state and set new_state >>
        break;
    }
    styler.ColourTo(i, state);
    i++;  // Move to the first character of the next range.
    if (i >= lengthDoc) break;
    if (new_state) {
      state = new_state;
    } else {
      ch = styler.SafeGetCharAt(i);
      << set state based on ch in the default state >>
    }
  }
  styler.ColourTo(lengthDoc - 1, state);

This approach might seem more natural.  State scanners are simpler than
character scanners because less needs to be done.  For example, there is
no need to test for the start of a C string inside the scanner for a C
comment.  This organization also makes it natural to define routines
that could be used by more than one scanner, for example a
scanToEndOfLine routine.

However, the special cases handled in the main loop of the
character-based approach would have to be handled by each state scanner,
so both approaches have advantages.  These special cases are discussed
below.


Special case: Lead characters

Lead bytes are part of DBCS processing for languages such as Japanese
using an encoding such as Shift-JIS.  In these encodings, extended
(16-bit) characters are encoded as a lead byte followed by a trail byte.

Lead bytes are rarely of any lexical significance, normally only being
allowed within strings and comments.  In such contexts, when
styler.IsLeadByte(ch) returns true, lexers should ignore ch together
with the trail byte that follows it.  The trail byte must be skipped as
well because in Shift-JIS it may have the same value as an ASCII
character such as '\'.

Note:  UTF-8 is simpler than Shift-JIS, so no special handling is
applied for it.  Every byte of a UTF-8 multi-byte character is >= 128,
and none is lexically significant in programming languages, which so
far use only ASCII characters for operators, comment markers, etc.
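
The character-based organization and the lead-byte rule can be seen
together in a small sketch.  It is self-contained and does not use
Scintilla: MockStyler is a hypothetical stand-in for Accessor that
implements only SafeGetCharAt, IsLeadByte and ColourTo, and its
IsLeadByte assumes the Shift-JIS (code page 932) lead-byte ranges
0x81-0x9F and 0xE0-0xFC.  The toy language, its TOY_* states and the
ColouriseToyDoc name are likewise invented for illustration, and
positions are plain ints for simplicity.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Scintilla's Accessor, for illustration only.
class MockStyler {
public:
    MockStyler(std::string text, int startPos)
        : text_(std::move(text)), startSeg_(startPos) {}

    // Returns chDefault for positions outside the text.
    char SafeGetCharAt(int pos, char chDefault = ' ') const {
        return (pos >= 0 && pos < static_cast<int>(text_.size())) ? text_[pos] : chDefault;
    }

    // Assumption: Shift-JIS (code page 932) lead-byte ranges.
    bool IsLeadByte(char ch) const {
        const unsigned char uch = static_cast<unsigned char>(ch);
        return (uch >= 0x81 && uch <= 0x9F) || (uch >= 0xE0 && uch <= 0xFC);
    }

    // Records that positions startSeg_..pos have this state; empty ranges
    // (pos before the start of the current segment) are ignored.
    void ColourTo(int pos, int state) {
        if (pos < startSeg_) return;
        runs.emplace_back(pos, state);
        startSeg_ = pos + 1;
    }

    std::vector<std::pair<int, int>> runs;  // (last position, state) per range

private:
    std::string text_;
    int startSeg_;
};

// States of a hypothetical toy language: '#' starts a comment that runs to
// the end of the line, and "..." is a string in which '\' escapes a character.
enum { TOY_DEFAULT = 0, TOY_COMMENT = 1, TOY_STRING = 2 };

void ColouriseToyDoc(int startPos, int length, int initStyle, MockStyler &styler) {
    assert(length >= 0);
    int state = initStyle;
    const int lengthDoc = startPos + length;
    char chNext = styler.SafeGetCharAt(startPos);
    for (int i = startPos; i < lengthDoc; i++) {
        const char ch = chNext;
        chNext = styler.SafeGetCharAt(i + 1);

        // Special case: inside strings and comments, skip a lead byte and
        // its trail byte, because the trail byte may have the value of '\'.
        if (state != TOY_DEFAULT && styler.IsLeadByte(ch)) {
            chNext = styler.SafeGetCharAt(i + 2);
            i++;
            continue;
        }

        switch (state) {
        case TOY_DEFAULT:
            if (ch == '#' || ch == '"') {
                styler.ColourTo(i - 1, state);  // ch starts a new range
                state = (ch == '#') ? TOY_COMMENT : TOY_STRING;
            }
            break;
        case TOY_COMMENT:
            if (ch == '\n') {
                styler.ColourTo(i - 1, state);  // the newline is default text
                state = TOY_DEFAULT;
            }
            break;
        case TOY_STRING:
            if (ch == '\\') {
                i++;  // skip the escaped character
                chNext = styler.SafeGetCharAt(i + 1);
            } else if (ch == '"') {
                styler.ColourTo(i, state);  // ch ends the current range
                state = TOY_DEFAULT;
            }
            break;
        }
    }
    styler.ColourTo(lengthDoc - 1, state);
}
```

For the Shift-JIS bytes of x "表" #c, a newline and y (with 表 encoded
as 0x95 0x5C), the recorded runs are (1, TOY_DEFAULT), (5, TOY_STRING),
(6, TOY_DEFAULT), (8, TOY_COMMENT) and (10, TOY_DEFAULT).  Without the
lead-byte check, the trail byte 0x5C would be read as a backslash that
escapes the closing quote, and the string would run to the end of the
text.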


Special case: Folding

During initialization, lexers that support folding set

    bool fold = styler.GetPropertyInt("fold");

If folding is enabled in the editor, fold will be true and the lexer
should call

    styler.SetLevel(line, level);

at the end of each line and just before exiting.

The line parameter is the number of the line being finished.  Its
initial value is styler.GetLine(startPos), and it is incremented (after
calling styler.SetLevel) whenever a newline is seen.

The level parameter is the desired indentation level in the low 12 bits,
along with flag bits in the upper four bits.  The indentation level
depends on the language.  For C++, it is incremented when the lexer sees
a '{' and decremented when the lexer sees a '}' (outside of strings and
comments, of course).

The following flag bits, defined in Scintilla.h, may be set or cleared
in the level parameter.  The SC_FOLDLEVELWHITEFLAG flag is set if the
lexer considers that the line contains nothing but whitespace.  The
SC_FOLDLEVELHEADERFLAG flag indicates that the line is a fold point.
This normally means that the next line has a greater level than the
present line.  However, the lexer may have some other basis for
determining a fold point.  For example, a lexer might create a header
line for the first line of a function definition rather than the last.

The SC_FOLDLEVELNUMBERMASK mask denotes the level number in the low 12
bits of the level parameter.  ANDing a level with this mask isolates the
level number; ANDing it with the complement of the mask isolates the
flags.

For example, the C++ lexer contains the following code when a newline is
seen:

  if (fold) {
    int lev = levelPrev;

    // Set the "all whitespace" bit if the line is blank.
    if (visChars == 0)
      lev |= SC_FOLDLEVELWHITEFLAG;

    // Set the "header" bit if needed.
    if ((levelCurrent > levelPrev) && (visChars > 0))
      lev |= SC_FOLDLEVELHEADERFLAG;
    styler.SetLevel(lineCurrent, lev);

    // Reinitialize the folding variables describing the present line.
    lineCurrent++;
    visChars = 0;  // Number of non-whitespace characters on the line.
    levelPrev = levelCurrent;
  }

The following code appears in the C++ lexer just before exit:

  // Fill in the real level of the next line, keeping the current flags
  // as they will be filled in later.
  if (fold) {
    // Mask off the level number, leaving only the previous flags.
    int flagsNext = styler.LevelAt(lineCurrent);
    flagsNext &= ~SC_FOLDLEVELNUMBERMASK;
    styler.SetLevel(lineCurrent, levelPrev | flagsNext);
  }


Don't worry about performance

The writer of a lexer may safely ignore performance considerations: the
cost of redrawing the screen is several orders of magnitude greater than
the cost of function calls, etc.  Moreover, Scintilla performs all the
important optimizations; Scintilla ensures that a lexer will be called
only to recolor text that actually needs to be recolored.  Finally, it
is not necessary to avoid extra calls to styler.ColourTo: the styler
object buffers calls to ColourTo to avoid multiple updates of the
screen.

Page contributed by Edward K. Ream
