Scintilla Style Metadata

SCI_GETSUBSTYLEBASES(<unused>, char *styles NUL-terminated) → int
Fill styles with a byte for each style that can be split into substyles.
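A caller might decode the filled buffer as sketched below. The helper name `SubStyleBasesFromBuffer` is hypothetical. How the buffer is sized and how the message is sent are not shown. The one detail worth showing is that each byte is a style number, so it should be read as `unsigned char` wherever `char` is signed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: turn the NUL-terminated buffer filled by
// SCI_GETSUBSTYLEBASES into style numbers. Each byte is one style; reading
// it through unsigned char keeps styles above 127 non-negative on platforms
// where char is signed. A zero byte ends the buffer, so style 0 cannot be listed.
std::vector<int> SubStyleBasesFromBuffer(const char *styles) {
    std::vector<int> bases;
    for (const char *p = styles; *p != '\0'; ++p) {
        bases.push_back(static_cast<unsigned char>(*p));
    }
    return bases;
}
```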

Lexer Objects


Lexers are programmed as objects that implement the ILexer4 interface and that interact with the document they are lexing through the IDocument interface. Previously lexers were defined by providing lexing and folding functions, but creating an object to handle the interaction of a lexer with a document allows the lexer to store state information that … or variable declarations and style these depending on their role.

A set of helper classes allows older lexers defined by functions to be used in Scintilla.


ILexer4

class ILexer4 {
public:
    virtual int SCI_METHOD Version() const = 0;
    virtual void SCI_METHOD Release() = 0;
    virtual const char * SCI_METHOD PropertyNames() = 0;
    virtual int SCI_METHOD PropertyType(const char *name) = 0;
    virtual const char * SCI_METHOD DescribeProperty(const char *name) = 0;
    virtual Sci_Position SCI_METHOD PropertySet(const char *key, const char *val) = 0;
    virtual const char * SCI_METHOD DescribeWordListSets() = 0;
    virtual Sci_Position SCI_METHOD WordListSet(int n, const char *wl) = 0;
    virtual void SCI_METHOD Lex(Sci_PositionU startPos, Sci_Position lengthDoc, int initStyle, IDocument *pAccess) = 0;
    virtual void SCI_METHOD Fold(Sci_PositionU startPos, Sci_Position lengthDoc, int initStyle, IDocument *pAccess) = 0;
    virtual void * SCI_METHOD PrivateCall(int operation, void *pointer) = 0;
    virtual int SCI_METHOD LineEndTypesSupported() = 0;
    virtual int SCI_METHOD AllocateSubStyles(int styleBase, int numberStyles) = 0;
    virtual int SCI_METHOD SubStylesStart(int styleBase) = 0;
    virtual int SCI_METHOD SubStylesLength(int styleBase) = 0;
    virtual int SCI_METHOD StyleFromSubStyle(int subStyle) = 0;
    virtual int SCI_METHOD PrimaryStyleFromStyle(int style) = 0;
    virtual void SCI_METHOD FreeSubStyles() = 0;
    virtual void SCI_METHOD SetIdentifiers(int style, const char *identifiers) = 0;
    virtual int SCI_METHOD DistanceToSecondaryStyles() = 0;
    virtual const char * SCI_METHOD GetSubStyleBases() = 0;
    virtual int SCI_METHOD NamedStyles() = 0;
    virtual const char * SCI_METHOD NameOfStyle(int style) = 0;
    virtual const char * SCI_METHOD TagsOfStyle(int style) = 0;
    virtual const char * SCI_METHOD DescriptionOfStyle(int style) = 0;
};

Methods that return strings as const char * are not required to maintain separate allocations indefinitely: lexer implementations may own a single buffer that is reused for each call. Callers should make an immediate copy of returned strings.
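The buffer-reuse rule can be illustrated with a small standalone sketch. `BufferReusingLexer` and `DescribeTwo` are hypothetical names, and the class does not derive from ILexer4. It only shows a lexer that overwrites one buffer on every call and a caller that copies each result before making the next call.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical lexer fragment (not derived from ILexer4): DescribeProperty
// formats its answer into one member buffer that every call overwrites.
class BufferReusingLexer {
    std::string buffer;
public:
    const char *DescribeProperty(const char *name) {
        buffer = std::string("Description of ") + name;
        return buffer.c_str();
    }
};

// Caller side: copy each returned string into an owned std::string before
// making the next call, since that call may overwrite the lexer's buffer.
std::pair<std::string, std::string> DescribeTwo(BufferReusingLexer &lexer,
                                                const char *first, const char *second) {
    std::string a = lexer.DescribeProperty(first);   // copied immediately
    std::string b = lexer.DescribeProperty(second);  // copied immediately
    return {a, b};
}
```

Holding on to the raw `const char *` from the first call instead would leave it pointing at storage that the second call has already rewritten.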

The return values from PropertySet and WordListSet are used to indicate whether the change requires performing lexing or folding over any of the document. It is the position at which to restart lexing and folding or -1 … optimisation could be to remember where a setting first affects the document and …
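A minimal sketch of the PropertySet return convention follows. It assumes that -1 means nothing needs to be re-lexed and that 0 restarts work from the start of the document. The class name is hypothetical, and `std::ptrdiff_t` stands in for `Sci_Position`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical lexer fragment; std::ptrdiff_t stands in for Sci_Position.
// Assumption: returning -1 means no part of the document needs re-lexing,
// while any other value is the position to restart lexing and folding from.
class FoldOptionLexer {
    bool foldComment = false;
public:
    std::ptrdiff_t PropertySet(const char *key, const char *val) {
        if (std::strcmp(key, "fold.comment") == 0) {
            const bool newValue = std::strcmp(val, "1") == 0;
            if (newValue != foldComment) {
                foldComment = newValue;
                return 0;  // simple, conservative choice: redo the whole document
            }
        }
        return -1;  // unknown key or unchanged value: nothing to redo
    }
};
```

Returning 0 whenever a value changes is the conservative choice. A lexer that tracks where a setting first matters could return a later position instead.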

Version returns an enumerated value specifying which version of the interface is implemented.



NamedStyles, NameOfStyle, TagsOfStyle, and DescriptionOfStyle are used to provide information on the set of styles used by this lexer. NameOfStyle returns the C-language identifier, like "SCE_LUA_COMMENT". TagsOfStyle returns a set of tags describing the style in a standardized way, like "literal string multiline raw". A set of common tags and conventions for combining them is described in Scintilla Style Metadata. DescriptionOfStyle returns an English description of the style, like "Function or method name definition".
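One way a lexer could serve this metadata is from a static table indexed by style number, as sketched below. The `SCE_DEMO_*` names, tags and descriptions are invented for illustration, and returning an empty string for an unknown style is an assumption rather than documented behaviour. Because the strings are static, the buffer-reuse caveat for returned strings does not arise here.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical metadata for a tiny lexer with three styles. Names, tags, and
// descriptions are illustrative, not taken from a real Scintilla lexer.
struct StyleInfo {
    const char *name;
    const char *tags;
    const char *description;
};

static const StyleInfo styleInfo[] = {
    {"SCE_DEMO_DEFAULT", "default", "White space"},
    {"SCE_DEMO_COMMENT", "comment", "Comment"},
    {"SCE_DEMO_STRING", "literal string", "Double quoted string"},
};

class MetadataLexer {
public:
    int NamedStyles() const {
        return static_cast<int>(sizeof(styleInfo) / sizeof(styleInfo[0]));
    }
    const char *NameOfStyle(int style) const {
        return Valid(style) ? styleInfo[style].name : "";
    }
    const char *TagsOfStyle(int style) const {
        return Valid(style) ? styleInfo[style].tags : "";
    }
    const char *DescriptionOfStyle(int style) const {
        return Valid(style) ? styleInfo[style].description : "";
    }
private:
    // Assumption: out-of-range styles yield an empty string.
    bool Valid(int style) const { return style >= 0 && style < NamedStyles(); }
};
```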


IDocument

The pWidth argument can be NULL if the caller does not need the number of bytes in the character.


The ILexer4 and IDocument interfaces may be expanded in the future with extended versions (ILexer5...). The Version method indicates which interface is implemented and thus which methods may be called.
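The sketch below shows a host checking Version before it calls methods from a newer interface. The enum values, `VersionedLexer`, `Lexer4`, `Lexer5` and `HasILexer5Methods` are all hypothetical. Real code would compare against the constants declared with the lexer interfaces in Scintilla's headers.

```cpp
#include <cassert>

// Hypothetical version numbers; real code would compare against the
// constants declared alongside the lexer interfaces in Scintilla's headers.
enum HypotheticalLexerVersion { versionILexer4 = 4, versionILexer5 = 5 };

// Minimal stand-in for the part of the interface that reports its version.
struct VersionedLexer {
    virtual ~VersionedLexer() = default;
    virtual int Version() const = 0;
};

struct Lexer4 : VersionedLexer {
    int Version() const override { return versionILexer4; }
};

struct Lexer5 : VersionedLexer {
    int Version() const override { return versionILexer5; }
};

// Call extended-interface methods only when Version says they exist.
bool HasILexer5Methods(const VersionedLexer &lexer) {
    return lexer.Version() >= versionILexer5;
}
```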

Scintilla Style Metadata


Language Types

Scintilla contains lexers for various types of languages:

Programming languages like C++, Java, and Python.
Assembler languages are low-level programming languages which may additionally include instructions and registers.
Markup languages like HTML, TeX, and Markdown.
Data languages like EDIFACT and YAML.

Some languages can be used in different ways. JavaScript is a programming language but also the basis of JSON data files. Similarly, Lisp s-expressions can be used for both source code and data.

Each language type has common elements, such as identifiers in programming languages. These common elements should be identified so that languages can be displayed with common styles for these elements. Style tags are used for this purpose in Scintilla.

Style Tags

Every style has a list of tags, where each tag is a lower-case word containing only the common ASCII letters 'a' to 'z', such as "comment" or "operator".

Tags are ordered from most important to least important.
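A tag list might be split and checked as sketched below. The function names are hypothetical. The sketch assumes tags are separated by white space, as in the TagsOfStyle example "literal string multiline raw", and it keeps the tags in their original order of importance.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// A tag is a non-empty lower-case word made only of ASCII 'a'-'z'.
bool IsValidTag(const std::string &tag) {
    if (tag.empty()) return false;
    for (char c : tag) {
        if (c < 'a' || c > 'z') return false;
    }
    return true;
}

// Split a whitespace-separated tag list, preserving order (most important
// first). Returns false as soon as a malformed tag is found.
bool SplitTags(const std::string &list, std::vector<std::string> &tags) {
    tags.clear();
    std::istringstream in(list);
    std::string tag;
    while (in >> tag) {
        if (!IsValidTag(tag)) return false;
        tags.push_back(tag);
    }
    return true;
}
```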

While applications may assign visual attributes for tag lists in many different ways, one reasonable technique is to apply tag-specific attributes in reverse order so that earlier and more important tags override less important tags. For example, the tag list "error comment documentation keyword" with a set of tag attributes
{ comment=fore:green,back:very-light-green,font:Serif documentation=fore:light-green error=strikethrough keyword=bold }
could be rendered as
bold,fore:green,back:very-light-green,font:Serif,strikethrough.
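The reverse-order technique can be sketched as follows. The attribute syntax ("key:value" items separated by commas, with a bare word such as "bold" used as a flag) and the function names are assumptions made for illustration. Applied literally, the rule lets comment's fore:green override documentation's fore:light-green, because "comment" comes earlier in the list.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Attributes = std::map<std::string, std::string>;

// Parse "fore:green,back:very-light-green,bold" into
// {fore: green, back: very-light-green, bold: ""}.
Attributes ParseAttributes(const std::string &spec) {
    Attributes attrs;
    std::istringstream in(spec);
    std::string item;
    while (std::getline(in, item, ',')) {
        const std::string::size_type colon = item.find(':');
        if (colon == std::string::npos)
            attrs[item] = "";
        else
            attrs[item.substr(0, colon)] = item.substr(colon + 1);
    }
    return attrs;
}

// Apply attributes from the last (least important) tag to the first, so the
// attributes of earlier, more important tags overwrite those of later ones.
Attributes Render(const std::vector<std::string> &tags,
                  const std::map<std::string, std::string> &tagSpecs) {
    Attributes result;
    for (auto it = tags.rbegin(); it != tags.rend(); ++it) {
        const auto spec = tagSpecs.find(*it);
        if (spec == tagSpecs.end()) continue;  // tag has no attributes
        for (const auto &kv : ParseAttributes(spec->second))
            result[kv.first] = kv.second;
    }
    return result;
}
```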

Alternative renderings could check for multi-tag combinations like { comment.documentation=fore:light-green comment.line=dark-green comment=green }.
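A combination lookup could prefer two-tag rules over single-tag rules, as in the sketch below. Both `LookupRule` and its matching policy are hypothetical. The policy is to try every "a.b" pair where a precedes b in the tag list, then fall back to single tags in order of importance.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Return the value of the first matching rule, preferring two-tag
// combinations "a.b" (a appearing before b in the list) over single tags.
// Returns an empty string when nothing matches.
std::string LookupRule(const std::vector<std::string> &tags,
                       const std::map<std::string, std::string> &rules) {
    for (std::size_t i = 0; i < tags.size(); ++i) {
        for (std::size_t j = i + 1; j < tags.size(); ++j) {
            const auto it = rules.find(tags[i] + "." + tags[j]);
            if (it != rules.end()) return it->second;
        }
    }
    for (const auto &tag : tags) {
        const auto it = rules.find(tag);
        if (it != rules.end()) return it->second;
    }
    return "";
}
```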

Commonly, a tag list will contain an optional embedded language, optional statuses, a base type, and a set of type modifiers:
embedded-language? status* base-type modifiers*

Embedded language

The embedded language may be a source (client | server) followed by a language name (javascript | php | python | basic). This may be extended in the future with other programming languages and style-definition languages like CSS.

Status

The statuses may be (error | unused | predefined | inactive).
The error status is used for lexical states that indicate errors in the source code, such as unterminated quoted strings.
The unused status may indicate a gap in the lexical states, possibly because an old lexical class is no longer used or an upcoming lexical class may fill that position.
The predefined status indicates a style in the range 32 to 39 that is used for non-lexical purposes in Scintilla.
The inactive status is used for text that is not currently interpreted, such as C++ code contained within a '#if 0' preprocessor block.
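The pattern embedded-language? status* base-type modifiers* can be applied to an ordered tag list as sketched below. `TagStructure` and `ClassifyTags` are hypothetical names. The sets contain only the sources, languages and statuses listed above. The sketch assumes that the first tag after any statuses is the base type.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct TagStructure {
    std::string source;                 // "client" or "server", if present
    std::string language;               // embedded language name, if present
    std::vector<std::string> statuses;
    std::string baseType;
    std::vector<std::string> modifiers;
};

// Split an ordered tag list following
//   embedded-language? status* base-type modifiers*
TagStructure ClassifyTags(const std::vector<std::string> &tags) {
    static const std::set<std::string> sources = {"client", "server"};
    static const std::set<std::string> languages = {"javascript", "php", "python", "basic"};
    static const std::set<std::string> statusTags = {"error", "unused", "predefined", "inactive"};
    TagStructure result;
    std::size_t i = 0;
    if (i < tags.size() && sources.count(tags[i])) result.source = tags[i++];
    if (i < tags.size() && languages.count(tags[i])) result.language = tags[i++];
    while (i < tags.size() && statusTags.count(tags[i])) result.statuses.push_back(tags[i++]);
    if (i < tags.size()) result.baseType = tags[i++];  // assumption: first remaining tag
    result.modifiers.assign(tags.begin() + static_cast<std::ptrdiff_t>(i), tags.end());
    return result;
}
```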

Basic Types

Assembler languages add (instruction | register) to the basic types from programming languages.

The basic types for markup languages are (default | tag | attribute | comment | preprocessor).

The basic types for data languages are (default | key | data | comment).

Comments

Programming languages may differentiate between line and stream comments and treat documentation comments as distinct from other comments. Documentation comments may be marked up with documentation keywords.
The additional attributes commonly used are (line | documentation | keyword | taskmarker).

Literals

For example, an escape sequence within an interpolated heredoc may be tagged "literal string heredoc escapesequence".

List of known tags


`attribute`	Markup attribute
`basic`	Embedded Basic
`boolean`	True or false literal
`character`	Single character literal as opposed to a string literal
`client`	Script executed on client
`comment`	The standard comment type in a language: may be stream or line
`compound`	Literal containing multiple subliterals such as a tuple or complex number
`data`	A value in a data file
`date`	Literal representing a date such as '19/November/1975'
`default`	Starting state commonly also used for white space
`documentation`	Comment that can be extracted into documentation
`error`	State indicating an invalid or erroneous element
`escapesequence`	Parts of a string that are not literal such as '\t' for tab in C
`heredoc`	Lengthy text literal marked by a word at both ends
`identifier`	Name that identifies an object or class of object
`inactive`	Code that is not currently interpreted
`instruction`	Mnemonic in assembler languages like 'addc'
`integer`	Numeric literal with no fraction or exponent like '738'
`interpolated`	String that can contain expressions
`javascript`	Embedded JavaScript
`key`	Element which allows finding associated data
`keyword`	Reserved word with special meaning like 'while'
`label`	Destination for jumps in programming and assembler languages
`line`	Differentiates between stream comments and line comments in languages that have both
`literal`	Fixed value in source code
`multiline`	Differentiates between single line and multiline elements, commonly strings
`nil`	Literal for the null pointer such as nullptr in C++ or NULL in C
`numeric`	Literal number like '16'
`operator`	Punctuation character such as '&' or '['
`php`	Embedded PHP
`predefined`	Style in the range 32 to 39 that is used for non-lexical purposes
`preprocessor`	Element that is recognized in an early stage of translation
`python`	Embedded Python
`raw`	String type that avoids interpretation: may be used for regular expressions in languages without a specific regex type
`real`	Numeric literal which may have a fraction or exponent like '3.84e-15'
`regex`	Regular expression literal like '^[a-z]+'
`register`	CPU register in assembler languages
`server`	Script executed on server
`string`	Sequence of characters
`tag`	Markup tag like '<br />'
`taskmarker`	Word in comment that marks future work like 'FIXME'
`time`	Literal representing a time such as '9:34:31'
`unused`	Style that is not currently used
`uuid`	Universally unique identifier often used in interface definition files which may look like '{098f2470-bae0-11cd-b579-08002b30bfeb}'
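A tool could check a style's tags against this list, as in the sketch below. `AllTagsKnown` is a hypothetical helper. Individual lexers may introduce new tags, so the sketch reports unknown tags to the caller rather than rejecting them.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// The tags from the list of known tags above.
static const std::set<std::string> knownTags = {
    "attribute", "basic", "boolean", "character", "client", "comment",
    "compound", "data", "date", "default", "documentation", "error",
    "escapesequence", "heredoc", "identifier", "inactive", "instruction",
    "integer", "interpolated", "javascript", "key", "keyword", "label",
    "line", "literal", "multiline", "nil", "numeric", "operator", "php",
    "predefined", "preprocessor", "python", "raw", "real", "regex",
    "register", "server", "string", "tag", "taskmarker", "time", "unused",
    "uuid"};

// Collect tags not in the known list; returns true when every tag is known.
// Unknown tags are reported, not treated as errors, since lexers may add tags.
bool AllTagsKnown(const std::vector<std::string> &tags, std::vector<std::string> &unknown) {
    unknown.clear();
    for (const auto &tag : tags) {
        if (knownTags.count(tag) == 0) unknown.push_back(tag);
    }
    return unknown.empty();
}
```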

Extension

Each element in this scheme may be extended in the future. This may be done by revising this document to provide a common approach to new features. Individual lexers may also choose to expose unique language features through new tags.

Translation

Tags could be exposed directly in user interfaces or configuration languages. However, an application may also translate these to match its naming scheme. Capitalization and punctuation could be different (like Here-Doc instead of heredoc), terminology changed ("constant" instead of "literal"), or human language changed from English to Chinese or Spanish.

Starting from a common set of tags makes these modifications tractable.
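An application-side translation could be as simple as the lookup table sketched below. `DisplayName` and its entries are hypothetical, and the entries follow the examples in the paragraph above. Tags without an entry are shown unchanged, so tags added by individual lexers still display.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical translation table from standard tags to an application's
// own display names; tags without an entry are shown unchanged.
std::string DisplayName(const std::string &tag) {
    static const std::map<std::string, std::string> names = {
        {"heredoc", "Here-Doc"},
        {"literal", "Constant"},
        {"escapesequence", "Escape Sequence"},
    };
    const auto it = names.find(tag);
    return it != names.end() ? it->second : tag;
}
```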

Open issues

The C++ lexer (for example) has inactive states and dynamically allocated substyles. These should be exposed through the metadata mechanism but are not currently.
