+ Scintilla Style Metadata +

From 8fe23071ec3f82bf2b602f2ba5edee0cf6bc6fa3 Mon Sep 17 00:00:00 2001
From: Neil
Date: Mon, 17 Jul 2017 14:21:40 +1000
Subject: Documentation for style metadata.
---
 doc/StyleMetadata.html | 204 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 204 insertions(+)
 create mode 100644 doc/StyleMetadata.html

diff --git a/doc/StyleMetadata.html b/doc/StyleMetadata.html
new file mode 100644
index 000000000..79742be24
--- /dev/null
+++ b/doc/StyleMetadata.html
@@ -0,0 +1,204 @@

+ +	+ Scintilla +

+ Language Types +

+ Scintilla contains lexers for various types of languages: +

Programming languages like C++, Java, and Python.
Assembler languages are low-level programming languages which may additionally include instructions and registers.
Markup languages like HTML, TeX, and Markdown.
Data languages like EDIFACT and YAML.

+ Some languages can be used in different ways. JavaScript is a programming language but also + the basis of JSON data files. Similarly, + Lisp s-expressions can be used for both source code and data. +

+ Each language type has common elements such as identifiers in programming languages. + These common elements should be identified so that languages can be displayed with common + styles for these elements. + Style tags are used for this purpose in Scintilla. +

+ Style Tags +

+ Every style has a list of tags where a tag is a lower-case word containing only the common ASCII letters 'a'-'z' + such as "comment" or "operator". +
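
The tag format described above can be checked mechanically. The following is a minimal sketch, assuming that a tag list is a space-separated string such as "error comment documentation keyword"; the helper names `is_valid_tag` and `split_tags` are hypothetical and are not part of Scintilla.

```python
import re

# Assumption: a tag is one or more lower-case ASCII letters 'a'-'z'.
TAG_PATTERN = re.compile(r"[a-z]+")


def is_valid_tag(tag: str) -> bool:
    """Return True if the whole string is a lower-case ASCII word."""
    return TAG_PATTERN.fullmatch(tag) is not None


def split_tags(tag_list: str) -> list[str]:
    """Split a space-separated tag list, rejecting malformed tags."""
    tags = tag_list.split()
    for tag in tags:
        if not is_valid_tag(tag):
            raise ValueError(f"invalid tag: {tag!r}")
    return tags
```

An application might run such a check once when it reads the tags of each style, so that later lookups can assume well-formed input.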

+ Tags are ordered from most important to least important. +

+ While applications may assign visual attributes for tag lists in many different ways, one reasonable technique is to + apply tag-specific attributes in reverse order so that earlier and more important tags override less important tags. + For example, the tag list "error comment documentation keyword" with + a set of tag attributes
+ { comment=fore:green,back:very-light-green,font:Serif documentation=fore:light-green error=strikethrough keyword=bold }
+ could be rendered as
+ bold,fore:green,back:very-light-green,font:Serif,strikethrough. +
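
The reverse-order technique can be sketched in a few lines. This is an illustration only: the `TAG_ATTRIBUTES` table, its dictionary layout, and the function `resolve_attributes` are assumptions made for the example, not a Scintilla API.

```python
# Hypothetical attribute table mirroring the example tag attributes above.
TAG_ATTRIBUTES = {
    "comment": {"fore": "green", "back": "very-light-green", "font": "Serif"},
    "documentation": {"fore": "light-green"},
    "error": {"strikethrough": True},
    "keyword": {"bold": True},
}


def resolve_attributes(tag_list: str, tag_attributes=TAG_ATTRIBUTES) -> dict:
    """Merge attributes from the least important tag to the most important.

    Tags are listed most important first, so iterating in reverse lets each
    earlier tag overwrite whatever a later tag already set.
    """
    merged = {}
    for tag in reversed(tag_list.split()):
        merged.update(tag_attributes.get(tag, {}))
    return merged


result = resolve_attributes("error comment documentation keyword")
```

Under this strict rule, when two tags set the same attribute the earlier tag's value is the one kept, so here the foreground colour comes from `comment` rather than `documentation`. Tags with no entry in the table contribute nothing.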

+ Alternative renderings could check for multi-tag combinations like + { comment.documentation=fore:light-green comment.line=dark-green comment=green }. +
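
A multi-tag lookup of this kind could be sketched as follows. The rule table reuses the combinations from the example; the matching policy (a rule applies when all of its dot-separated tags are present, and the rule naming the most tags wins) is an assumption chosen for illustration, as are the names `COLOUR_RULES` and `pick_colour`.

```python
from typing import Optional

# Hypothetical rule table: keys are dot-separated tag combinations.
COLOUR_RULES = {
    "comment.documentation": "light-green",
    "comment.line": "dark-green",
    "comment": "green",
}


def pick_colour(tag_list: str, rules=COLOUR_RULES) -> Optional[str]:
    """Return the colour of the most specific rule matching the tag list."""
    tags = set(tag_list.split())
    best_key = None
    best_size = 0
    for key in rules:
        parts = key.split(".")
        # A rule matches only if every tag it names appears in the list.
        if all(part in tags for part in parts) and len(parts) > best_size:
            best_key, best_size = key, len(parts)
    return rules[best_key] if best_key is not None else None
```

When two rules of equal size both match, this sketch keeps whichever appears first in the table; a real application would need to decide its own tie-breaking policy.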

+ Commonly, a tag list will contain an optional embedded language; optional statuses; a base type; and a set of type modifiers:
+ embedded-language? status* base-type modifiers* +

Embedded language

+ The embedded language may be a source (client | server) followed by a language name + (javascript | php | python | basic). + This may be extended in the future with other programming languages and style-definition languages like CSS. +

Status

+ The statuses may be (error | unused | predefined | inactive).
+ The error status is used for lexical states that indicate errors in the source code such as unterminated quoted strings.
+ The unused status may indicate a gap in the lexical states, possibly because an old lexical class is no longer used or an upcoming lexical class may fill that position.
+ The predefined status indicates a style in the range 32..39 that is used for non-lexical purposes in Scintilla.
+ The inactive status is used for text that is not currently interpreted such as C++ code that is contained within a '#if 0' preprocessor block. +

Basic Types

+ Assembler languages add (instruction | register) to the basic types from programming languages.
+

+ The basic types for markup languages are (default | tag | attribute | comment | preprocessor).
+

+ The basic types for data languages are (default | key | data | comment).
+
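
The common tag-list shape, embedded-language? status* base-type modifiers*, can be split into its parts with a small parser. The sketch below is one possible reading rather than a definitive implementation: it assumes a space-separated tag list, treats both the source and the language name as individually optional, and takes the base type to be the first tag that is neither an embedded-language tag nor a status. The names `ParsedTags` and `parse_tag_list` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

SOURCES = {"client", "server"}
LANGUAGES = {"javascript", "php", "python", "basic"}
STATUSES = {"error", "unused", "predefined", "inactive"}


@dataclass
class ParsedTags:
    source: Optional[str] = None
    language: Optional[str] = None
    statuses: list = field(default_factory=list)
    base_type: Optional[str] = None
    modifiers: list = field(default_factory=list)


def parse_tag_list(tag_list: str) -> ParsedTags:
    """Split a tag list into embedded language, statuses, base type, modifiers."""
    tags = tag_list.split()
    parsed = ParsedTags()
    i = 0
    # Optional embedded language: a source, then a language name.
    if i < len(tags) and tags[i] in SOURCES:
        parsed.source = tags[i]
        i += 1
    if i < len(tags) and tags[i] in LANGUAGES:
        parsed.language = tags[i]
        i += 1
    # Zero or more statuses.
    while i < len(tags) and tags[i] in STATUSES:
        parsed.statuses.append(tags[i])
        i += 1
    # The next tag, if any, is the base type; the rest are modifiers.
    if i < len(tags):
        parsed.base_type = tags[i]
        parsed.modifiers = tags[i + 1:]
    return parsed
```

For example, `parse_tag_list("client javascript error comment line")` separates the embedded language (`client`, `javascript`) and the `error` status from the base type `comment` and its modifier `line`. The base type is not validated against a fixed set because the set differs by language type.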

Comments

+ Programming languages may differentiate between line and stream comments and treat documentation comments as distinct from other comments. + Documentation comments may be marked up with documentation keywords.
+ The additional attributes commonly used are (line | documentation | keyword | taskmarker). +

Literals

+ An escape sequence within an interpolated heredoc may thus be literal string heredoc escapesequence. +

+ List of known tags +


`attribute`	Markup attribute
`basic`	Embedded Basic
`boolean`	True or false literal
`character`	Single character literal as opposed to a string literal
`client`	Script executed on client
`comment`	The standard comment type in a language: may be stream or line
`compound`	Literal containing multiple subliterals such as a tuple or complex number
`data`	A value in a data file
`date`	Literal representing a date such as '19/November/1975'
`default`	Starting state commonly also used for white space
`documentation`	Comment that can be extracted into documentation
`error`	State indicating an invalid or erroneous element
`escapesequence`	Parts of a string that are not literal such as '\t' for tab in C
`heredoc`	Lengthy text literal marked by a word at both ends
`identifier`	Name that identifies an object or class of object
`inactive`	Code that is not currently interpreted
`instruction`	Mnemonic in assembler languages like 'addc'
`integer`	Numeric literal with no fraction or exponent like '738'
`interpolated`	String that can contain expressions
`javascript`	Embedded JavaScript
`key`	Element which allows finding associated data
`keyword`	Reserved word with special meaning like 'while'
`label`	Destination for jumps in programming and assembler languages
`line`	Differentiates between stream comments and line comments in languages that have both
`literal`	Fixed value in source code
`multiline`	Differentiates between single line and multiline elements, commonly strings
`nil`	Literal for the null pointer such as nullptr in C++ or NULL in C
`numeric`	Literal number like '16'
`operator`	Punctuation character such as '&' or '['
`php`	Embedded PHP
`predefined`	Style in the range 32..39 that is used for non-lexical purposes
`preprocessor`	Element that is recognized in an early stage of translation
`python`	Embedded Python
`raw`	String type that avoids interpretation: may be used for regular expressions in languages without a specific regex type
`real`	Numeric literal which may have a fraction or exponent like '3.84e-15'
`regex`	Regular expression literal like '^[a-z]+'
`register`	CPU register in assembler languages
`server`	Script executed on server
`string`	Sequence of characters
`tag`	Markup tag like '<br />'
`taskmarker`	Word in comment that marks future work like 'FIXME'
`time`	Literal representing a time such as '9:34:31'
`unused`	Style that is not currently used
`uuid`	Universally unique identifier often used in interface definition files which may look like '{098f2470-bae0-11cd-b579-08002b30bfeb}'

+ Extension +

+ Each element in this scheme may be extended in the future. This may be done by revising this document to provide a common approach to new features. + Individual lexers may also choose to expose unique language features through new tags. +

+ Translation +

+ Tags could be exposed directly in user interfaces or configuration languages. + However, an application may also translate these to match its naming schema. + Capitalization and punctuation could be different (like Here-Doc instead of heredoc), + terminology changed ("constant" instead of "literal"), + or human language changed from English to Chinese or Spanish. +

+ Starting from a common set of tags makes these modifications tractable. +
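
Such a translation can be as simple as a lookup table applied to each tag. The sketch below assumes a space-separated tag list; the `DISPLAY_NAMES` table and the `translate_tags` function are hypothetical and only cover the two renamings mentioned above.

```python
# Hypothetical mapping from common tags to an application's own names.
DISPLAY_NAMES = {
    "heredoc": "Here-Doc",
    "literal": "constant",
}


def translate_tags(tag_list: str, names=DISPLAY_NAMES) -> list[str]:
    """Map each tag to its display name, leaving unmapped tags unchanged."""
    return [names.get(tag, tag) for tag in tag_list.split()]


translated = translate_tags("literal string heredoc escapesequence")
```

Falling back to the original tag for anything not in the table means new tags added by a lexer still appear in the interface, just untranslated.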

+ Open issues +

+ The C++ lexer (for example) has inactive states and dynamically allocated substyles. + These should be exposed through the metadata mechanism but currently are not. +
