sciteco/lib/lexers, branch v2.3.0

introduced true block and EOL comments

2024-12-24T10:29:32+00:00

* Comments following the previous !* ... *! convention are now true block comments,
  i.e. they are parsed faster, don't spam the goto table and allow
  embedding of exclamation marks - only "*!" terminates the comment.
* It is therefore now forbidden to have goto labels beginning with "*".
* Also support "!!" to introduce EOL comments (like C++'s //).
  This disallows empty labels, but they weren't useful anyway.
  This is the shortest way to begin a comment.
* All comment labels have been converted to true comments, to ensure
  that syntax highlighting works correctly.
  EOL comments are used for single line commented-out code, since it's
  easiest to uncomment - you don't have to jump to the line end.
  This is a pure convention / coding style.
  Other people might do it differently.
* It's of course still possible to abuse goto labels as comments
  as TECO did for ages.
* In lexing / syntax highlighting, labels and comments are highlighted differently.
* When syntax highlighting, a single "!" will first be highlighted as a label
  since it's not yet unambiguous. Once you type the second character (* or !),
  the first character is retroactively styled as a comment as well.
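
The rules above can be sketched as a small scanner. This is only an illustration in Python, not the actual SciTECO parser: the function name and token kinds are made up, and it assumes every "!" starts a label or comment (it ignores string arguments and other contexts where "!" means something else). A lone "!" at the end of the input comes out as a label, and turns into a comment as soon as "*" or "!" follows, mirroring the retroactive restyling.

```python
def scan_bang_tokens(src: str):
    """Split src into (kind, text) pairs (hypothetical helper, not SciTECO API)."""
    tokens = []
    i, n = 0, len(src)
    while i < n:
        if src[i] != "!":
            # Everything up to the next "!" is treated as ordinary code.
            j = src.find("!", i)
            j = n if j == -1 else j
            kind = "code"
        elif src.startswith("!*", i):
            # Block comment: only "*!" terminates it, single "!" may be embedded.
            end = src.find("*!", i + 2)
            j = n if end == -1 else end + 2
            kind = "block_comment"
        elif src.startswith("!!", i):
            # EOL comment: runs up to (not including) the line feed.
            end = src.find("\n", i + 2)
            j = n if end == -1 else end
            kind = "eol_comment"
        else:
            # Plain goto label: terminated by the next "!".
            end = src.find("!", i + 1)
            j = n if end == -1 else end + 1
            kind = "label"
        tokens.append((kind, src[i:j]))
        i = j
    return tokens
```

For example, scan_bang_tokens("!loop!1J!* hi! there *!Oloop") yields a label, code, a block comment containing an embedded "!", and code again.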

update sciteco.tes: this again highlights commands, but not Q-Register names

2024-12-13T21:23:47+00:00

It's a bit easier on the eyes.

fixup 244a54a18b7db6af177c9d10f3224772f08d7484: abuse the Scintilla view's "identifier" to enable lexing in the container

2024-12-13T12:28:04+00:00

* SCI_SETILEXER(NULL) is not a reliable way to do that since
  that's the default for all views.
* This was breaking the git.tes lexer for instance and was unnecessarily
  driving teco_lexer_style() on plain-text documents.
* Since we currently do not implement the ILexer5 C++ interface
  and teco_view_t is just a pointer alias, we are abusing the view's "identifier" instead.
  This is probably sufficient, as long as there is only one lexer "in the container".
  Otherwise, there should perhaps be a single C++ class that does nothing but
  wrap a callback into an ILexer5 object with a C ABI.

implemented Scintilla lexer for SciTECO code, i.e. TECO syntax highlighting

2024-12-12T21:58:14+00:00

* This works by embedding the SciTECO parser and always driving it (exclusively)
  in parse-only mode.
* A new teco_state_t::style determines the Scintilla style for any character
  accepted in the given state.
* Therefore, the SciTECO lexer is always 100% exact and corresponds to the current
  SciTECO grammar - it does not have to be maintained separately.
  There are a few exceptions and tweaks, though.
* The contents of curly-brace escapes (`@^Uq{...}`) are rendered as ordinary
  code using a separate parser instance.
  This can be disabled with the lexer.sciteco.macrodef property.
  Unfortunately, SciTECO does not currently allow setting lexer properties (FIXME).
* Labels and comments are currently styled the same.
  This could change in the future once we introduce real comments.
* Lexers are usually implemented in C++, but I did not want to draw in C++.
  Especially not since we'd have to include parser.h and other SciTECO headers,
  that I really do not want to keep C++-compatible.
  Instead, the lexer is implemented "in the container".
  @ES/SCI_SETILEXER/sciteco/ is internally translated to SCI_SETILEXER(NULL)
  and we get Scintilla notifications when styling the view becomes necessary.
  This is then centrally forwarded to teco_lexer_style() which
  uses the ordinary teco_view_ssm() API for styling.
* Once the command line becomes a Scintilla view even on Curses,
  we can enable syntax highlighting of the command line macro.

support the ::S anchored search (string comparison) command (and ::FD, ::FR, ::FS as well)

2024-12-06T14:20:52+00:00

* The colon modifier can now occur 2 times.
  Specifying `@` more than once or `:` more than twice is an error now.
* Commands do not check for excess colon modifiers - almost every command would have
  to check it. Instead, a double colon will simply behave like a single colon on most
  commands.
* All search commands inherit the anchored semantics, but it's not very useful in some combinations
  like -::S, ::N or ::FK.
  That's why the `::` variants are not documented everywhere.
* The lexer.checkheader macro could be simplified and should also be faster now,
  speeding up startup.
  Eventually this macro can be made superfluous, e.g. by using 1:FB or 0,1^Q::S.
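
The modifier limits and the anchored semantics can be sketched as follows. This is a Python illustration under stated assumptions, not the C implementation: both function names are hypothetical, modifiers are assumed to be accepted in any order, and the pattern is treated as a literal string without SciTECO's pattern match characters.

```python
from typing import Tuple


def parse_modifiers(command: str) -> Tuple[bool, int, str]:
    """Strip leading "@" and ":" modifiers (hypothetical helper).

    Returns (at_flag, colon_count, rest_of_command).
    """
    at = False
    colons = 0
    i = 0
    while i < len(command) and command[i] in "@:":
        if command[i] == "@":
            if at:
                raise ValueError("'@' modifier given more than once")
            at = True
        else:
            colons += 1
            if colons > 2:
                raise ValueError("':' modifier given more than twice")
        i += 1
    return at, colons, command[i:]


def anchored_match(text: str, dot: int, pattern: str) -> bool:
    """Anchored search: the pattern must match exactly at dot (literal match only)."""
    return text.startswith(pattern, dot)
```

With these definitions, parse_modifiers("::Sfoo") reports two colons, while ":::Sfoo" and "@@Sfoo" are rejected. A command that does not care about the second colon would simply test colon_count > 0.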

added special Q-Register ":" for accessing dot

2024-11-24T01:51:34+00:00

* We cannot call it "." since that introduces a local register
  and we don't want to add an unnecessary syntactic exception.
* Allows the idiom [: ... ]: to temporarily move around.
  Also, you can now write ^E\: without having to store dot in a register first.
* In the future we might add an ^E register as well for byte offsets.
  However, there are much fewer useful applications.
* Of course, you can now also write nU: instead of nJ, Q: instead of "." and
  n%: instead of "nC.". However, none of this is really useful.
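
The [: ... ]: idiom amounts to saving the position, moving around and restoring it afterwards. A rough Python analogy follows; the Buffer class and saved_dot() are invented for this sketch, and the try/finally is a Python nicety, not a claim about how SciTECO behaves on errors.

```python
from contextlib import contextmanager


class Buffer:
    """Toy buffer with a cursor position ("dot"); purely illustrative."""

    def __init__(self, text: str):
        self.text = text
        self.dot = 0

    @contextmanager
    def saved_dot(self):
        saved = self.dot      # like "[:" pushing the current position
        try:
            yield self
        finally:
            self.dot = saved  # like "]:" popping and restoring it
```

Inside a `with buf.saved_dot():` block the cursor can be moved freely; on leaving the block, dot is back where it started.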

Git lexer: added support for TAG_EDITMSG and MERGE_MSG

2024-09-26T13:12:49+00:00

* Curses: "icons" have also been added

improved HTML lexer (html.tes)

2024-09-17T02:46:12+00:00

This previously highlighted little more than embedded JavaScript.

added an improvised lexer for styling Git commit messages

2024-09-09T17:24:18+00:00

It's not a real Lexilla lexer, but simply styles the document
once in lexer.set.git in order to highlight comment lines.

added troff/nroff lexer

2024-08-18T16:48:06+00:00

* This is optimized for Groff, but works for Heirloom Troff and Neatroff as well.
  Currently, the Heirloom and Neatroff requests are just added on top of the Groff
  ones. Theoretically, we could also try to separate the keyword lists into
  a base K&R set with Groff, Heirloom and Neatroff on top.
* The lexer necessarily has many restrictions, as Troff is fundamentally unparseable
  (like classic TECO) and needs a lot of per-request knowledge.
* The "*.mm" extension has been removed from the lexers/cpp.tes.
  I don't know what language this was for, and I prefer `*.mm` files
  to be considered Troff.
* Temporarily changed the lexilla submodule URL.
  The corresponding Lexilla lexer is in the process of being upstreamed.
  Once it is, I will probably revert the submodule to the official repository,
  as the "troff" branch is not stable (can be rebased).