sciteco/src/lexer.c, branch v2.5.2

updated copyright to 2026

2026-01-01T06:59:49+00:00

SciTECO lexer: braces and two-character operators are now actually styled as operators

2025-12-11T14:13:09+00:00

* For consistency with other lexers, like the C/C++ lexer.
* `^*`, `^/` and `^#` are also true operators and shouldn't be styled
  as regular commands.
  This required introducing a new operator style (3).
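
A minimal sketch of the distinction described above, not the actual lexer code: a two-character sequence starting with `^` is styled as an operator only if it is one of `^*`, `^/` or `^#`. The enum, its values and both function names are hypothetical placeholders.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical style identifiers; the numeric values are placeholders. */
enum sketch_style { SKETCH_STYLE_COMMAND, SKETCH_STYLE_OPERATOR };

/* True if the first two characters of s form one of ^*, ^/ or ^#. */
static bool sketch_is_caret_operator(const char *s, size_t len)
{
    if (len < 2 || s[0] != '^')
        return false;
    return s[1] == '*' || s[1] == '/' || s[1] == '#';
}

/* Picks the style for a (possibly two-character) token. */
static enum sketch_style sketch_style_for(const char *s, size_t len)
{
    return sketch_is_caret_operator(s, len) ? SKETCH_STYLE_OPERATOR
                                            : SKETCH_STYLE_COMMAND;
}
```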

SciTECO lexer: style comma as operator

2025-11-18T23:03:07+00:00

the SciTECO lexer now tries to avoid unnecessary restylings by styling from the current line as well

2025-11-09T12:05:34+00:00

* This optimization is unnecessary for regular TECO scripts unless you write stupendously long lines.
  For the command line macro, however, we were always restyling the entire command line with every insertion
  or rubout since the command line view uses single line mode by default.
  Even if you enable a multi-line command line with regular line breaks, it's unlikely that you would insert
  many line breaks except when inserting text into the buffer.
* During insertion into the command line view, we now style from the beginning of the last regular
  command.
* During rubout from the command line, we still don't have enough information about where the previous
  valid start state was, so we will frequently have to restyle the entire command line.
  This might be worked around by adding a cmdline-view-specific hack if it turns out to be relevant
  on very long command lines and slow computers.
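
The line-based part of this optimization can be sketched as follows. This is only an illustration on a plain character buffer: the function name is hypothetical, and a real lexer would presumably ask Scintilla for line positions and would also have to ensure that the parser is in a valid start state at the chosen position.

```c
#include <stddef.h>

/*
 * Returns the offset of the first character of the line containing pos,
 * i.e. the position from which restyling could start instead of offset 0.
 */
static size_t sketch_line_start(const char *buf, size_t pos)
{
    while (pos > 0 && buf[pos - 1] != '\n')
        pos--;
    return pos;
}
```

In single line mode there are no line breaks to scan for, which is why the command line view needs the separate "last regular command" rule described above.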

fixed minor memory leaks of per-state data in teco_machine_main_t

2025-07-17T21:34:56+00:00

* These were leaked e.g. in case of end-of-macro errors,
  but also in case of syntax highlighting (teco_lexer_style()).
  I considered solving this by overwriting more of the end_of_macro_cb,
  but that did not always turn out to be trivial.
* Considering that the union in teco_machine_main_t saved only 3 machine words
  of memory, I decided to sacrifice those for more robust memory management.
* teco_machine_qregspec_t cannot be directly embedded into teco_machine_main_t
  due to recursive dependencies with teco_machine_stringbuilding_t.
  It could now and should perhaps be allocated only once in teco_machine_main_init(),
  but it would require more refactoring.
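
A rough sketch of the trade-off, with hypothetical type and member names that are not taken from teco_machine_main_t: once the per-state data lives in separate members instead of a union, a single clear function can free everything without knowing which state the parser was in when it stopped.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a parser state machine with per-state data. */
typedef struct {
    int state;        /* placeholder for the current parser state */
    char *label_data; /* data owned while parsing one kind of state */
    char *string_data;/* data owned while parsing another kind of state */
} sketch_machine_t;

static void sketch_machine_init(sketch_machine_t *m)
{
    m->state = 0;
    m->label_data = NULL;
    m->string_data = NULL;
}

/*
 * Safe in every state: free(NULL) is a no-op, so no state check is needed.
 * With a union, freeing the wrong member would be undefined behaviour.
 */
static void sketch_machine_clear(sketch_machine_t *m)
{
    free(m->label_data);
    free(m->string_data);
    m->label_data = NULL;
    m->string_data = NULL;
}
```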

implemented folding for the SciTECO lexer

2025-05-02T08:59:31+00:00

* This currently folds only {...} string arguments and embedded braces,
  most prominently `@^Um{...}` macro definitions.
* Any additional folding for loops and IF-statements should rely on
  bookkeeping by the main parser.
  This would also help catch syntactic errors, like dangling IFs.
  * We do keep track of the loop nesting, but currently only in execution mode.
* It cannot be disabled via the "fold" property.
  Lexers in the container do not have properties.
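
As an illustration of brace-based folding, the following sketch derives a nesting depth for every line of a buffer. It is deliberately naive: it counts every `{` and `}`, whereas the real lexer only considers braces the parser recognizes as part of string arguments, and it returns plain depths rather than Scintilla fold levels. The function name is hypothetical.

```c
#include <stddef.h>

/*
 * Stores the brace depth at the start of each line in levels[] and
 * returns the number of lines written (at most max_lines).
 */
static size_t sketch_fold_levels(const char *text, int *levels, size_t max_lines)
{
    size_t line = 0;
    int depth = 0;

    if (max_lines == 0)
        return 0;
    levels[0] = 0;

    for (const char *p = text; *p; p++) {
        if (*p == '{') {
            depth++;
        } else if (*p == '}') {
            if (depth > 0)
                depth--;
        } else if (*p == '\n') {
            if (++line >= max_lines)
                return line;
            levels[line] = depth;
        }
    }
    return line + 1;
}
```

A line whose successor has a greater depth would be the fold header, e.g. the line containing `@^Um{`.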

fixed undoing bitfields on Windows

2025-04-12T22:33:43+00:00

* It turns out that `bool` (_Bool) in bitfields may cause
  padding to the next 32-bit word.
  This was only observed on MinGW.
  I am not entirely sure why, although the C standard does
  not guarantee much with regard to bitfield memory layout
  and there are 64 bits available due to padding anyway.
  Actually, they could also be laid out in a different order.
* I am now consistently using guint instead of `bool` in bitfields
  to prevent any potential surprises.
* The way that guint was aliased with bitfield structs
  for undoing teco_machine_main_t and teco_machine_qregspec_t flags
  was therefore insecure.
  It was not guaranteed that the __flags field really "captures"
  all of the bit field.
  Even with `guint v : 1` fields, this was not guaranteed.
  We would have required a static assertion for robustness.
  Alternatively, we could have declared a `gsize __flags` variable
  as well. This __should__ be safe since gsize should always be
  pointer sized and correspond to the platform's alignment.
  However, it's also not 100% guaranteed.
  Using classic ANSI C enums with bit operations to encode multiple
  fields and flags into a single integer also doesn't look very
  attractive.
* Instead, we now define scalar types with their own teco_undo_push()
  shortcuts for the bitfield structs.
  This is in one way simpler and much more robust, but on the other
  hand complicates access to the flag variables.
* It's a good question whether a `struct __attribute__((packed))` bitfield
  with guint fields would be a reliable replacement for flag enums that
  are communicated with the "outside" (TECO) world.
  I am not going to risk it until GCC gives any guarantees, though.
  For the time being, bitfields are only used internally where
  the concrete memory layout (bit positions) is not crucial.
* This fixes the test suite and therefore probably CI and nightly
  builds on Windows.
* Test case: Rub out `@I//` or `@Xq` until before the `@`.
  The parser doesn't know that `@` is still set and allows
  all sorts of commands where `@` should be forbidden.
* It's unknown how long this has been broken on Windows - quite
  possibly since v2.0.
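
The idea of treating the whole bitfield struct as one scalar value can be sketched as below. This is not the teco_undo_push() machinery itself; the struct, its fields and the token type are hypothetical, and plain `unsigned int` stands in for guint. The point is that copying the struct by value never depends on how the compiler lays out or pads the individual bits.

```c
/* Hypothetical flag set; unsigned int fields instead of bool, as above. */
typedef struct {
    unsigned int modifier_at : 1;
    unsigned int modifier_colon : 1;
} sketch_flags_t;

/* Hypothetical undo token: where to restore and the saved snapshot. */
typedef struct {
    sketch_flags_t *target;
    sketch_flags_t saved;
} sketch_undo_token_t;

/* Snapshots the complete struct, no aliasing with an integer needed. */
static sketch_undo_token_t sketch_undo_push(sketch_flags_t *flags)
{
    sketch_undo_token_t token = { flags, *flags };
    return token;
}

/* Restores the snapshot, e.g. when the command is rubbed out. */
static void sketch_undo_run(const sketch_undo_token_t *token)
{
    *token->target = token->saved;
}
```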

the ES command (send Scintilla message) now supports passing both wParam and lParam as null-terminated strings

2025-03-23T15:42:07+00:00

* Being able to embed null bytes into the lParam string is
  practically useless - there aren't any messages where this is useful
  and where there are no native SciTECO counterparts - so this case is now caught
  and the null byte separates wParam from lParam.
* wParam can be the empty string, but passing wParam as a string and lParam as the
  empty string is not supported.
  If the second string argument ends in ^@, lParam is popped from the stack instead.
* This is a temporary workaround until we can properly parse the Scintilla.iface and
  generate more elegant per-message wrappers.
* It in particular unlocks the SCI_SETREPRESENTATION and SCI_SETPROPERTY messages.
  The former allows us to write a special hex-editor macro which sets hexadecimal
  character representations, while the latter allows you to set lexer properties.
* The C-based lexers ("cpp" in Lexilla) can now take preprocessor definitions into account.
  This is disabled by default, unless you set lexer.c.defines before opening a file.
  You can also set it interactively and re-set the lexer. For instance:
  ^U[lexer.c.defines]NDEBUG$ M[lexer.set.c]
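
The splitting rule can be sketched like this, assuming that the part before the first embedded null byte is wParam and the rest is lParam. The type and function names are hypothetical, and the special case where the string ends in a null byte (lParam taken from the stack) is not modelled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: wparam is NULL if no separator was found. */
typedef struct {
    const char *wparam;
    const char *lparam;
} sketch_es_args_t;

/*
 * buf holds len bytes followed by a terminating null byte.
 * An embedded null byte within the first len bytes acts as the separator.
 */
static sketch_es_args_t sketch_split_es_string(const char *buf, size_t len)
{
    sketch_es_args_t args = { NULL, buf };
    const char *sep = memchr(buf, '\0', len);

    if (sep) {
        args.wparam = buf;
        args.lparam = sep + 1;
    }
    return args;
}
```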

2025-01-12T23:39:34+00:00

introduced true block and EOL comments

2024-12-24T10:29:32+00:00

* Comments following the previous convention of !* ... *! are now true block comments,
  i.e. they are parsed faster, don't spam the goto table and allow
  embedding of exclamation marks - only "*!" terminates the comment.
* It is therefore now forbidden to have goto labels beginning with "*".
* Also support "!!" to introduce EOL comments (like C++'s //).
  This disallows empty labels, but they weren't useful anyway.
  This is the shortest way to begin a comment.
* All comment labels have been converted to true comments, to ensure
  that syntax highlighting works correctly.
  EOL comments are used for single line commented-out code, since it's
  easiest to uncomment - you don't have to jump to the line end.
  This is a pure convention / coding style.
  Other people might do it differently.
* It's of course still possible to abuse goto labels as comments
  as TECO did for ages.
* In lexing / syntax highlighting, labels and comments are highlighted differently.
* When syntax highlighting, a single "!" will first be highlighted as a label
  since it is still ambiguous. Once you type the second character (* or !),
  the first character is retroactively styled as a comment as well.
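
The three cases can be told apart by the character following the first "!", which is also why a lone "!" cannot be classified yet. The sketch below uses hypothetical names and operates on a plain buffer; whether an EOL comment includes its line break is an assumption made here for simplicity.

```c
#include <stddef.h>

typedef enum {
    SKETCH_LABEL,         /* !label! */
    SKETCH_BLOCK_COMMENT, /* !* ... *! */
    SKETCH_EOL_COMMENT    /* !! ... end of line */
} sketch_bang_kind_t;

/* s must point at a '!' followed by at least one more character. */
static sketch_bang_kind_t sketch_classify_bang(const char *s)
{
    if (s[1] == '*')
        return SKETCH_BLOCK_COMMENT;
    if (s[1] == '!')
        return SKETCH_EOL_COMMENT;
    return SKETCH_LABEL;
}

/* Returns the offset just past the construct, or len if unterminated. */
static size_t sketch_skip_bang(const char *s, size_t len)
{
    size_t i;

    if (len < 2)
        return len;

    switch (sketch_classify_bang(s)) {
    case SKETCH_BLOCK_COMMENT:
        /* Only "*!" terminates, so single '!' characters are skipped. */
        for (i = 2; i + 1 < len; i++)
            if (s[i] == '*' && s[i + 1] == '!')
                return i + 2;
        return len;
    case SKETCH_EOL_COMMENT:
        for (i = 2; i < len; i++)
            if (s[i] == '\n')
                return i + 1;
        return len;
    default:
        for (i = 1; i < len; i++)
            if (s[i] == '!')
                return i + 1;
        return len;
    }
}
```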