aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/lexer.c
AgeCommit message (Collapse)AuthorFilesLines
32 hoursupdated copyright to 2026Robin Haberkorn1-1/+1
2025-12-11SciTECO lexer: braces and two-character operators are now actually styled as ↵Robin Haberkorn1-5/+12
operators * For consistency with other lexers, like the C/C++ lexer. * `^*`, `^/` and `^#` are also true operators and shouldn't be styled as regular commands. This required introducing a new operator style (3).
2025-11-19SciTECO lexer: style comma as operatorRobin Haberkorn1-1/+1
2025-11-09the SciTECO lexer now tries to avoid unnecessary restylings by styling from ↵Robin Haberkorn1-5/+18
the current line as well * This optimization is unnecessary for regular TECO scripts unless you write stupendously long lines. On the command line macro however, we were always restyling the entire command line with every insertion or rubout since the command line view uses single line mode by default. Even if you enable a multi-line command line with regular line breaks, it's unlikely that you would insert many line breaks except when inserting text into the buffer. * We will now during insertion into the command line view style from the beginning of the last regular command. * During rub out from the command line, we still won't have enough information about where the previous valid start state was, so we will frequently have to restyle the entire command line. This might be worked around by adding a cmdline-view-specific hack if it turns out to be relevant on very long command lines and slow computers.
2025-07-18fixed minor memory leaks of per-state data in teco_machine_main_tRobin Haberkorn1-1/+3
* These were leaked e.g. in case of end-of-macro errors, but also in case of syntax highlighting (teco_lexer_style()). I considered to solve this by overwriting more of the end_of_macro_cb, but it didn't turn out to be trivial always. * Considering that the union in teco_machine_main_t saved only 3 machine words of memory, I decided to sacrifice those for more robust memory management. * teco_machine_qregspec_t cannot be directly embedded into teco_machine_main_t due to recursive dependencies with teco_machine_stringbuilding_t. It could now and should perhaps be allocated only once in teco_machine_main_init(), but it would require more refactoring.
2025-05-02implemented folding for the SciTECO lexerRobin Haberkorn1-1/+26
* This currently folds only {...} string arguments and embedded braces, most prominently `@^Um{...}` macro definitions.. * Any additional folding for loops and IF-statements should rely on book keeping by the main parser. This would also help catch syntactic errors, like dangling IFs. * We do keep track of the loop nesting, but currently only in execution mode. * It cannot be disabled via the "fold" property. Lexers in the container do not have properties.
2025-04-13fixed undoing bitfields on WindowsRobin Haberkorn1-4/+4
* It turns out that `bool` (_Bool) in bitfields may cause padding to the next 32-bit word. This was only observed on MinGW. I am not entirely sure why, although the C standard does not guarantee much with regard to bitfield memory layout and there are 64-bit available due to passing anyway. Actually, they could also be layed out in a different order. * I am now consistently using guint instead of `bool` in bitfields to prevent any potential surprises. * The way that guint was aliased with bitfield structs for undoing teco_machine_main_t and teco_machine_qregspec_t flags was therefore insecure. It was not guaranteed that the __flags field really "captures" all of the bit field. Even with `guint v : 1` fields, this was not guaranteed. We would have required a static assertion for robustness. Alternatively, we could have declared a `gsize __flags` variable as well. This __should__ be safe since gsize should always be pointer sized and correspond to the platform's alignment. However, it's also not 100% guaranteed. Using classic ANSI C enums with bit operations to encode multiple fields and flags into a single integer also doesn't look very attractive. * Instead, we now define scalar types with their own teco_undo_push() shortcuts for the bitfield structs. This is in one way simpler and much more robust, but on the other hand complicates access to the flag variables. * It's a good question whether a `struct __attribute__((packed))` bitfield with guint fields would be a reliable replacement for flag enums, that are communicated with the "outside" (TECO) world. I am not going to risk it until GCC gives any guarantees, though. For the time being, bitfields are only used internally where the concrete memory layout (bit positions) is not crucial. * This fixes the test suite and therefore probably CI and nightly builds on Windows. * Test case: Rub out `@I//` or `@Xq` until before the `@`. The parser doesn't know that `@` is still set and allows all sorts of commands where `@` should be forbidden. * It's unknown how long this has been broken on Windows - quite possibly since v2.0.
2025-03-23the ES command (send Scintilla message) now supports passing both wParam and ↵Robin Haberkorn1-3/+3
lParam as null-terminated strings * Being able to embed null bytes into the lParam string is practically useless - there aren't any messages where this is useful and where there are no native SciTECO counterparts - so this case is now catched and the null-byte separates wParam from lParam. * wParam can be the empty string, but it is not supported to pass wParam as a string and lParam as the empty string. If the second string argument ends in ^@, lParam is popped from the stack instead. * This is a temporary workaround until we can properly parse the Scintilla.iface and generate more elegant per-message wrappers. * It in particular unlocks the SCI_SETREPRESENTATION and SCI_SETPROPERTY messages. The former allows us to write a special hex-editor macro which sets hexadecimal character representations, while the latter allows you to set lexer properties. * The C-based lexers ("cpp" in Lexilla) can now take preprocessor definitions into account. This is disabled by default, unless you set lexer.c.defines before opening a file. You can also set it interactively and re-set the lexer. For instance: ^U[lexer.c.defines]NDEBUG$ M[lexer.set.c]
2025-01-13updated copyright to 2025Robin Haberkorn1-1/+1
2024-12-24introduced true block and EOL commentsRobin Haberkorn1-20/+31
* The previous convention of !* ... *! are now true block comments, i.e. they are parsed faster, don't spam the goto table and allow embedding of exclamation marks - only "*!" terminates the comment. * It is therefore now forbidden to have goto labels beginning with "*". * Also support "!!" to introduce EOL comments (like C++'s //). This disallows empty labels, but they weren't useful anyway. This is the shortest way to begin a comment. * All comment labels have been converted to true comments, to ensure that syntax highlighting works correctly. EOL comments are used for single line commented-out code, since it's easiest to uncomment - you don't have to jump to the line end. This is a pure convention / coding style. Other people might do it differently. * It's of course still possible to abuse goto labels as comments as TECO did for ages. * In lexing / syntax highlighting, labels and comments are highlighted differently. * When syntax highlighting, a single "!" will first be highlighted as a label since it's not yet unambiguous. Once you type the second character (* or !), the first character is retroactively styled as a comment as well.
2024-12-22fixed lexing (syntax highlighting) of the null-character (^@) in SciTECO codeRobin Haberkorn1-2/+6
* Apparently g_utf8_get_char_validated() sometimes(!) returns -2 for null-characters, so it was considered an invalid byte sequence. * What's strange and unexplainable is that other uses of the function, as are behind nA and nQq, did not cause problems and returned 0 for null-bytes. * This also fixes syntax higlighting of .teco_session files which use the null-byte as the string terminator. (.teco_session files are not highlighted automatically, though.)
2024-12-13implemented Scintilla lexer for SciTECO code, i.e. TECO syntax highlightingRobin Haberkorn1-0/+235
* this works by embedding the SciTECO parser and driving it always (exclusively) in parse-only mode. * A new teco_state_t::style determines the Scintilla style for any character accepted in the given state. * Therefore, the SciTECO lexer is always 100% exact and corresponds to the current SciTECO grammer - it does not have to be maintained separately. There are a few exceptions and tweaks, though. * The contents of curly-brace escapes (`@^Uq{...}`) are rendered as ordinary code using a separate parser instance. This can be disabled with the lexer.sciteco.macrodef property. Unfortunately, SciTECO does not currently allow setting lexer properties (FIXME). * Labels and comments are currently styled the same. This could change in the future once we introduce real comments. * Lexers are usually implemented in C++, but I did not want to draw in C++. Especially not since we'd have to include parser.h and other SciTECO headers, that really do not want to keep C++-compatible. Instead, the lexer is implemented "in the container". @ES/SCI_SETILEXER/sciteco/ is internally translated to SCI_SETILEXER(NULL) and we get Scintilla notifications when styling the view becomes necessary. This is then centrally forwarded to the teco_lexer_style() which uses the ordinary teco_view_ssm() API for styling. * Once the command line becomes a Scintilla view even on Curses, we can enabled syntax highlighting of the command line macro.