path: root/src/lexer.c
Age | Commit message | Author | Files | Lines
2025-07-18 | fixed minor memory leaks of per-state data in teco_machine_main_t | Robin Haberkorn | 1 | -1/+3
* These were leaked e.g. in case of end-of-macro errors, but also in case of syntax highlighting (teco_lexer_style()). I considered solving this by overwriting more of the end_of_macro_cb, but it did not always turn out to be trivial.
* Considering that the union in teco_machine_main_t saved only 3 machine words of memory, I decided to sacrifice those for more robust memory management.
* teco_machine_qregspec_t cannot be directly embedded into teco_machine_main_t due to recursive dependencies with teco_machine_stringbuilding_t. It could now and should perhaps be allocated only once in teco_machine_main_init(), but it would require more refactoring.
2025-05-02 | implemented folding for the SciTECO lexer | Robin Haberkorn | 1 | -1/+26
* This currently folds only {...} string arguments and embedded braces, most prominently `@^Um{...}` macro definitions.
* Any additional folding for loops and IF-statements should rely on bookkeeping by the main parser. This would also help catch syntactic errors, like dangling IFs.
* We do keep track of the loop nesting, but currently only in execution mode.
* It cannot be disabled via the "fold" property. Lexers in the container do not have properties.
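The brace-based folding described in this commit can be illustrated with a minimal sketch. It is not the code from src/lexer.c: the function `fold_levels()` and the constants `FOLD_BASE` and `FOLD_HEADER` are hypothetical names, with values chosen to mirror Scintilla's SC_FOLDLEVELBASE and SC_FOLDLEVELHEADERFLAG. As a simplifying assumption, it counts every brace in the text, whereas the real lexer only considers braces in string arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants, mirroring Scintilla's fold level encoding. */
enum { FOLD_BASE = 0x400, FOLD_HEADER = 0x2000 };

/*
 * Compute one fold level per line from the brace nesting.
 * Stores at most max_lines levels and returns the number of lines seen.
 * Assumption: every '{' and '}' counts, regardless of context.
 */
static size_t fold_levels(const char *text, int *levels, size_t max_lines)
{
    int depth = 0;      /* brace depth at the current position */
    int line_depth = 0; /* brace depth at the start of the current line */
    size_t line = 0;

    assert(text != NULL && levels != NULL);

    for (const char *p = text; ; p++) {
        if (*p == '{')
            depth++;
        else if (*p == '}' && depth > 0)
            depth--;

        if (*p == '\n' || *p == '\0') {
            if (line < max_lines) {
                levels[line] = FOLD_BASE + line_depth;
                /* a line that opens more braces than it closes is a fold header */
                if (depth > line_depth)
                    levels[line] |= FOLD_HEADER;
            }
            line++;
            line_depth = depth;
            if (*p == '\0')
                break;
        }
    }

    return line;
}
```

For a macro definition such as `@^Um{` spanning several lines, the first line becomes the fold header and the lines up to and including the closing brace sit one level deeper.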
2025-04-13 | fixed undoing bitfields on Windows | Robin Haberkorn | 1 | -4/+4
* It turns out that `bool` (_Bool) in bitfields may cause padding to the next 32-bit word. This was only observed on MinGW. I am not entirely sure why, although the C standard does not guarantee much with regard to bitfield memory layout and there are 64 bits available due to padding anyway. Actually, they could also be laid out in a different order.
* I am now consistently using guint instead of `bool` in bitfields to prevent any potential surprises.
* The way that guint was aliased with bitfield structs for undoing teco_machine_main_t and teco_machine_qregspec_t flags was therefore insecure. It was not guaranteed that the __flags field really "captures" all of the bit field. Even with `guint v : 1` fields, this was not guaranteed. We would have required a static assertion for robustness. Alternatively, we could have declared a `gsize __flags` variable as well. This __should__ be safe since gsize should always be pointer sized and correspond to the platform's alignment. However, it's also not 100% guaranteed. Using classic ANSI C enums with bit operations to encode multiple fields and flags into a single integer also doesn't look very attractive.
* Instead, we now define scalar types with their own teco_undo_push() shortcuts for the bitfield structs. This is in one way simpler and much more robust, but on the other hand complicates access to the flag variables.
* It's a good question whether a `struct __attribute__((packed))` bitfield with guint fields would be a reliable replacement for flag enums that are communicated with the "outside" (TECO) world. I am not going to risk it until GCC gives any guarantees, though. For the time being, bitfields are only used internally where the concrete memory layout (bit positions) is not crucial.
* This fixes the test suite and therefore probably CI and nightly builds on Windows.
* Test case: Rub out `@I//` or `@Xq` until before the `@`. The parser doesn't know that `@` is still set and allows all sorts of commands where `@` should be forbidden.
* It's unknown how long this has been broken on Windows - quite possibly since v2.0.
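The idea of undoing a bitfield struct as a whole value, rather than through an aliased integer, can be sketched as follows. This is only an illustration under stated assumptions: `flags_t`, `undo_flags_t`, `undo_push_flags()` and `undo_pop_flags()` are hypothetical names, not the actual teco_undo_push() shortcuts, and `unsigned int` stands in for GLib's guint. Since the struct is copied by value, the sketch does not depend on how the compiler lays out or pads the individual bits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag set; unsigned int fields instead of bool, as in the commit. */
typedef struct {
    unsigned int modifier_at : 1;
    unsigned int modifier_colon : 2;
} flags_t;

/* Hypothetical undo token: remembers where the flags live and their old value. */
typedef struct {
    flags_t *target;
    flags_t saved;
} undo_flags_t;

/* Save the complete struct by value, independent of its bit layout. */
static undo_flags_t undo_push_flags(flags_t *target)
{
    assert(target != NULL);
    undo_flags_t token = {target, *target};
    return token;
}

/* Restore every field, including any padding the compiler inserted. */
static void undo_pop_flags(undo_flags_t token)
{
    *token.target = token.saved;
}
```

With this approach, rubbing out a command restores the `@` modifier flag together with all other flags, which is the situation the test case above exercises.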
2025-03-23 | the ES command (send Scintilla message) now supports passing both wParam and lParam as null-terminated strings | Robin Haberkorn | 1 | -3/+3
* Being able to embed null bytes into the lParam string is practically useless - there aren't any messages where this is useful and where there are no native SciTECO counterparts - so this case is now caught and the null-byte separates wParam from lParam.
* wParam can be the empty string, but it is not supported to pass wParam as a string and lParam as the empty string. If the second string argument ends in ^@, lParam is popped from the stack instead.
* This is a temporary workaround until we can properly parse the Scintilla.iface and generate more elegant per-message wrappers.
* It in particular unlocks the SCI_SETREPRESENTATION and SCI_SETPROPERTY messages. The former allows us to write a special hex-editor macro which sets hexadecimal character representations, while the latter allows you to set lexer properties.
* The C-based lexers ("cpp" in Lexilla) can now take preprocessor definitions into account. This is disabled by default, unless you set lexer.c.defines before opening a file. You can also set it interactively and re-set the lexer. For instance: ^U[lexer.c.defines]NDEBUG$ M[lexer.set.c]
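The splitting rule for the second string argument might look roughly like the sketch below. The names `es_args_t` and `es_split()` are hypothetical and this is not the actual SciTECO implementation. A NULL member means "take this parameter from the numeric stack". The handling of a string without any null byte (the whole string becomes lParam) is an assumption, as the commit message does not spell it out. The caller is assumed to pass a buffer that is null-terminated after `len` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: NULL means "pop this parameter from the stack". */
typedef struct {
    const char *wparam;
    const char *lparam;
} es_args_t;

/* Split a string argument of len bytes at the first embedded null byte. */
static es_args_t es_split(const char *str, size_t len)
{
    es_args_t args = {NULL, NULL};
    const char *sep;

    assert(str != NULL);
    sep = memchr(str, '\0', len);

    if (sep == NULL) {
        /* Assumption: without a separator the whole string is lParam. */
        args.lparam = str;
        return args;
    }

    /* Everything before the null byte is wParam (possibly empty). */
    args.wparam = str;
    /* A trailing null byte leaves lParam unset, so it comes from the stack. */
    if ((size_t)(sep - str) + 1 < len)
        args.lparam = sep + 1;

    return args;
}
```

Returning pointers into the original buffer avoids copying, since the embedded null byte already terminates wParam.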
2025-01-13 | updated copyright to 2025 | Robin Haberkorn | 1 | -1/+1
2024-12-24 | introduced true block and EOL comments | Robin Haberkorn | 1 | -20/+31
* Comments following the previous convention of !* ... *! are now true block comments, i.e. they are parsed faster, don't spam the goto table and allow embedding of exclamation marks - only "*!" terminates the comment.
* It is therefore now forbidden to have goto labels beginning with "*".
* Also support "!!" to introduce EOL comments (like C++'s //). This disallows empty labels, but they weren't useful anyway. This is the shortest way to begin a comment.
* All comment labels have been converted to true comments, to ensure that syntax highlighting works correctly. EOL comments are used for single line commented-out code, since it's easiest to uncomment - you don't have to jump to the line end. This is a pure convention / coding style. Other people might do it differently.
* It's of course still possible to abuse goto labels as comments as TECO did for ages.
* In lexing / syntax highlighting, labels and comments are highlighted differently.
* When syntax highlighting, a single "!" will first be highlighted as a label since it's not yet unambiguous. Once you type the second character (* or !), the first character is retroactively styled as a comment as well.
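The distinction between labels, block comments and EOL comments can be sketched with a small scanner. This is an illustration only, not the SciTECO parser: `tok_t`, `scan_t` and `scan_bang()` are hypothetical names. The sketch assumes that the "*" opening a block comment cannot double as part of the "*!" terminator and that an EOL comment without a trailing newline simply runs to the end of the input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOK_LABEL, TOK_BLOCK_COMMENT, TOK_EOL_COMMENT } tok_t;

/* Hypothetical scan result for a construct beginning with '!'. */
typedef struct {
    tok_t kind;   /* what the construct looks like so far */
    size_t len;   /* number of characters covered */
    int complete; /* 0 if the terminator has not been seen yet */
} scan_t;

/* Classify the construct starting at s, which must point at a '!'. */
static scan_t scan_bang(const char *s)
{
    scan_t res = {TOK_LABEL, 0, 0};
    const char *end;

    assert(s != NULL && s[0] == '!');

    if (s[1] == '*') {
        /* block comment: only "*!" terminates, embedded '!' is allowed */
        res.kind = TOK_BLOCK_COMMENT;
        end = strstr(s + 2, "*!");
        if (end != NULL)
            end += 2;
    } else if (s[1] == '!') {
        /* EOL comment: runs up to, but not including, the newline */
        res.kind = TOK_EOL_COMMENT;
        end = strchr(s + 2, '\n');
        if (end == NULL)
            end = s + strlen(s);
    } else {
        /* goto label: terminated by the next '!' */
        end = strchr(s + 1, '!');
        if (end != NULL)
            end += 1;
    }

    res.complete = end != NULL;
    res.len = end != NULL ? (size_t)(end - s) : strlen(s);
    return res;
}
```

A lone "!" is reported as an incomplete label, matching the highlighting behaviour described above where the first character is only restyled once the second one is typed.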
2024-12-22 | fixed lexing (syntax highlighting) of the null-character (^@) in SciTECO code | Robin Haberkorn | 1 | -2/+6
* Apparently g_utf8_get_char_validated() sometimes(!) returns -2 for null-characters, so it was considered an invalid byte sequence.
* What's strange and unexplainable is that other uses of the function, as are behind nA and nQq, did not cause problems and returned 0 for null-bytes.
* This also fixes syntax highlighting of .teco_session files which use the null-byte as the string terminator. (.teco_session files are not highlighted automatically, though.)
2024-12-13 | implemented Scintilla lexer for SciTECO code, i.e. TECO syntax highlighting | Robin Haberkorn | 1 | -0/+235
* This works by embedding the SciTECO parser and driving it always (exclusively) in parse-only mode.
* A new teco_state_t::style determines the Scintilla style for any character accepted in the given state.
* Therefore, the SciTECO lexer is always 100% exact and corresponds to the current SciTECO grammar - it does not have to be maintained separately. There are a few exceptions and tweaks, though.
* The contents of curly-brace escapes (`@^Uq{...}`) are rendered as ordinary code using a separate parser instance. This can be disabled with the lexer.sciteco.macrodef property. Unfortunately, SciTECO does not currently allow setting lexer properties (FIXME).
* Labels and comments are currently styled the same. This could change in the future once we introduce real comments.
* Lexers are usually implemented in C++, but I did not want to draw in C++. Especially not since we'd have to include parser.h and other SciTECO headers that I really do not want to keep C++-compatible. Instead, the lexer is implemented "in the container". @ES/SCI_SETILEXER/sciteco/ is internally translated to SCI_SETILEXER(NULL) and we get Scintilla notifications when styling the view becomes necessary. This is then centrally forwarded to teco_lexer_style(), which uses the ordinary teco_view_ssm() API for styling.
* Once the command line becomes a Scintilla view even on Curses, we can enable syntax highlighting of the command line macro.