aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/parser.c
AgeCommit message (Collapse)AuthorFilesLines
2025-08-02fixed serious bug with certain alternative string termination chars in ↵Robin Haberkorn1-21/+28
commands with multiple string arguments * When `@`-modifying a command with several string arguments and choosing `{` as the alternative string termination character, the parser would get totally confused. Any sequence of `{` would be ignored and only the first non-`{` would become the termination character. Consequently you also couldn't choose a new terminator after the closing `}`. So even a documented code example from sciteco(7) wouldn't work. The same was true when using $ (escape) or ^A as the alternative termination character. * We can now correctly parse e.g. `@FR{foo}{bar}` or `@FR$foo$bar$` (even though the latter one is quite pointless). * has probably been broken forever (has been broken even before v2.0). * Whitespace is now ignored in front of alternative termination characters as in TECO-64, so we can also write `@S /foo/` or even ``` @^Um { !* blabla *! } ``` I wanted to disallow whitespace termination characters, so the alternative would have been to throw an error. The new implementation at least adds some functionality. * Avoid redundancies when parsing no-op characters via teco_is_noop(). I assume that this is inlined and drawn into any jump-table what would be generated for the switch-statement in teco_state_start_input(). * Alternative termination characters are still case-folded, even if they are Unicode glyphs, so `@IЖfooж` would work and insert `foo`. This should perhaps be restricted to ANSI characters?
2025-07-26implemented the <^A> command for printing arbitrary stringsRobin Haberkorn1-0/+4
* Greatly improved usability as a scripting language. * The command is in DEC TECO, but in contrast to DEC TECO, we also support string building constructs in ^A. * Required some refactoring: As we want it to write everything verbatim to stdout, the per-interface method is now teco_interface_msg_literal() and it has to deal with unprintable characters. When displaying in the UI, we use teco_curses_format_str() and TecoGtkLabel functions/widgets to deal with possible control codes. * Numbers printed with `=` have to be written with a trailing linefeed, which would also be visible as a reverse "LF" in the UI. Not sure whether this is acceptable - the alternative would be to strip the strings before displaying them. * Messages written to stdout are also auto-flushed at the moment. In the future we might want to put flushing under control of the language. Perhaps :^A could inhibit the flushing.
2025-07-18fixed minor memory leaks of per-state data in teco_machine_main_tRobin Haberkorn1-4/+4
* These were leaked e.g. in case of end-of-macro errors, but also in case of syntax highlighting (teco_lexer_style()). I considered to solve this by overwriting more of the end_of_macro_cb, but it didn't turn out to be trivial always. * Considering that the union in teco_machine_main_t saved only 3 machine words of memory, I decided to sacrifice those for more robust memory management. * teco_machine_qregspec_t cannot be directly embedded into teco_machine_main_t due to recursive dependencies with teco_machine_stringbuilding_t. It could now and should perhaps be allocated only once in teco_machine_main_init(), but it would require more refactoring.
2025-07-13minor documentation fix in parser.cRobin Haberkorn1-1/+1
2025-07-03implemented ^E<code> string building constructs for embedding bytes and ↵Robin Haberkorn1-30/+103
codepoints in a strtoul()-like manner
2025-05-24new string building construct ^P disables all further string building magicRobin Haberkorn1-22/+23
* Now, `I^P` can replace `EI`. EI is therefore now free to be repurposed as the new "mung file" command for improved TECO-11 compatibility. * On the downside when inserting large blocks of TECO code, you will have to write something like `@I{^P !...! }` * The construct is also useful when searching for carets as in `S^P^Q^`.
2025-04-13fixed undoing bitfields on WindowsRobin Haberkorn1-16/+18
* It turns out that `bool` (_Bool) in bitfields may cause padding to the next 32-bit word. This was only observed on MinGW. I am not entirely sure why, although the C standard does not guarantee much with regard to bitfield memory layout and there are 64-bit available due to passing anyway. Actually, they could also be layed out in a different order. * I am now consistently using guint instead of `bool` in bitfields to prevent any potential surprises. * The way that guint was aliased with bitfield structs for undoing teco_machine_main_t and teco_machine_qregspec_t flags was therefore insecure. It was not guaranteed that the __flags field really "captures" all of the bit field. Even with `guint v : 1` fields, this was not guaranteed. We would have required a static assertion for robustness. Alternatively, we could have declared a `gsize __flags` variable as well. This __should__ be safe since gsize should always be pointer sized and correspond to the platform's alignment. However, it's also not 100% guaranteed. Using classic ANSI C enums with bit operations to encode multiple fields and flags into a single integer also doesn't look very attractive. * Instead, we now define scalar types with their own teco_undo_push() shortcuts for the bitfield structs. This is in one way simpler and much more robust, but on the other hand complicates access to the flag variables. * It's a good question whether a `struct __attribute__((packed))` bitfield with guint fields would be a reliable replacement for flag enums, that are communicated with the "outside" (TECO) world. I am not going to risk it until GCC gives any guarantees, though. For the time being, bitfields are only used internally where the concrete memory layout (bit positions) is not crucial. * This fixes the test suite and therefore probably CI and nightly builds on Windows. * Test case: Rub out `@I//` or `@Xq` until before the `@`. The parser doesn't know that `@` is still set and allows all sorts of commands where `@` should be forbidden. * It's unknown how long this has been broken on Windows - quite possibly since v2.0.
2025-04-09tightened rules for specifying modifiersRobin Haberkorn1-0/+7
* Instead of separate stand-alone commands, they are now allowed only immediately in front of the commands that accept them. * The order is still insignificant if both `@` and `:` are accepted. * The number of colon modifiers is now also checked. We basically get this for free. * `@` has syntactic significance, so it could not be set conditionally anyway. Still, it was possible to provoke bugs were `@` was interpreted conditionally as in `@ 2<I/foo/$>`. * Even when not causing bugs, a mistyped `@` would often influence the __next__ command, causing unexpected behavior, for instance when typing `@(233C)W`. * While it was theoretically possible to set `:` conditionally, it could also be "passed through" accidentally to some command where it wasn't expected as in `:Ifoo$ C`. I do not know of any real useful application or idiom of a conditionally set `:`. If there would happen to be some kind of useful application, `:'` and `:|` could be re-allowed easily, though. * I was condidering introducing a common parser state for modified commands, but that would have been tricky and introduce a lot of redundant command lists. So instead, we now simply everywhere check for excess modifiers. To simplify this task, teco_machine_main_transition_t now contains flags signaling whether the transition is allowed with `@` or `:` modifiers set. It currently only has to be checked in the start state, after `E` and `F`.
2025-03-17fixed leaking partially built string arguments in case of errorsRobin Haberkorn1-0/+1
* E.g. `@I/foo^EQ%/` whould fail if register `%` is missing. In batch mode, this would currently escalate and terminate the program. Only in this case, memory has been "leaked". This is not critical but was causing false positives in Valgrind. * Also, cleaning up properly might come in handy once we add error-catching constructs to the language.
2025-02-23support mouse interaction with popup windowsRobin Haberkorn1-1/+4
* Curses allows scrolling with the scroll wheel at least if mouse support is enabled via ED flags. Gtk always supported that. * Allow clicking on popup entries to fully autocomplete them. Since this behavior - just like auto completions - is parser state-dependant, I introduced a new state method (insert_completion_cb). All the implementations are currently in cmdline.c since there is some overlap with the process_edit_cmd_cb implementations. * Fixed pressing undefined function keys while showing the popup. The popup area is no longer redrawn/replaced with the Scintilla view. Instead, continue to show the popup.
2025-01-13updated copyright to 2025Robin Haberkorn1-1/+1
2024-12-06support the ::S anchored search (string comparison) command (and ::FD, ::FR, ↵Robin Haberkorn1-5/+6
::FS as well) * The colon modifier can now occur 2 times. Specifying `@` more than once or `:` more than twice is an error now. * Commands do not check for excess colon modifiers - almost every command would have to check it. Instead, a double colon will simply behave like a single colon on most commands. * All search commands inherit the anchored semantics, but it's not very useful in some combinations like -::S, ::N or ::FK. That's why the `::` variants are not documented everywhere. * The lexer.checkheader macro could be simplified and should also be faster now, speeding up startup. Eventually this macro can be made superfluous, e.g. by using 1:FB or 0,1^Q::S.
2024-12-04the <Xq> command now supports the @ modifier for cutting into the registerRobin Haberkorn1-5/+15
* Can be freely combined with the colon-modifier as well. :@Xq cut-appends to register q. * This simply deletes the given buffer range after the copy or append operation as if followed by another <K> command. * This has indeed been a very annoying missing feature, as you often have to retype the range for a K or D command. At the same time, this cannot be reasonably solved with a macro since macros do not accept Q-Register arguments -- so we would have to restrict ourselves to one or a few selected registers. I was also considering to solve this with a special stack operation that duplicates the top values, so that Xq leaves arguments for K, but this couldn't work for cutting lines and would also be longer to type. * It's the first non-string command that accepts @. Others may follow in the future. We're approaching ITS TECO madness levels.
2024-11-23string building: ^c (caret+c) does no longer expand to data garbage for ↵Robin Haberkorn1-0/+9
non-control characters, but to the literal caret, followed by c * For instance `^$` would insert two characters. * The alternative would have been to throw an error.
2024-11-23the search mode and current radix are mapped to __local__ Q-Registers ^X and ↵Robin Haberkorn1-2/+2
^R now (refs #17) * This way the search mode and radix are local to the current macro frame, unless the macro was invoked with :Mq. If colon-modified, you can reproduce the same effect by calling [.^X 0^X ... ].^X * The radix register is cached in the Q-Reg table as an optimization. This could be done with the other "special" registers as well, but at the cost of larger stack frames. * In order to allow constructs like [.^X typed with upcarets, the Q-Register specification syntax has been extended: ^c is the corresponding control code instead of the register "^".
2024-11-07if a macro ends without finding a goto label, always throw a 'Label "..." ↵Robin Haberkorn1-7/+7
not found' error * This is important with gotos in loops as in <@O/x/> where, we would otherwise get a confusing "Unterminated loop" error. * This in particular fixes the error thrown in grosciteco.tes when encountering a new unknown command.
2024-10-30fixup: make sure the correct PCs, pointing directly at the command that ↵Robin Haberkorn1-2/+7
failed, get assigned to error frames
2024-10-30fixed invalid memory access when executing the F< command (but only when ↵Robin Haberkorn1-3/+3
jumping to the beginning of the macro) * I am not sure whether this feature is really that useful... * teco_machine_main_t::macro_pc is now pointing to the __next__ character to execute, therefore it's easier to manipulate by flow control commands. Also, it can now be unsigned (gsize) like all other program counters. * Detected thanks to running the testsuite under Valgrind.
2024-10-16fixup: use teco_machine_t::must_undo instead of trying to identify the ↵Robin Haberkorn1-10/+3
current state machine * The previous solution was not wrong, but unnecessarily complex. We already have a flag for exactly this purpose. * Avoid redundancies by introducing teco_machine_stringbuilding_set_codepage().
2024-10-15fixed memory corruptions due to undoing the ↵Robin Haberkorn1-2/+10
teco_machine_stringbuilding_t::codepage * It's contained in teco_machine_main_t which is created per macro call frame. So after macro calls, the machine no longer exists. It is therefore unsafe to undo its members indiscriminately. * On the other hand, we must undo the codepage setting when run interactively, so it is now only undone when belonging to the commandline macro frame. * This was actually causing memory corruptions on every fnkeys cursor movement, but never caused crashes - probably because the invalid pointers are always pointing to unused parts of the C call stack. * Initially broken in b31b8871.
2024-10-04pattern match characters support ^Q/^R now as wellRobin Haberkorn1-0/+1
* makes it possible, albeit cumbersome, to escape pattern match characters * For instance, to search for ^Q, you now have to type S^Q^Q^Q^Q$. To search for ^E you have to type S^Q^Q^Q^E$. But the last character cannot be typed with carets currently (FIXME?). For pattern-only characters, two ^Q should be sufficient as in S^Q^Q^X$. * Perhaps it would be more elegant to abolish the difference between string building and pattern matching characters to avoid double quoting. But then all string building constructs like ^EQq should operate at the pattern level as well (ie. match the contents of register q verbatim instead of being interpreted as a pattern). TECOC and TECO-64 don't do that either. If we leave everything as it is, at least a new string building construct should be added for auto-quoting patterns (analoguous to ^EN and ^E@).
2024-09-25inhibit some immediate editing commands after ^Q/^R string building constructsRobin Haberkorn1-1/+8
* This allows you to type ^Q^U (which would otherwise rub out the entire argument) and ^Q^W (which would otherwise rub out the ^Q). * ^Q^U coincidentally worked previously since the teco_state_stringbuilding_escaped state would default to teco_state_process_edit_cmd(). But it's better to make this feauture explicit. * This finally makes it possible to insert the ^W (23) char into a buffer. In interactive mode, you can still only type Caret+W as a string building construct. * ^G could also be inhibited after ^Q, but the control char is not used anywhere yet, so there is no point in doing that.
2024-09-20^W^W and ^V^V can be typed completely with upcarets now and they case fold ↵Robin Haberkorn1-23/+97
all expansions of ^EQq, ^EUq and so on * Previously, there was no way to enter upper-case mode in interactive commands since the Ctrl+W immediate editing command is interpreted everywhere. * Without the case folding of ^EQq/^EUq results, the upper and lower case modes are actually pretty useless considering that modern keyboards have caps lock. So it was clear we need this, regardless of what the classic TECOs did. The TECO-11 manual is not very clear on this. tecoc apparently does not case-fold ^EQq results. * This opens up new idioms, for instance `EUq^W^W^EQq$` in order to upper case register q. It's also the only way you can currently upper-case Unicode codepoints.
2024-09-19Ctrl+^ is no longer translated to a single caret in string building (refs #20)Robin Haberkorn1-4/+21
* Ctrl+^ (30) and Caret+caret (^^) were both translated to a single caret. While there might be some reason to keep this behavior for double-caret, it is certainly pointless for Ctrl+^. * That gives you an easy way to insert Ctrl+^ (code 30) into documents with <I>. Perviously, you either had to insert a double-caret, typing 4 carets in a row, or you had to use <EI> or 30I$. * The special handling of double-caret could perhaps be abolished altogether, as we also have ^Q^ to escape plain carets. The double-caret syntax is very archaic from the time that there was no proper ^Q as far as I recall correctly.
2024-09-13remaining types of program counters changed to gsize/gssizeRobin Haberkorn1-6/+5
* This fixes F< to the beginning of the macro, which was broken in 73d574b71a10d4661ada20275cafde75aff6c1ba. teco_machine_main_t::macro_pc actually has to be signed as it is sometimes set to -1.
2024-09-11fixed searches in single-byte encoded documentsRobin Haberkorn1-11/+5
* while code is guaranteed to be in valid UTF-8, this cannot be said about the result of string building. * The search pattern can end up with invalid Unicode bytes even when searching on UTF-8 buffers, e.g. if ^EQq inserts garbage. There are currently no checks. * When searching on a raw buffer, it must be possible to search for arbitrary bytes (^EUq). Since teco_pattern2regexp() was always expecting clean UTF-8 input, this would sometimes skip over too many bytes and could even crash. * Instead, teco_pattern2regexp() now takes the <S> target codepage into account.
2024-09-11the SciTECO parser is Unicode-based now (refs #5)Robin Haberkorn1-52/+105
The following rules apply: * All SciTECO macros __must__ be in valid UTF-8, regardless of the the register's configured encoding. This is checked against before execution, so we can use glib's non-validating UTF-8 API afterwards. * Things will inevitably get slower as we have to validate all macros first and convert to gunichar for each and every character passed into the parser. As an optimization, it may make sense to have our own inlineable version of g_utf8_get_char() (TODO). Also, Unicode glyphs in syntactically significant positions may be case-folded - just like ASCII chars were. This is is of course slower than case folding ASCII. The impact of this should be measured and perhaps we should restrict case folding to a-z via teco_ascii_toupper(). * The language itself does not use any non-ANSI characters, so you don't have to use UTF-8 characters. * Wherever the parser expects a single character, it will now accept an arbitrary Unicode/UTF-8 glyph as well. In other words, you can call macros like M§ instead of having to write M[§]. You can also get the codepoint of any Unicode character with ^^x. Pressing an Unicode character in the start state or in Ex and Fx will now give a sane error message. * When pressing a key which produces a multi-byte UTF-8 sequence, the character gets translated back and forth multiple times: 1. It's converted to an UTF-8 string, either buffered or by IME methods (Gtk). On Curses we could directly get a wide char using wget_wch(), but it's not currently used, so we don't depend on widechar curses. 2. Parsed into gunichar for passing into the edit command callbacks. This also validates the codepoint - everything later on can assume valid codepoints and valid UTF-8 strings. 3. Once the edit command handling decides to insert the key into the command line, it is serialized back into an UTF-8 string as the command line macro has to be in UTF-8 (like all other macros). 4. The parser reads back gunichars without validation for passing into the parser callbacks. * Flickering in the Curses UI and Pango warnings in Gtk, due to incompletely inserted and displayed UTF-8 sequences, are now fixed.
2024-09-09added raw ANSI mode to facilitate 8-bit clean editing (refs #5)Robin Haberkorn1-2/+2
* When enabled with bit 2 in the ED flags (0,4ED), all registers and buffers will get the raw ANSI encoding (as if 0EE had been called on them). You can still manually change the encoding, eg. by calling 65001EE afterwards. * Also the ANSI mode sets up character representations for all bytes >= 0x80. This is currently done only depending on the ED flag, not when setting 0EE. * Since setting 16,4ED for 8-bit clean editing in a macro can be tricky - the default unnamed buffer will still be at UTF-8 and at least a bunch of environment registers as well - we added the command line option `--8bit` (short `-8`) which configures the ED flags very early on. As another advantage you can mung the profile in 8-bit mode as well when using SciTECO as a sort of interactive hex editor. * Disable UTF-8 checks in 8-bit clean mode (sample.teco_ini).
2024-09-09Xq and ]q inherit the document encoding from the source document (refs #5)Robin Haberkorn1-3/+3
* ^Uq however always sets an UTF8 register as the source is supposed to be a SciTECO macro which is always UTF-8. * :^Uq preserves the register's encoding * teco_doc_set_string() now also sets the encoding * instead of trying to restore the encoding in teco_doc_undo_set_string(), we now swap out the document in a teco_doc_t and pass it to an undo token. * The get_codepage() Q-Reg method has been removed as the same can now be done with teco_doc_get_string() and the get_string() method.
2024-09-09the ^EUq string building escape now respects the encoding (can insert bytes ↵Robin Haberkorn1-6/+31
or codepoints) (refs #5) * This is trickier than it sounds because there isn't one single place to consult. It depends on the context. If the string argument relates to buffer contents - as in <I>, <S>, <FR> etc. - the buffer's encoding is consulted. If it goes into a register (EU), the register's encoding is consulted. Everything else (O, EN, EC, ES...) expects only Unicode codepoints. * This is communicated through a new field teco_machine_stringbuilding_t::codepage which must be set in the states' initial callback. * Seems overkill just for ^EUq, but it can be used for context-sensitive processing of all the other string building constructs as well. * ^V and ^W cannot be supported for Unicode characters for the time being without an Unicode-aware parser
2024-01-21updated copyright to 2024Robin Haberkorn1-1/+1
2023-07-03introduced TECO_DEBUG_CLEANUP to mark destructors that should only be used ↵Robin Haberkorn1-3/+1
for debug builds * There is cleanup that is not strictly necessary, because it only frees memory which is freed on program termination anyway. * However, it helps to explicitly free everything for debugging memory leaks via Valgrind. * The new macro reduces the number of #ifdef statements. * On NDEBUG, the code of these functions will still be eliminated. * If functions are referenced only from the destructor, there will be no unused function warnings, even in NDEBUG.
2023-04-29fixed <EC> interruptions on Gtk+ (and probably on PDCurses/Win32)Robin Haberkorn1-1/+1
* test case: ECwhile true; do true; done$ * Some platforms require polling via teco_interface_is_interrupted() for detecting interruptions, so we added an idle watcher to the Glib event loop in spawn.c. * On platforms that do not require polling key presses (like Unix/ncurses), the idle watcher won't do any harm.
2023-04-18fixup a6b5394086260c262e393dd113057916fd14134b: emit undo tokens for insert_lenRobin Haberkorn1-11/+14
* Turns out it is impossible - or at least very tricky - to avoid undo token emission for insert_len. I therefore opt for stability rather than saving memory. * The old workaround introduced in a6b5394086260c262e393dd113057916fd14134b would actually fail if you do not rub out the string argument completely after interruption. I.e. You type <Ihello^J$>, interrupt - insert_len may be != 0 at this point - and _partially_ rubout the insert-command and continue typing. This could still crash the editor.
2023-04-16fixed interruptions of commands with string arguments in interactive modeRobin Haberkorn1-4/+12
* In order to provoke this bug, there must be a loop with a string command. For instance <Ifoobar^J$>. When interrupting this loop, ctx->expectstring.insert_len might end up > 0. This breaks an optimization that avoids undo tokens for insert_len since it is usually reset to 0 after every keypress. Once you rubout everything and retype `I`, you can crash SciTECO. * I am not sure if this solution is ideal. An alternative might be adding teco_state_expectstring_initial(), but we would have to chain to it from some child states that have their own initial() callback. Of course, we could also simply teco_undo_gsize(insert_len) at the cost of undo tokens.
2023-04-05updated copyright to 2023Robin Haberkorn1-1/+1
2022-11-28fixed a number of crashes due to empty string arguments or uninitialized ↵Robin Haberkorn1-1/+1
registers * An empty but valid teco_string_t can contain NULL pointers. More precisely, a state's done_cb() can be invoked with such empty strings in case of empty string arguments. Also a registers get_string() can return the NULL pointer for existing registers with uninitialized string parts. * In all of these cases, the language should treat "uninitialized" strings exactly like empty strings. * Not doing so, resulted in a number of vulnerabilities. * EN$$ crashed if "_" was uninitialized * The ^E@q and ^ENq string building constructs would crash for existing but uninitialized registers q. * ?$ would crash * ESSETILEXER$$ would crash * This is now fixed. Test cases have been added. * I cannot guarantee that I have found all such cases. Generally, it might be wise to change our definitions and make sure that every teco_string_t must have an associated heap object to be valid. All functions returning pointer+length pairs should consequently also never return NULL pointers.
2022-06-21updated copyright to 2022 and updated TODORobin Haberkorn1-1/+1
2021-06-04guard against too low arguments to <S> by checking whether the memory limit ↵Robin Haberkorn1-1/+5
would be exceeded * Checking whether the allocation succeeded may not prevent exceeding the memory limit excessively. * Even if the memory limit is not exceeded, the allocation can fail theoretically and the program would terminate abnormally. This however is true for all allocations in SciTECO (via glib). * teco_memory_check() therefore now supports checking whether an allocation would exceed the memory limit which will be useful before very large or variable allocations in addition to the regular checking in teco_machine_main_step(). * As a sideeffect, this fixes the "Searching with large counts" test case on Mac OS where too large allocations were not detected as expected (apparently Mac OS happily gives out ridiculously large chunks of memory). Now, all platforms are guaranteed to have the same behaviour.
2021-05-30THE GREAT CEEIFICATION EVENTRobin Haberkorn1-0/+902
This is a total conversion of SciTECO to plain C (GNU C11). The chance was taken to improve a lot of internal datastructures, fix fundamental bugs and lay the foundations of future features. The GTK user interface is now in an useable state! All changes have been squashed together. The language itself has almost not changed at all, except for: * Detection of string terminators (usually Escape) now takes the string building characters into account. A string is only terminated outside of string building characters. In other words, you can now for instance write I^EQ[Hello$world]$ This removes one of the last bits of shellisms which is out of place in SciTECO where no tokenization/lexing is performed. Consequently, the current termination character can also be escaped using ^Q/^R. This is used by auto completions to make sure that strings are inserted verbatim and without unwanted sideeffects. * All strings can now safely contain null-characters (see also: 8-bit cleanliness). The null-character itself (^@) is not (yet) a valid SciTECO command, though. An incomplete list of changes: * We got rid of the BSD headers for RB trees and lists/queues. The problem with them was that they used a form of metaprogramming only to gain a bit of type safety. It also resulted in less readble code. This was a C++ desease. The new code avoids metaprogramming only to gain type safety. The BSD tree.h has been replaced by rb3ptr by Jens Stimpfle (https://github.com/jstimpfle/rb3ptr). This implementation is also more memory efficient than BSD's. The BSD list.h and queue.h has been replaced with a custom src/list.h. * Fixed crashes, performance issues and compatibility issues with the Gtk 3 User Interface. It is now more or less ready for general use. The GDK lock is no longer used to avoid using deprecated functions. On the downside, the new implementation (driving the Gtk event loop stepwise) is even slower than the old one. A few glitches remain (see TODO), but it is hoped that they will be resolved by the Scintilla update which will be performed soon. * A lot of program units have been split up, so they are shorter and easier to maintain: core-commands.c, qreg-commands.c, goto-commands.c, file-utils.h. * Parser states are simply structs of callbacks now. They still use a kind of polymorphy using a preprocessor trick. TECO_DEFINE_STATE() takes an initializer list that will be merged with the default list of field initializers. To "subclass" states, you can simply define new macros that add initializers to existing macros. * Parsers no longer have a "transitions" table but the input_cb() may use switch-case statements. There are also teco_machine_main_transition_t now which can be used to implement simple transitions. Additionally, you can specify functions to execute during transitions. This largely avoids long switch-case-statements. * Parsers are embeddable/reusable now, at least in parse-only mode. This does not currently bring any advantages but may later be used to write a Scintilla lexer for TECO syntax highlighting. Once parsers are fully embeddable, it will also be possible to run TECO macros in a kind of coroutine which would allow them to process string arguments in real time. * undo.[ch] still uses metaprogramming extensively but via the C preprocessor of course. On the downside, most undo token generators must be initiated explicitly (theoretically we could have used embedded functions / trampolines to instantiate automatically but this has turned out to be dangereous). There is a TECO_DEFINE_UNDO_CALL() to generate closures for arbitrary functions now (ie. to call an arbitrary function at undo-time). This simplified a lot of code and is much shorter than manually pushing undo tokens in many cases. * Instead of the ridiculous C++ Curiously Recurring Template Pattern to achieve static polymorphy for user interface implementations, we now simply declare all functions to implement in interface.h and link in the implementations. This is possible since we no longer hace to define interface subclasses (all state is static variables in the interface's *.c files). * Headers are now significantly shorter than in C++ since we can often hide more of our "class" implementations. * Memory counting is based on dlmalloc for most platforms now. Unfortunately, there is no malloc implementation that provides an efficient constant-time memory counter that is guaranteed to decrease when freeing memory. But since we use a defined malloc implementation now, malloc_usable_size() can be used safely for tracking memory use. malloc() replacement is very tricky on Windows, so we use a poll thread on Windows. This can also be enabled on other supported platforms using --disable-malloc-replacement. All in all, I'm still not pleased with the state of memory limiting. It is a mess. * Error handling uses GError now. This has the advantage that the GError codes can be reused once we support error catching in the SciTECO language. * Added a few more test suite cases. * Haiku is no longer supported as builds are instable and I did not manage to debug them - quite possibly Haiku bugs were responsible. * Glib v2.44 or later are now required. The GTK UI requires Gtk+ v3.12 or later now. The GtkFlowBox fallback and sciteco-wrapper workaround are no longer required. * We now extensively use the GCC/Clang-specific g_auto feature (automatic deallocations when leaving the current code block). * Updated copyright to 2021. SciTECO has been in continuous development, even though there have been no commits since 2018. * Since these changes are so significant, the target release has been set to v2.0. It is planned that beginning with v3.0, the language will be kept stable.