<feed xmlns='http://www.w3.org/2005/Atom'>
<title>sciteco/src/parser.c, branch v2.5.2</title>
<subtitle>Scintilla-based Text Editor and COrrector</subtitle>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/'/>
<entry>
<title>updated copyright to 2026</title>
<updated>2026-01-01T06:59:49+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>rhaberkorn@fmsbw.de</email>
</author>
<published>2026-01-01T06:59:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=c2feb2a6f71fc9adb20226fb3c2260c236e974e0'/>
<id>c2feb2a6f71fc9adb20226fb3c2260c236e974e0</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>teco_string_t is now passed by value like a scalar if the callee isn't expected to modify it</title>
<updated>2025-12-28T19:57:31+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>rhaberkorn@fmsbw.de</email>
</author>
<published>2025-12-28T15:23:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=ea0a23645f03a42252ab1ce8df45ae4076ebae75'/>
<id>ea0a23645f03a42252ab1ce8df45ae4076ebae75</id>
<content type='text'>
* When passing a struct that should not be modified, I usually use a const pointer.
* Strings, however, are small 2-word objects and are often already passed via separate
  `gchar*` and gsize parameters. So it is consistent to pass teco_string_t by value as well.
  A teco_string_t will usually fit into registers just like a pointer.
* It's now obvious which function just _uses_ and which function _modifies_ a string.
  There is also no chance to pass a NULL pointer to those functions.
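<![CDATA[
The convention above can be illustrated with a minimal sketch. It is not taken from the SciTECO sources: demo_string_t is a hypothetical stand-in for teco_string_t (assumed here to be a pointer plus a length), and both functions are invented for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for teco_string_t: a pointer plus a length (two machine words). */
typedef struct {
	char *data;
	size_t len;
} demo_string_t;

/* Only reads the string, so the struct is passed by value; no NULL pointer is possible. */
static size_t
demo_string_count(demo_string_t str, char chr)
{
	size_t n = 0;
	for (size_t i = 0; i < str.len; i++)
		n += str.data[i] == chr;
	return n;
}

/* Modifies the string, so it takes a pointer, which must not be NULL. */
static void
demo_string_truncate(demo_string_t *str, size_t len)
{
	assert(str != NULL);
	if (len < str->len)
		str->len = len;
}
```
]]>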
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* When passing a struct that should not be modified, I usually use a const pointer.
* Strings, however, are small 2-word objects and are often already passed via separate
  `gchar*` and gsize parameters. So it is consistent to pass teco_string_t by value as well.
  A teco_string_t will usually fit into registers just like a pointer.
* It's now obvious which function just _uses_ and which function _modifies_ a string.
  There is also no chance to pass a NULL pointer to those functions.
</pre>
</div>
</content>
</entry>
<entry>
<title>TECO_DEFINE_STATE() no longer constructs callback names for mandatory callbacks, but tries to use static assertions</title>
<updated>2025-12-26T17:10:42+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>rhaberkorn@fmsbw.de</email>
</author>
<published>2025-12-26T17:10:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=c2114fa0af73b42bc1ef302f7511ef87690cc0b1'/>
<id>c2114fa0af73b42bc1ef302f7511ef87690cc0b1</id>
<content type='text'>
* Requiring state callbacks by generating their names (e.g. NAME##_input) has several disadvantages:
  * The callback is not explicitly referenced when the state is defined.
    So an uninitiated reader will see some static function that is referenced nowhere and still
    doesn't cause "unused" warnings.
  * You cannot freely choose the name of the function that implements the callback.
  * In "substates" you need to generate a callback function if you want to provide a default.
    You also need to provide dummy wrapper functions whenever you want to reuse some existing
    function as the implementation.
* Instead, we are now using static assertions to check whether certain callbacks have been
  implemented.
  Unfortunately, this does not work on all compilers. In particular GCC won't consider
  references to state objects fully constant (even though they are) and does not allow
  them in _Static_assert (G_STATIC_ASSERT). This could only be made to work in newer GCC
  with -std=c2x or -std=gnu23 in combination with constexpr.
  It does work on Clang, though.
  So I introduced TECO_ASSERT_SAFE() which also passes if the expression is *not* constant.
  These static assertions are not crucial - they do not check anything that can differ between
  systems. So we can always rely on the checks performed by FreeBSD CI for instance.
  Also, you will of course quickly notice missing callbacks at runtime - with and without
  additional runtime assertions.
* All mandatory callbacks must still be explicitly initialized in the TECO_DEFINE_STATE calls.
* After getting rid of generated callback implementations, the TECO_DEFINE_STATE macros
  can finally be qualified with `static`.
* The TECO_DECLARE_STATE() macro has been removed. It no longer abstracts anything
  and cannot be used to declare static teco_state_t anyway.
  TECO_DEFINE_UNDO_CALL() doesn't have a DECLARE counterpart either.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Requiring state callbacks by generating their names (e.g. NAME##_input) has several disadvantages:
  * The callback is not explicitly referenced when the state is defined.
    So an uninitiated reader will see some static function that is referenced nowhere and still
    doesn't cause "unused" warnings.
  * You cannot freely choose the name of the function that implements the callback.
  * In "substates" you need to generate a callback function if you want to provide a default.
    You also need to provide dummy wrapper functions whenever you want to reuse some existing
    function as the implementation.
* Instead, we are now using static assertions to check whether certain callbacks have been
  implemented.
  Unfortunately, this does not work on all compilers. In particular GCC won't consider
  references to state objects fully constant (even though they are) and does not allow
  them in _Static_assert (G_STATIC_ASSERT). This could only be made to work in newer GCC
  with -std=c2x or -std=gnu23 in combination with constexpr.
  It does work on Clang, though.
  So I introduced TECO_ASSERT_SAFE() which also passes if the expression is *not* constant.
  These static assertions are not crucial - they do not check anything that can differ between
  systems. So we can always rely on the checks performed by FreeBSD CI for instance.
  Also, you will of course quickly notice missing callbacks at runtime - with and without
  additional runtime assertions.
* All mandatory callbacks must still be explicitly initialized in the TECO_DEFINE_STATE calls.
* After getting rid of generated callback implementations, the TECO_DEFINE_STATE macros
  can finally be qualified with `static`.
* The TECO_DECLARE_STATE() macro has been removed. It no longer abstracts anything
  and cannot be used to declare static teco_state_t anyway.
  TECO_DEFINE_UNDO_CALL() doesn't have a DECLARE counterpart either.
</pre>
</div>
</content>
</entry>
<entry>
<title>support &lt;:O&gt;: if a label is not found, continue execution after the go-to statement</title>
<updated>2025-08-30T23:24:11+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2025-08-30T23:24:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=9425ad37ec95a40dc039169031259161c92cc217'/>
<id>9425ad37ec95a40dc039169031259161c92cc217</id>
<content type='text'>
* This is a SciTECO extension - it's not in TECO-11.
* Allows for select-case-like constructs with default-clauses as in
  :Os.^EQa$
    !* default *!
  !s.foo!
    !* ... *!
  !s.bar!
    !* ... *!
* Consistent with nOlabel0,label1,...$ if &lt;n&gt; is out of range.
  Unfortunately this form of computed goto is not applicable when
  "selecting" by strings or non-consecutive integers.
* In order to continue after the &lt;:O&gt; statement, we must keep the
  program counter along with the label we were looking for.
  At the end of the macro, the PC is restored instead of throwing
  an error.
* Since that would be very inefficient in loops - where potentially
  all iterations would result in rescanning till the end of the
  macro - we now store a completed-flag in the goto table.
  If it is set while trying to :O to an unknown label, we can
  just continue execution.
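<![CDATA[
The decision made when jumping to a label could be sketched roughly as follows. This is only an illustration under assumptions: demo_goto_table_t, demo_goto() and the result codes are hypothetical names, and the actual goto table in SciTECO is organized differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { DEMO_MAX_LABELS = 16 };

/* Hypothetical goto table: known labels, their program counters and a completed-flag. */
typedef struct {
	const char *names[DEMO_MAX_LABELS];
	size_t pcs[DEMO_MAX_LABELS];
	size_t count;
	bool complete; /* set once the macro has been scanned to its end */
} demo_goto_table_t;

typedef enum {
	DEMO_JUMP,     /* label known: *pc has been updated */
	DEMO_CONTINUE, /* label cannot exist: continue after the :O statement */
	DEMO_SCAN      /* label not seen yet: keep parsing forward */
} demo_goto_result_t;

static demo_goto_result_t
demo_goto(const demo_goto_table_t *table, const char *label, size_t *pc)
{
	assert(table != NULL && label != NULL && pc != NULL);

	for (size_t i = 0; i < table->count; i++) {
		if (!strcmp(table->names[i], label)) {
			*pc = table->pcs[i];
			return DEMO_JUMP;
		}
	}

	/* An unknown label in a completed table will never be found by rescanning. */
	return table->complete ? DEMO_CONTINUE : DEMO_SCAN;
}
```
]]>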
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* This is a SciTECO extension - it's not in TECO-11.
* Allows for select-case-like constructs with default-clauses as in
  :Os.^EQa$
    !* default *!
  !s.foo!
    !* ... *!
  !s.bar!
    !* ... *!
* Consistent with nOlabel0,label1,...$ if &lt;n&gt; is out of range.
  Unfortunately this form of computed goto is not applicable when
  "selecting" by strings or non-consecutive integers.
* In order to continue after the &lt;:O&gt; statement, we must keep the
  program counter along with the label we were looking for.
  At the end of the macro, the PC is restored instead of throwing
  an error.
* Since that would be very inefficient in loops - where potentially
  all iterations would result in rescanning till the end of the
  macro - we now store a completed-flag in the goto table.
  If it is set while trying to :O to an unknown label, we can
  just continue execution.
</pre>
</div>
</content>
</entry>
<entry>
<title>fixed serious bug with certain alternative string termination chars in commands with multiple string arguments</title>
<updated>2025-08-02T10:16:16+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2025-08-02T10:16:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=e46352bc614cf9777ca76deb47330fb408bc1a23'/>
<id>e46352bc614cf9777ca76deb47330fb408bc1a23</id>
<content type='text'>
* When `@`-modifying a command with several string arguments and choosing `{` as the alternative
  string termination character, the parser would get totally confused.
  Any sequence of `{` would be ignored and only the first non-`{` would become the termination character.
  Consequently you also couldn't choose a new terminator after the closing `}`.
  So even a documented code example from sciteco(7) wouldn't work.
  The same was true when using $ (escape) or ^A as the alternative termination character.
* We can now correctly parse e.g. `@FR{foo}{bar}` or `@FR$foo$bar$` (even though the
  latter one is quite pointless).
* This has probably been broken forever (it was broken even before v2.0).
* Whitespace is now ignored in front of alternative termination characters as in TECO-64, so
  we can also write `@S /foo/` or even
  ```
  @^Um
  {
    !* blabla *!
  }
  ```
  I wanted to disallow whitespace termination characters, so the alternative would have been
  to throw an error.
  The new implementation at least adds some functionality.
  * Avoid redundancies when parsing no-op characters via teco_is_noop().
    I assume that this is inlined and drawn into any jump-table that would be
    generated for the switch-statement in teco_state_start_input().
  * Alternative termination characters are still case-folded, even if they are Unicode glyphs,
    so `@IЖfooж` would work and insert `foo`.
    This should perhaps be restricted to ANSI characters?
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* When `@`-modifying a command with several string arguments and choosing `{` as the alternative
  string termination character, the parser would get totally confused.
  Any sequence of `{` would be ignored and only the first non-`{` would become the termination character.
  Consequently you also couldn't choose a new terminator after the closing `}`.
  So even a documented code example from sciteco(7) wouldn't work.
  The same was true when using $ (escape) or ^A as the alternative termination character.
* We can now correctly parse e.g. `@FR{foo}{bar}` or `@FR$foo$bar$` (even though the
  latter one is quite pointless).
* This has probably been broken forever (it was broken even before v2.0).
* Whitespace is now ignored in front of alternative termination characters as in TECO-64, so
  we can also write `@S /foo/` or even
  ```
  @^Um
  {
    !* blabla *!
  }
  ```
  I wanted to disallow whitespace termination characters, so the alternative would have been
  to throw an error.
  The new implementation at least adds some functionality.
  * Avoid redundancies when parsing no-op characters via teco_is_noop().
    I assume that this is inlined and drawn into any jump-table that would be
    generated for the switch-statement in teco_state_start_input().
  * Alternative termination characters are still case-folded, even if they are Unicode glyphs,
    so `@IЖfooж` would work and insert `foo`.
    This should perhaps be restricted to ANSI characters?
</pre>
</div>
</content>
</entry>
<entry>
<title>implemented the &lt;^A&gt; command for printing arbitrary strings</title>
<updated>2025-07-25T21:42:15+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2025-07-24T22:41:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=eee669a76b3c0b1928475d55d9e1333b3d15bb8c'/>
<id>eee669a76b3c0b1928475d55d9e1333b3d15bb8c</id>
<content type='text'>
* Greatly improved usability as a scripting language.
* The command is in DEC TECO, but in contrast to DEC TECO, we also
  support string building constructs in ^A.
* Required some refactoring: As we want it to write everything verbatim
  to stdout, the per-interface method is now teco_interface_msg_literal()
  and it has to deal with unprintable characters.
  When displaying in the UI, we use teco_curses_format_str() and TecoGtkLabel
  functions/widgets to deal with possible control codes.
* Numbers printed with `=` have to be written with a trailing linefeed,
  which would also be visible as a reverse "LF" in the UI.
  Not sure whether this is acceptable - the alternative would be to strip
  the strings before displaying them.
* Messages written to stdout are also auto-flushed at the moment.
  In the future we might want to put flushing under control of the language.
  Perhaps :^A could inhibit the flushing.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Greatly improved usability as a scripting language.
* The command is in DEC TECO, but in contrast to DEC TECO, we also
  support string building constructs in ^A.
* Required some refactoring: As we want it to write everything verbatim
  to stdout, the per-interface method is now teco_interface_msg_literal()
  and it has to deal with unprintable characters.
  When displaying in the UI, we use teco_curses_format_str() and TecoGtkLabel
  functions/widgets to deal with possible control codes.
* Numbers printed with `=` have to be written with a trailing linefeed,
  which would also be visible as a reverse "LF" in the UI.
  Not sure whether this is acceptable - the alternative would be to strip
  the strings before displaying them.
* Messages written to stdout are also auto-flushed at the moment.
  In the future we might want to put flushing under control of the language.
  Perhaps :^A could inhibit the flushing.
</pre>
</div>
</content>
</entry>
<entry>
<title>fixed minor memory leaks of per-state data in teco_machine_main_t</title>
<updated>2025-07-17T21:34:56+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2025-07-17T21:34:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=3a2583e918bcc805fe860252f8a520fc2f9b26ce'/>
<id>3a2583e918bcc805fe860252f8a520fc2f9b26ce</id>
<content type='text'>
* These were leaked e.g. in case of end-of-macro errors,
  but also in case of syntax highlighting (teco_lexer_style()).
  I considered solving this by overriding more of the end_of_macro_cb callbacks,
  but that did not always turn out to be trivial.
* Considering that the union in teco_machine_main_t saved only 3 machine words
  of memory, I decided to sacrifice those for more robust memory management.
* teco_machine_qregspec_t cannot be directly embedded into teco_machine_main_t
  due to recursive dependencies with teco_machine_stringbuilding_t.
  It could now, and perhaps should, be allocated only once in teco_machine_main_init(),
  but it would require more refactoring.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* These were leaked e.g. in case of end-of-macro errors,
  but also in case of syntax highlighting (teco_lexer_style()).
  I considered solving this by overriding more of the end_of_macro_cb callbacks,
  but that did not always turn out to be trivial.
* Considering that the union in teco_machine_main_t saved only 3 machine words
  of memory, I decided to sacrifice those for more robust memory management.
* teco_machine_qregspec_t cannot be directly embedded into teco_machine_main_t
  due to recursive dependencies with teco_machine_stringbuilding_t.
  It could now, and perhaps should, be allocated only once in teco_machine_main_init(),
  but it would require more refactoring.
</pre>
</div>
</content>
</entry>
<entry>
<title>minor documentation fix in parser.c</title>
<updated>2025-07-12T21:35:11+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2025-07-12T21:35:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=078c1927cffc6514168566c267151a8d6eca7367'/>
<id>078c1927cffc6514168566c267151a8d6eca7367</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>implemented ^E&lt;code&gt; string building constructs for embedding bytes and codepoints in a strtoul()-like manner</title>
<updated>2025-07-03T14:04:01+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2025-07-03T13:21:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=7bc7662f3cd1ceaf55e00f3d5f84e9772574afc8'/>
<id>7bc7662f3cd1ceaf55e00f3d5f84e9772574afc8</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>new string building construct ^P disables all further string building magic</title>
<updated>2025-05-23T22:31:22+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2025-05-23T22:31:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=4b266c9616f4eb359be71c44b9b2fa3373265bb0'/>
<id>4b266c9616f4eb359be71c44b9b2fa3373265bb0</id>
<content type='text'>
* Now, `I^P` can replace `EI`.
  EI is therefore now free to be repurposed as the new "mung file" command for improved TECO-11 compatibility.
* On the downside, when inserting large blocks of TECO code, you will have to write something like
  `@I{^P !...! }`
* The construct is also useful when searching for carets as in `S^P^Q^`.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Now, `I^P` can replace `EI`.
  EI is therefore now free to be repurposed as the new "mung file" command for improved TECO-11 compatibility.
* On the downside, when inserting large blocks of TECO code, you will have to write something like
  `@I{^P !...! }`
* The construct is also useful when searching for carets as in `S^P^Q^`.
</pre>
</div>
</content>
</entry>
</feed>
