<feed xmlns='http://www.w3.org/2005/Atom'>
<title>sciteco/src/string-utils.c, branch v2.5.2</title>
<subtitle>Scintilla-based Text Editor and COrrector</subtitle>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/'/>
<entry>
<title>fixed auto-completion of Unicode file names</title>
<updated>2026-01-18T20:36:56+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>rhaberkorn@fmsbw.de</email>
</author>
<published>2026-01-18T20:36:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=3deb2b3970cb344d0c615e9d41a5fe7f3baf1417'/>
<id>3deb2b3970cb344d0c615e9d41a5fe7f3baf1417</id>
<content type='text'>
* teco_string_diff() could return a byte count that ends in the middle of
  a Unicode sequence. It now also requires Unicode strings.
* Added a missing Unicode-validity check when replacing command lines (`{` and `}`).
  teco_cmdline_insert() should really be refactored, though (FIXME).
* Added test case
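
A minimal sketch of the boundary problem described in the first point, assuming
both inputs are valid UTF-8. The helper name utf8_common_prefix() and its
signature are hypothetical and are not the real teco_string_diff(). Headers use
the quoted include form, which falls back to the system search path.

```c
#include "assert.h"
#include "stddef.h"

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char c)
{
    return (c >> 6) == 2;
}

/*
 * Hypothetical helper (not the real teco_string_diff()):
 * length in bytes of the common prefix of two valid UTF-8 strings,
 * never ending in the middle of a multi-byte sequence.
 */
size_t utf8_common_prefix(const char *a, size_t a_len,
                          const char *b, size_t b_len)
{
    size_t n = a_len > b_len ? b_len : a_len;
    size_t i = 0;

    assert(a != NULL);
    assert(b != NULL);

    /* plain byte-wise comparison */
    while (i != n) {
        if (a[i] != b[i])
            break;
        i++;
    }
    /* one string is a prefix of the other: already on a boundary */
    if (i == n)
        return i;

    /* step back to the lead byte of the sequence that differs */
    while (i > 0) {
        if (!is_continuation((unsigned char)a[i]))
            break;
        i--;
    }
    return i;
}
```

For two file names starting with ä (0xC3 0xA4) and ö (0xC3 0xB6), the byte-wise
comparison alone would stop after the shared lead byte 0xC3. The step back
makes the function return 0 instead of 1.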
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* teco_string_diff() could return a byte count that ends in the middle of
  a Unicode sequence. It now also requires Unicode strings.
* Added a missing Unicode-validity check when replacing command lines (`{` and `}`).
  teco_cmdline_insert() should really be refactored, though (FIXME).
* Added test case
</pre>
</div>
</content>
</entry>
<entry>
<title>updated copyright to 2026</title>
<updated>2026-01-01T06:59:49+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>rhaberkorn@fmsbw.de</email>
</author>
<published>2026-01-01T06:59:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=c2feb2a6f71fc9adb20226fb3c2260c236e974e0'/>
<id>c2feb2a6f71fc9adb20226fb3c2260c236e974e0</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>teco_string_t is now passed by value like a scalar if the callee isn't expected to modify it</title>
<updated>2025-12-28T19:57:31+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>rhaberkorn@fmsbw.de</email>
</author>
<published>2025-12-28T15:23:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=ea0a23645f03a42252ab1ce8df45ae4076ebae75'/>
<id>ea0a23645f03a42252ab1ce8df45ae4076ebae75</id>
<content type='text'>
* When passing a struct that should not be modified, I usually use a const pointer.
* Strings, however, are small 2-word objects and are often already passed via separate
  `gchar*` and gsize parameters. So it is consistent to pass teco_string_t by value as well.
  A teco_string_t will usually fit into registers just like a pointer.
* It's now obvious which function just _uses_ and which function _modifies_ a string.
  There is also no chance to pass a NULL pointer to those functions.
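
The convention can be sketched as follows. The type str_t and both functions
are hypothetical stand-ins chosen for illustration; they are not SciTECO's
actual declarations.

```c
#include "assert.h"
#include "stddef.h"

/* Hypothetical stand-in for a two-word string type (pointer plus length). */
typedef struct {
    char *data;
    size_t len;
} str_t;

/* Only reads the string: takes it by value and cannot be handed NULL. */
int str_is_empty(str_t s)
{
    return s.len == 0;
}

/* Modifies the string: takes a pointer, so the call site shows the intent. */
void str_truncate(str_t *s, size_t len)
{
    assert(s != NULL);
    if (len > s->len)
        return; /* never grow the string */
    s->len = len;
}
```

A caller of str_truncate() has to take the address of its string explicitly,
while str_is_empty() receives a copy of the two words and cannot change the
caller's string.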
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* When passing a struct that should not be modified, I usually use a const pointer.
* Strings, however, are small 2-word objects and are often already passed via separate
  `gchar*` and gsize parameters. So it is consistent to pass teco_string_t by value as well.
  A teco_string_t will usually fit into registers just like a pointer.
* It's now obvious which function just _uses_ and which function _modifies_ a string.
  There is also no chance to pass a NULL pointer to those functions.
</pre>
</div>
</content>
</entry>
<entry>
<title>updated copyright to 2025</title>
<updated>2025-01-12T23:39:34+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2025-01-12T23:39:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=d842eaee19e2723f845d4b8314a230cf68e82653'/>
<id>d842eaee19e2723f845d4b8314a230cf68e82653</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>teco_string_get_coord() returns character offsets now (refs #5)</title>
<updated>2024-09-12T14:42:08+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2024-09-12T11:33:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=73d574b71a10d4661ada20275cafde75aff6c1ba'/>
<id>73d574b71a10d4661ada20275cafde75aff6c1ba</id>
<content type='text'>
* This is used for error messages (TECO macro stackframes),
  so it's important to display columns in characters.
* Program counters are byte offsets and are therefore of type gsize everywhere,
  following glib convention.
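
A rough sketch of turning a byte offset into a line and a column counted in
characters, assuming valid UTF-8 input and an offset inside the string. The
names coord_t and coord_from_offset() are hypothetical, and the 1-based line
and 0-based column are assumptions of this sketch, not necessarily what
teco_string_get_coord() does.

```c
#include "assert.h"
#include "stddef.h"

typedef struct {
    size_t line;   /* 1-based (an assumption of this sketch) */
    size_t column; /* 0-based, counted in characters, not bytes */
} coord_t;

/* Walk the bytes before `pos` and count lines and characters. */
coord_t coord_from_offset(const char *s, size_t pos)
{
    coord_t c = { 1, 0 };
    size_t i;

    assert(s != NULL);

    for (i = 0; i != pos; i++) {
        unsigned char byte = (unsigned char)s[i];

        if (byte == '\n') {
            c.line++;
            c.column = 0;
        } else if ((byte >> 6) != 2) {
            /* every byte except a 10xxxxxx continuation byte starts a character */
            c.column++;
        }
    }
    return c;
}
```

The position stays a byte offset throughout; only the reported column skips
continuation bytes.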
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* This is used for error messages (TECO macro stackframes),
  so it's important to display columns in characters.
* Program counters are byte offsets and are therefore of type gsize everywhere,
  following glib convention.
</pre>
</div>
</content>
</entry>
<entry>
<title>the SciTECO parser is Unicode-based now (refs #5)</title>
<updated>2024-09-11T14:14:27+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2024-09-11T10:21:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=68578072bfaf6054a96bb6bcedfccb6e56a508fe'/>
<id>68578072bfaf6054a96bb6bcedfccb6e56a508fe</id>
<content type='text'>
The following rules apply:
 * All SciTECO macros __must__ be in valid UTF-8, regardless of the
   register's configured encoding.
   This is checked before execution, so we can use glib's non-validating
   UTF-8 API afterwards.
 * Things will inevitably get slower as we have to validate all macros first
   and convert to gunichar for each and every character passed into the parser.
   As an optimization, it may make sense to have our own inlineable version of
   g_utf8_get_char() (TODO).
   Also, Unicode glyphs in syntactically significant positions may be case-folded -
   just like ASCII chars were. This is of course slower than case folding
   ASCII. The impact of this should be measured and perhaps we should restrict
   case folding to a-z via teco_ascii_toupper().
 * The language itself does not use any non-ASCII characters, so you don't have to
   use multi-byte UTF-8 characters.
 * Wherever the parser expects a single character, it will now accept an arbitrary
   Unicode/UTF-8 glyph as well.
   In other words, you can call macros like M§ instead of having to write M[§].
   You can also get the codepoint of any Unicode character with ^^x.
   Typing a Unicode character in the start state or in Ex and Fx will now
   give a sane error message.
 * When pressing a key which produces a multi-byte UTF-8 sequence, the character
   gets translated back and forth multiple times:
   1. It's converted to a UTF-8 string, either buffered or by IME methods (Gtk).
      On Curses we could directly get a wide char using wget_wch(), but it's
      not currently used, so we don't depend on widechar curses.
   2. Parsed into gunichar for passing into the edit command callbacks.
      This also validates the codepoint - everything later on can assume valid
      codepoints and valid UTF-8 strings.
   3. Once the edit command handling decides to insert the key into the command line,
      it is serialized back into a UTF-8 string as the command line macro has
      to be in UTF-8 (like all other macros).
   4. The parser reads back gunichars without validation for passing into
      the parser callbacks.
 * Flickering in the Curses UI and Pango warnings in Gtk, due to incompletely
   inserted and displayed UTF-8 sequences, are now fixed.
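
Step 3 of the key-press round trip above, serializing a validated codepoint
back into UTF-8, can be sketched in plain C as follows. glib provides
g_utf8_unichar_to_utf8() for this job; whether SciTECO calls it at this point
is not stated here, and utf8_encode() below is a hypothetical stand-in.

```c
#include "assert.h"
#include "stddef.h"
#include "stdint.h"

/*
 * Hypothetical helper: write the UTF-8 encoding of `cp` into `out`
 * and return the number of bytes, or 0 for an invalid codepoint.
 * `x % 64` keeps the low six bits of x.
 */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp > 0x10FFFF)
        return 0; /* beyond the Unicode range */
    if ((cp >> 11) == 0x1B)
        return 0; /* surrogates U+D800..U+DFFF */

    if (cp > 0xFFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) % 64));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) % 64));
        out[3] = (unsigned char)(0x80 | (cp % 64));
        return 4;
    }
    if (cp > 0x7FF) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) % 64));
        out[2] = (unsigned char)(0x80 | (cp % 64));
        return 3;
    }
    if (cp > 0x7F) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp % 64));
        return 2;
    }
    out[0] = (unsigned char)cp;
    return 1;
}
```

For the section sign U+00A7 from the M§ example this yields the two bytes
0xC2 0xA7, which is what ends up in the command line macro.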
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The following rules apply:
 * All SciTECO macros __must__ be in valid UTF-8, regardless of the
   register's configured encoding.
   This is checked before execution, so we can use glib's non-validating
   UTF-8 API afterwards.
 * Things will inevitably get slower as we have to validate all macros first
   and convert to gunichar for each and every character passed into the parser.
   As an optimization, it may make sense to have our own inlineable version of
   g_utf8_get_char() (TODO).
   Also, Unicode glyphs in syntactically significant positions may be case-folded -
   just like ASCII chars were. This is of course slower than case folding
   ASCII. The impact of this should be measured and perhaps we should restrict
   case folding to a-z via teco_ascii_toupper().
 * The language itself does not use any non-ASCII characters, so you don't have to
   use multi-byte UTF-8 characters.
 * Wherever the parser expects a single character, it will now accept an arbitrary
   Unicode/UTF-8 glyph as well.
   In other words, you can call macros like M§ instead of having to write M[§].
   You can also get the codepoint of any Unicode character with ^^x.
   Typing a Unicode character in the start state or in Ex and Fx will now
   give a sane error message.
 * When pressing a key which produces a multi-byte UTF-8 sequence, the character
   gets translated back and forth multiple times:
   1. It's converted to a UTF-8 string, either buffered or by IME methods (Gtk).
      On Curses we could directly get a wide char using wget_wch(), but it's
      not currently used, so we don't depend on widechar curses.
   2. Parsed into gunichar for passing into the edit command callbacks.
      This also validates the codepoint - everything later on can assume valid
      codepoints and valid UTF-8 strings.
   3. Once the edit command handling decides to insert the key into the command line,
      it is serialized back into a UTF-8 string as the command line macro has
      to be in UTF-8 (like all other macros).
   4. The parser reads back gunichars without validation for passing into
      the parser callbacks.
 * Flickering in the Curses UI and Pango warnings in Gtk, due to incompletely
   inserted and displayed UTF-8 sequences, are now fixed.
</pre>
</div>
</content>
</entry>
<entry>
<title>leave some comments on what to do when converting the parser to Unicode (refs #5)</title>
<updated>2024-09-09T16:22:21+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2024-09-04T16:26:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=893a0a6ad85411a57c1225af03260b34561377c7'/>
<id>893a0a6ad85411a57c1225af03260b34561377c7</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>updated copyright to 2024</title>
<updated>2024-01-21T11:45:05+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2024-01-21T11:07:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=1cecf04656532e94e1fe9fe25460774324b2197c'/>
<id>1cecf04656532e94e1fe9fe25460774324b2197c</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>updated copyright to 2023</title>
<updated>2023-04-05T15:11:32+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2023-04-05T15:11:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=0f7424e9b041646d782fb4c8b2f81a2f74856731'/>
<id>0f7424e9b041646d782fb4c8b2f81a2f74856731</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>updated copyright to 2022 and updated TODO</title>
<updated>2022-06-21T01:41:16+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2022-06-21T01:41:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=70619bb88918e9cd057dbbc6a87e890cbce49a08'/>
<id>70619bb88918e9cd057dbbc6a87e890cbce49a08</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
</feed>
