<feed xmlns='http://www.w3.org/2005/Atom'>
<title>sciteco/tests, branch hsrex</title>
<subtitle>Scintilla-based Text Editor and COrrector</subtitle>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/'/>
<entry>
<title>remaining types of program counters changed to gsize/gssize</title>
<updated>2024-09-12T23:49:22+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2024-09-12T23:31:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=a9224ebee3b6458dee42d76ec76b1a704e206107'/>
<id>a9224ebee3b6458dee42d76ec76b1a704e206107</id>
<content type='text'>
* This fixes jumping to the beginning of the macro with F&lt;, which was broken in 73d574b71a10d4661ada20275cafde75aff6c1ba.
  teco_machine_main_t::macro_pc actually has to be signed, as it is sometimes set to -1.
</content>
</entry>
<entry>
<title>fixed searches in single-byte encoded documents</title>
<updated>2024-09-11T14:14:27+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2024-09-11T12:30:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=2a050759ab621b87d0782cc8235907a1757b46cc'/>
<id>2a050759ab621b87d0782cc8235907a1757b46cc</id>
<content type='text'>
* While macro code is guaranteed to be valid UTF-8, the same cannot be
  said about the result of string building.
* The search pattern can end up containing invalid UTF-8 byte sequences even when
  searching in UTF-8 buffers, e.g. if ^EQq inserts garbage.
  There are currently no checks for this.
* When searching in a raw buffer, it must be possible to
  search for arbitrary bytes (^EUq).
  Since teco_pattern2regexp() always expected clean UTF-8 input,
  it would sometimes skip over too many bytes and could even crash.
* teco_pattern2regexp() now takes the &lt;S&gt; target codepage
  into account instead.
</content>
</entry>
<entry>
<title>the SciTECO parser is Unicode-based now (refs #5)</title>
<updated>2024-09-11T14:14:27+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2024-09-11T10:21:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=68578072bfaf6054a96bb6bcedfccb6e56a508fe'/>
<id>68578072bfaf6054a96bb6bcedfccb6e56a508fe</id>
<content type='text'>
The following rules apply:
 * All SciTECO macros __must__ be valid UTF-8, regardless of the
   register's configured encoding.
   This is checked before execution, so we can use glib's non-validating
   UTF-8 API afterwards.
 * Things will inevitably get slower, as we have to validate all macros first
   and convert each and every character passed into the parser to gunichar.
   As an optimization, it may make sense to have our own inlineable version of
   g_utf8_get_char() (TODO).
   Also, Unicode glyphs in syntactically significant positions may be case-folded,
   just like ASCII characters were. This is of course slower than case-folding
   ASCII. The impact of this should be measured, and perhaps we should restrict
   case folding to a-z via teco_ascii_toupper().
 * The language itself does not use any non-ASCII characters, so you don't have to
   use UTF-8 characters.
 * Wherever the parser expects a single character, it will now accept an arbitrary
   Unicode/UTF-8 glyph as well.
   In other words, you can call macros like M§ instead of having to write M[§].
   You can also get the codepoint of any Unicode character with ^^x.
   Pressing a Unicode character in the start state or in Ex and Fx will now
   give a sane error message.
 * When pressing a key which produces a multi-byte UTF-8 sequence, the character
   gets translated back and forth multiple times:
   1. It's converted to a UTF-8 string, either buffered or by IME methods (Gtk).
      On Curses we could directly get a wide char using wget_wch(), but it's
      not currently used, so we don't depend on widechar curses.
   2. It's parsed into a gunichar for passing into the edit command callbacks.
      This also validates the codepoint, so everything later on can assume valid
      codepoints and valid UTF-8 strings.
   3. Once the edit command handling decides to insert the key into the command line,
      it is serialized back into a UTF-8 string, as the command line macro has
      to be in UTF-8 (like all other macros).
   4. The parser reads back gunichars without validation for passing into
      the parser callbacks.
 * Flickering in the Curses UI and Pango warnings in Gtk, caused by incompletely
   inserted and displayed UTF-8 sequences, are now fixed.
</content>
</entry>
<entry>
<title>fixed win32 CI and nightly builds (refs #5)</title>
<updated>2024-09-10T10:13:38+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2024-09-09T21:22:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=adc067ba745cebf2e2a2f9523bc14136ca1d2680'/>
<id>adc067ba745cebf2e2a2f9523bc14136ca1d2680</id>
<content type='text'>
* The libtool wrapper binaries do not pass down UTF-8 strings correctly,
  so the Unicode tests failed under some circumstances.
* As we aren't actually linking against any locally-built shared libraries,
  we are passing --disable-shared to libtool, which inhibits wrapper generation
  on win32 and fixes the test suite.
* Also use up-to-date autotools. This didn't fix anything, though.
* test suite: try writing a Unicode filename as well
  * There have been problems doing that on Win32, where UTF-8 was not
    correctly passed down from the command line and some Windows API
    calls only worked with ANSI filenames etc.
</content>
</entry>
<entry>
<title>try a different value for LC_ALL on Mac OS to accept UTF-8 command lines (refs #5)</title>
<updated>2024-09-09T18:28:14+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2024-09-09T18:28:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=c59b33c85e58e78bf2c796dfe19c91e8f9427936'/>
<id>c59b33c85e58e78bf2c796dfe19c91e8f9427936</id>
<content type='text'>
</content>
</entry>
<entry>
<title>testsuite: try different locale on Mac OS (refs #5)</title>
<updated>2024-09-09T18:05:16+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2024-09-09T18:05:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=4789e39a8e5e1245617a559eddd23b66400f4b34'/>
<id>4789e39a8e5e1245617a559eddd23b66400f4b34</id>
<content type='text'>
hopefully fixes the Unicode test cases on Mac OS
</content>
</entry>
<entry>
<title>improved 8-bit cleanliness test cases and added Unicode test cases (refs #5)</title>
<updated>2024-09-09T16:22:21+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2024-09-09T15:46:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=92410755ec3bc866774d7353b88a0bc9ed7c9aff'/>
<id>92410755ec3bc866774d7353b88a0bc9ed7c9aff</id>
<content type='text'>
</content>
</entry>
<entry>
<title>fixed retrieval of characters with codes larger than 127 - always return unsigned integer</title>
<updated>2024-08-27T22:03:04+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2024-08-27T22:03:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=fdc185b8faaae44d67f85d2c5a9b9fa48d3e2859'/>
<id>fdc185b8faaae44d67f85d2c5a9b9fa48d3e2859</id>
<content type='text'>
* The result of SCI_GETCHARAT is internally cast to `char`, which may be signed.
  Characters &gt; 127 therefore become negative and stay so when cast to sptr_t.
  We therefore cast it back to guchar (unsigned char).
* The same is true whenever a string's character is returned to SciTECO (teco_int_t),
  as our string type is `gchar *`.
* &lt;^^x&gt; now also works for those characters.
  Eventually, the parser will probably become UTF-8-aware, and this will
  have to be done differently.
</content>
</entry>
<entry>
<title>fully support out of tree builds</title>
<updated>2024-08-23T02:51:55+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2024-08-23T02:13:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=ee9cf43587d5fef3a0f6d97ef50b8cf848945bcb'/>
<id>ee9cf43587d5fef3a0f6d97ef50b8cf848945bcb</id>
<content type='text'>
* You no longer have to copy contrib/scintilla, contrib/scinterm and contrib/lexilla
  manually to the build directory.
* It turns out that Scintilla/Lexilla have supported this since 2016.
  Scintilla allows pointing to a source directory (srcdir) and Lexilla to a binary directory (DIR_O).
* For Scinterm, I opened a pull request to add srcdir/basedir variables:
  https://github.com/orbitalquark/scinterm/pull/21
* `make distcheck` is therefore also fixed now.
* The FreeBSD package is now allowed to build out of source.
  I haven't tested it yet.
* See also https://github.com/ScintillaOrg/lexilla/issues/266
</content>
</entry>
<entry>
<title>fixed expressions like `1,(2)` or `(1),(2)`: they are reported as two numbers now</title>
<updated>2024-02-08T01:45:54+00:00</updated>
<author>
<name>Robin Haberkorn</name>
<email>robin.haberkorn@googlemail.com</email>
</author>
<published>2024-02-07T18:23:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.fmsbw.de/sciteco/commit/?id=83398b3fb5674441ebe73d0f6e1226cb3c700aa9'/>
<id>83398b3fb5674441ebe73d0f6e1226cb3c700aa9</id>
<content type='text'>
* Instead of TECO_OP_NEW, there should perhaps simply be a flag indicating whether `,` was used.
</content>
</entry>
</feed>
