sciteco/doc/sciteco.7.template, branch hsrex

function key macros have been reworked into a more generic key macro feature

2024-09-12T14:44:13+00:00

* ALL keypresses (the UTF-8 sequences resulting from key presses) can now be remapped.
* This is especially useful with Unicode support, as you might want to alias
  international characters to their corresponding latin form in the start state,
  so you don't have to change keyboard layouts so often.
  This is done automatically in Gtk, where we have hardware key press information,
  but has to be done with key macros in Curses.
  There is a new key mask 4 (bit 3) for that purpose now.
* Also, you might want to define non-ANSI letters to perform special functions in
  the start state where it won't be accepted by the parser anyway.
  Suppose you have a macro M→, you could define
  @^U[^K→]{m→} 1^_U[^K→]
  This effectively "extends" the parser and allow you to call macro "→" by a single
  key press. See also #5.
* The register prefix has been changed from ^F (for function) to ^K (for key).
  This is the only thing you have to change in order to migrate existing
  function key macros.
* Key macros are enabled by default. There is no longer any way to disable
  function key handling in curses, as I never found any reason or need to disable it.
  Theoretically, the default ESCDELAY could turn out to be too small and function
  keys don't get through. I doubt that's possible unless on extremely slow serial lines.
  Even then, you'd have to increase ESCDELAY and instead of disabling function keys
  simply define an escape surrogate.
* The ED flag has been removed and its place is reserved for a future mouse support flag
  (which does make sense to disable in curses sometimes).
  fnkeys.tes is consequently also enabled by default in sample.teco_ini.
* Key macros are handled as an unit. If one character results in an error,
  the entire string is rubbed out.
  This fixes the "CLOSE" key on Gtk.
  It also makes sure that the original error message is preserved and not overwritten
  by some subsequent syntax error.
  It was never useful that we kept inserting characters after the first error.

the SciTECO parser is Unicode-based now (refs #5)

2024-09-11T14:14:27+00:00

The following rules apply:
 * All SciTECO macros __must__ be in valid UTF-8, regardless of the
   the register's configured encoding.
   This is checked against before execution, so we can use glib's non-validating
   UTF-8 API afterwards.
 * Things will inevitably get slower as we have to validate all macros first
   and convert to gunichar for each and every character passed into the parser.
   As an optimization, it may make sense to have our own inlineable version of
   g_utf8_get_char() (TODO).
   Also, Unicode glyphs in syntactically significant positions may be case-folded -
   just like ASCII chars were. This is is of course slower than case folding
   ASCII. The impact of this should be measured and perhaps we should restrict
   case folding to a-z via teco_ascii_toupper().
 * The language itself does not use any non-ANSI characters, so you don't have to
   use UTF-8 characters.
 * Wherever the parser expects a single character, it will now accept an arbitrary
   Unicode/UTF-8 glyph as well.
   In other words, you can call macros like M§ instead of having to write M[§].
   You can also get the codepoint of any Unicode character with ^^x.
   Pressing an Unicode character in the start state or in Ex and Fx will now
   give a sane error message.
 * When pressing a key which produces a multi-byte UTF-8 sequence, the character
   gets translated back and forth multiple times:
   1. It's converted to an UTF-8 string, either buffered or by IME methods (Gtk).
      On Curses we could directly get a wide char using wget_wch(), but it's
      not currently used, so we don't depend on widechar curses.
   2. Parsed into gunichar for passing into the edit command callbacks.
      This also validates the codepoint - everything later on can assume valid
      codepoints and valid UTF-8 strings.
   3. Once the edit command handling decides to insert the key into the command line,
      it is serialized back into an UTF-8 string as the command line macro has
      to be in UTF-8 (like all other macros).
   4. The parser reads back gunichars without validation for passing into
      the parser callbacks.
 * Flickering in the Curses UI and Pango warnings in Gtk, due to incompletely
   inserted and displayed UTF-8 sequences, are now fixed.

added raw ANSI mode to facilitate 8-bit clean editing (refs #5)

2024-09-09T16:22:21+00:00

* When enabled with bit 2 in the ED flags (0,4ED),
  all registers and buffers will get the raw ANSI encoding (as if 0EE had been
  called on them).
  You can still manually change the encoding, eg. by calling 65001EE afterwards.
* Also the ANSI mode sets up character representations for all bytes >= 0x80.
  This is currently done only depending on the ED flag, not when setting 0EE.
* Since setting 16,4ED for 8-bit clean editing in a macro can be tricky -
  the default unnamed buffer will still be at UTF-8 and at least a bunch
  of environment registers as well - we added the command line option
  `--8bit` (short `-8`) which configures the ED flags very early on.
  As another advantage you can mung the profile in 8-bit mode as well
  when using SciTECO as a sort of interactive hex editor.
* Disable UTF-8 checks in 8-bit clean mode (sample.teco_ini).

updated README and sciteco(7) with information about Unicode support (refs #5)

2024-09-09T16:22:21+00:00

the ^EUq string building escape now respects the encoding (can insert bytes or codepoints) (refs #5)

2024-09-09T16:22:21+00:00

* This is trickier than it sounds because there isn't one single place to consult.
  It depends on the context.
  If the string argument relates to buffer contents - as in , ,  etc. -
  the buffer's encoding is consulted.
  If it goes into a register (EU), the register's encoding is consulted.
  Everything else (O, EN, EC, ES...) expects only Unicode codepoints.
* This is communicated through a new field teco_machine_stringbuilding_t::codepage
  which must be set in the states' initial callback.
* Seems overkill just for ^EUq, but it can be used for context-sensitive
  processing of all the other string building constructs as well.
* ^V and ^W cannot be supported for Unicode characters for the time being without an Unicode-aware parser

conditionals now check for Unicode codepoints (refs #5)

2024-09-09T16:22:21+00:00

* This will naturally work with both ASCII characters and various non-English scripts. * Unfortunately, it cannot work with the other non-ANSI single-byte codepages. * If we'd like to support scripts working with all sorts of codepoints, we'd have to introduce a new command for translating individual codepoints from the current codepage (as reported by EE) to Unicode.

avoid Groff warnings due to `\` escapes

2024-02-05T22:52:45+00:00

* It's generally a bad idea to pass backslashes as a glyph in macro arguments, even as `\\` since this could easily be interpreted as an escape. * Instead we now always use `\[rs]`.

removed nonsensical line from sciteco(7) man page

2024-01-20T02:14:46+00:00

* was introduced in e7867fb0

the SciTECO data installation path is now configurable via --with-scitecodatadir

2023-06-19T17:45:16+00:00

* This is also the base of $SCITECOPATH. * Changing it is useful for packaging where it is not possible to factor out the common files between Curses and Gtk builds into a "sciteco-common" package. As an alternative, you can now create disjunct sciteco-curses and sciteco-gtk packages. * You will most likely want to use this for Gtk builds as in: --with-interface=gtk --program-prefix=g --with-scitecodatadir=/usr/local/share/gsciteco.

Troff documents: fixed monospaced example blocks

2023-04-05T14:52:25+00:00

* .SCITECO_TT should be before .EX, so that the indent is already monospaced. .SCITECO_TT_END still needs to be before .EE however, so that the next non-monospaced line is not "typeset" with a monospaced indent. * naturally only affects the Gtk UI