From b31b88717172e22b49c0493185f603b8f84989ec Mon Sep 17 00:00:00 2001
From: Robin Haberkorn
Date: Wed, 4 Sep 2024 12:49:29 +0200
Subject: the ^EUq string building escape now respects the encoding (can insert bytes or codepoints) (refs #5)

* This is trickier than it sounds because there isn't one single place to consult.
  It depends on the context.
  If the string argument relates to buffer contents - as in inserts and
  searches (I, S, etc.) - the buffer's encoding is consulted.
  If it goes into a register (EU), the register's encoding is consulted.
  Everything else (O, EN, EC, ES...) expects only Unicode codepoints.
* This is communicated through a new field
  teco_machine_stringbuilding_t::codepage, which must be set in the
  states' initial callback.
* Seems overkill just for ^EUq, but it can be used for context-sensitive
  processing of all the other string building constructs as well.
* ^V and ^W cannot be supported for Unicode characters for the time being
  without a Unicode-aware parser.
---
 doc/sciteco.7.template | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/doc/sciteco.7.template b/doc/sciteco.7.template
index a6cca40..ca23c93 100644
--- a/doc/sciteco.7.template
+++ b/doc/sciteco.7.template
@@ -1647,6 +1647,12 @@ Expands to the character whose code is stored in the numeric part of
 Q-Register \fIq\fP.
 For instance if register \(lqA\(rq contains the code 66, \(lq^EUa\(rq
 expands to the character \(lqB\(rq.
+The interpretation of this code depends on the context.
+Within inserts and searches (\fBI\fP, \fBS\fP, etc.) bytes or Unicode codepoints
+are expected depending on the buffer's encoding.
+Operations on registers (\fBEU\fP) similarly consult the
+register's encoding.
+Everything else expects Unicode codepoints.
 .TP
 .SCITECO_TOPIC ^EQ ^EQq
 .BI ^EQ q
-- 
cgit v1.2.3
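
Illustrative note (not part of the patch): the byte-versus-codepoint
distinction described above can be sketched in C roughly as follows.
This is a minimal sketch under stated assumptions, not SciTECO's actual
code. The names sketch_encoding_t and sketch_expand_code() are
hypothetical, and the real implementation communicates the encoding
through teco_machine_stringbuilding_t::codepage instead of an enum like
the one used here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the encoding of the string being built. */
typedef enum {
    SKETCH_ENC_BYTES, /* single-byte encoding: the code is a raw byte */
    SKETCH_ENC_UTF8   /* UTF-8: the code is a Unicode codepoint */
} sketch_encoding_t;

/*
 * Hypothetical helper: write the character for `code` into `out`
 * (capacity `cap` bytes). Returns the number of bytes written, or 0
 * if the code is not valid for the encoding or does not fit.
 */
static size_t
sketch_expand_code(sketch_encoding_t enc, uint32_t code,
                   unsigned char *out, size_t cap)
{
    if (enc == SKETCH_ENC_BYTES) {
        /* Byte context: only 0..255 can be inserted, as one byte. */
        if (code > 0xFF || cap < 1)
            return 0;
        out[0] = (unsigned char)code;
        return 1;
    }

    /* Codepoint context: reject surrogates and out-of-range values. */
    if (code > 0x10FFFF || (code >= 0xD800 && code <= 0xDFFF))
        return 0;

    if (code < 0x80) {
        if (cap < 1)
            return 0;
        out[0] = (unsigned char)code;
        return 1;
    }
    if (code < 0x800) {
        if (cap < 2)
            return 0;
        out[0] = (unsigned char)(0xC0 | (code >> 6));
        out[1] = (unsigned char)(0x80 | (code & 0x3F));
        return 2;
    }
    if (code < 0x10000) {
        if (cap < 3)
            return 0;
        out[0] = (unsigned char)(0xE0 | (code >> 12));
        out[1] = (unsigned char)(0x80 | ((code >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (code & 0x3F));
        return 3;
    }
    if (cap < 4)
        return 0;
    out[0] = (unsigned char)(0xF0 | (code >> 18));
    out[1] = (unsigned char)(0x80 | ((code >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((code >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (code & 0x3F));
    return 4;
}
```

In this sketch the code 66 yields the single byte "B" in either context,
while a code such as 228 becomes one byte in the byte context and a
two-byte UTF-8 sequence in the codepoint context. Codes above 255 are
rejected in the byte context; how SciTECO itself handles that case is
not stated in the patch.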