From b31b88717172e22b49c0493185f603b8f84989ec Mon Sep 17 00:00:00 2001
From: Robin Haberkorn <robin.haberkorn@googlemail.com>
Date: Wed, 4 Sep 2024 12:49:29 +0200
Subject: the ^EUq string building escape now respects the encoding (can insert
 bytes or codepoints) (refs #5)

* This is trickier than it sounds because there isn't a single place to consult;
  the right encoding depends on the context.
  If the string argument relates to buffer contents (as in <I>, <S>, <FR> etc.),
  the buffer's encoding is consulted.
  If it goes into a register (EU), the register's encoding is consulted.
  Everything else (O, EN, EC, ES...) expects only Unicode codepoints.
* This is communicated through a new field teco_machine_stringbuilding_t::codepage
  which must be set in the states' initial callback.
* This may seem like overkill just for ^EUq, but the field can also be used for
  context-sensitive processing of all the other string building constructs.
* For the time being, ^V and ^W cannot be supported for Unicode characters,
  as that would require a Unicode-aware parser.
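  A minimal C sketch of the context-dependent lookup and of what ^EUq then
  inserts. The target_t classification and the choose_codepage() and
  expand_eu() helpers are hypothetical, not SciTECO's actual code. It assumes
  that any code page other than SC_CP_UTF8 means single-byte insertion and
  that register values which do not fit the target are rejected:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as in Scintilla.h. */
#define SC_CP_UTF8 65001

/* Hypothetical: where a string-building argument ends up. */
typedef enum {
	TARGET_BUFFER,   /* <I>, <S>, <FR>, ... */
	TARGET_REGISTER, /* EU */
	TARGET_OTHER     /* O, EN, EC, ES, ... */
} target_t;

/* What a state's initial callback would store in the machine's codepage field. */
static unsigned int
choose_codepage(target_t target, unsigned int buffer_cp, unsigned int register_cp)
{
	switch (target) {
	case TARGET_BUFFER:
		return buffer_cp;
	case TARGET_REGISTER:
		return register_cp;
	default:
		return SC_CP_UTF8; /* Unicode codepoints only */
	}
}

/*
 * Expand ^EUq for the integer `value` stored in register q.
 * UTF-8: `value` is a codepoint and is encoded into 1-4 bytes.
 * Any other code page: `value` is a single raw byte (assumption).
 * Returns the number of bytes written to `out`, or 0 if `value` is invalid.
 */
static size_t
expand_eu(unsigned int codepage, uint32_t value, unsigned char out[4])
{
	assert(out != NULL);

	if (codepage != SC_CP_UTF8) {
		if (value > 0xFF)
			return 0; /* does not fit into one byte */
		out[0] = (unsigned char)value;
		return 1;
	}

	if (value < 0x80) {
		out[0] = (unsigned char)value;
		return 1;
	}
	if (value < 0x800) {
		out[0] = (unsigned char)(0xC0 | (value >> 6));
		out[1] = (unsigned char)(0x80 | (value & 0x3F));
		return 2;
	}
	if (value < 0x10000) {
		if (value >= 0xD800 && value <= 0xDFFF)
			return 0; /* UTF-16 surrogates are not characters */
		out[0] = (unsigned char)(0xE0 | (value >> 12));
		out[1] = (unsigned char)(0x80 | ((value >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (value & 0x3F));
		return 3;
	}
	if (value <= 0x10FFFF) {
		out[0] = (unsigned char)(0xF0 | (value >> 18));
		out[1] = (unsigned char)(0x80 | ((value >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((value >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (value & 0x3F));
		return 4;
	}
	return 0; /* beyond the Unicode range */
}
```

  With a UTF-8 target, a register holding 0xE4 would expand to the two bytes
  C3 A4 (U+00E4), while with a single-byte buffer it would be inserted as the
  raw byte 0xE4.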
---
 src/spawn.c | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'src/spawn.c')
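The hunk below assigns the code page through teco_undo_guint(), which
presumably records the previous value so that rubbing out the command
restores it. A minimal C sketch of such an undoable assignment, assuming a
simple stack of saved values; undo_guint(), undo_guint_push() and
undo_pop_to() are hypothetical names, not SciTECO's undo API:

```c
#include <assert.h>
#include <stddef.h>

/* Same value as in Scintilla.h. */
#define SC_CP_UTF8 65001

/* Hypothetical undo entry: a variable and the value it had before assignment. */
typedef struct {
	unsigned int *ptr;
	unsigned int old;
} undo_guint_t;

#define UNDO_MAX 64
static undo_guint_t undo_stack[UNDO_MAX];
static size_t undo_top = 0;

/* Save the current value of *ptr and return ptr so the caller can assign through it. */
static unsigned int *
undo_guint_push(unsigned int *ptr)
{
	assert(undo_top < UNDO_MAX);
	undo_stack[undo_top].ptr = ptr;
	undo_stack[undo_top].old = *ptr;
	undo_top++;
	return ptr;
}

/* Usable as an lvalue, mirroring the patch: undo_guint(x) = value; */
#define undo_guint(VAR) (*undo_guint_push(&(VAR)))

/* Rubout: restore saved values in reverse order until `mark` entries remain. */
static void
undo_pop_to(size_t mark)
{
	while (undo_top > mark) {
		undo_top--;
		*undo_stack[undo_top].ptr = undo_stack[undo_top].old;
	}
}
```

Returning a pointer from the push function is what lets the macro appear on
the left-hand side of an assignment, as in the hunk below.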
diff --git a/src/spawn.c b/src/spawn.c
index c1fb426..4317288 100644
--- a/src/spawn.c
+++ b/src/spawn.c
@@ -164,6 +164,11 @@ teco_state_execute_initial(teco_machine_main_t *ctx, GError **error)
 	if (ctx->mode > TECO_MODE_NORMAL)
 		return TRUE;
 
+	/*
+	 * Command-lines and file names are always assumed to be UTF-8.
+	 */
+	teco_undo_guint(ctx->expectstring.machine.codepage) = SC_CP_UTF8;
+
 	if (!teco_expressions_eval(FALSE, error))
 		return FALSE;
 
-- 
cgit v1.2.3