fixed serious bug with certain alternative string termination chars in commands with multiple string arguments

* When `@`-modifying a command with several string arguments and choosing `{` as the alternative string termination character, the parser would get totally confused. Any sequence of `{` would be ignored and only the first non-`{` would become the termination character. Consequently you also couldn't choose a new terminator after the closing `}`. So even a documented code example from sciteco(7) wouldn't work. The same was true when using $ (escape) or ^A as the alternative termination character. * We can now correctly parse e.g. `@FR{foo}{bar}` or `@FR$foo$bar$` (even though the latter one is quite pointless). * has probably been broken forever (has been broken even before v2.0). * Whitespace is now ignored in front of alternative termination characters as in TECO-64, so we can also write `@S /foo/` or even ``` @^Um { !* blabla *! } ``` I wanted to disallow whitespace termination characters, so the alternative would have been to throw an error. The new implementation at least adds some functionality. * Avoid redundancies when parsing no-op characters via teco_is_noop(). I assume that this is inlined and drawn into any jump-table what would be generated for the switch-statement in teco_state_start_input(). * Alternative termination characters are still case-folded, even if they are Unicode glyphs, so `@IЖfooж` would work and insert `foo`. This should perhaps be restricted to ANSI characters?
author: Robin Haberkorn <robin.haberkorn@googlemail.com> 2025-08-02 13:16:16 +0300
committer: Robin Haberkorn <robin.haberkorn@googlemail.com> 2025-08-02 13:16:16 +0300
commit: e46352bc614cf9777ca76deb47330fb408bc1a23 (patch)
tree: 2e900970b9eebbeb9bab12bef451a51a7f09ed13
parent: 963cd2db9b266f7521374adacb664ca8ec43d36b (diff)
8 files changed, 70 insertions, 40 deletions
diff --git a/doc/sciteco.7.template b/doc/sciteco.7.template
index 3bf8d2e..bad7e7d 100644
--- a/doc/sciteco.7.template
+++ b/doc/sciteco.7.template
@@ -1394,13 +1394,16 @@ return 0 instead.
 .LP
 .SCITECO_TOPIC ::
 Two colons (\fB::\fP) can sometimes further modify a command's behavior \(em
-currently it is used by the \fB::S\fP search comparison command
-and a few related search-and-replace operations.
+currently it is used by the timestamp command \fB::^H\fP,
+the \fB::S\fP search comparison command and a few related search-and-replace operations.
 .LP
 .SCITECO_TOPIC @ at
 When put in front of a command with string arguments,
 the at (\fB@\fP) modifier always allows the string termination character
 to be changed for that particular command.
+\# This particular bit of syntax is TECO-64 inspired.
+Whitespace characters (as by \*(ST's understanding of no-op characters)
+are ignored immediately after the command name if it was \fB@\fP-modified.
 The alternative termination character must be specified just before
 the first string argument.
 For instance:
@@ -1409,13 +1412,13 @@ For instance:
 @FS/foo/bar/
 .SCITECO_TT_END
 .EE
-Any character may be used as an alternative termination character.
+Any non-whitespace character may be used as an alternative termination character
+and is matched case-insensitively.
 There is one special case, though.
 If specified as the opening curly brace (\fB{\fP), a string argument
 will continue until the closing curly brace (\fB}\fP).
 Curly braces must be balanced and after the closing curly brace
-the termination character is reset to Escape and another one may
-be chosen.
+a new termination character may be chosen after optional whitespace.
 This feature is especially useful for embedding TECO code in
 string arguments, as in:
 .SCITECO_TT
@@ -1425,6 +1428,16 @@ string arguments, as in:
 }
 .SCITECO_TT_END
 .EE
+Since whitespace is ignored in front of the alternative escape character,
+this could have also been written as:
+.SCITECO_TT
+.EX
+@^Um
+{
+  @FS {foo} /bar/
+}
+.SCITECO_TT_END
+.EE
 The termination character can be \fIquoted\fP if you want to handle
 it like any regular character.
 For instance, you could write \(lqS^Q\fB$$\fP\(rq to search for the
@@ -2158,7 +2171,7 @@ the program counter and influencing parsing, it is described in this
 document's command reference.
 \*(ST can perform simple unconditional and computed gotos.
 .LP
-.SCITECO_TOPIC label
+.SCITECO_TOPIC "!" label
 Labels are symbolic and are defined with the following syntax:
 .br
 .BI ! label !
@@ -2187,7 +2200,7 @@ In addition to labels and unlike most classic TECO dialects,
 \*(ST also supports true comments.
 True comments are parsed faster than labels and do not
 take up memory in goto tables.
-.SCITECO_TOPIC "block comment"
+.SCITECO_TOPIC "!*" "block comment"
 One form of comments is the block comment:
 .br
 .BI !* comment *!
@@ -2199,7 +2212,8 @@ They are analoguous to C's
 .BI /* ... */
 comments.
 .LP
-.SCITECO_TOPIC "EOL comment"
+\# This form of comment was originally in TECO-64.
+.SCITECO_TOPIC "!!" "EOL comment"
 The second form of real comments are end-of-line comments,
 which are analogous to C++'s \fB//\fP comments:
 .br
diff --git a/src/cmdline.c b/src/cmdline.c
index 1f12c7b..089bd7a 100644
--- a/src/cmdline.c
+++ b/src/cmdline.c
@@ -531,7 +531,7 @@ teco_state_command_process_edit_cmd(teco_machine_main_t *ctx, teco_machine_t *pa
 
 			while (ctx->parent.current->is_start &&
 			       teco_cmdline.effective_len < teco_cmdline.str.len &&
-			       strchr(TECO_NOOPS, teco_cmdline.str.data[teco_cmdline.effective_len]))
+			       teco_is_noop(teco_cmdline.str.data[teco_cmdline.effective_len]))
 				if (!teco_cmdline_rubin(error))
 					return FALSE;
 
@@ -541,7 +541,7 @@ teco_state_command_process_edit_cmd(teco_machine_main_t *ctx, teco_machine_t *pa
 		/* rubout command */
 		while (ctx->parent.current->is_start &&
 		       teco_cmdline.effective_len > 0 &&
-		       strchr(TECO_NOOPS, teco_cmdline.str.data[teco_cmdline.effective_len-1]))
+		       teco_is_noop(teco_cmdline.str.data[teco_cmdline.effective_len-1]))
 			teco_cmdline_rubout();
 
 		do
diff --git a/src/core-commands.c b/src/core-commands.c
index c71ee95..f384272 100644
--- a/src/core-commands.c
+++ b/src/core-commands.c
@@ -722,24 +722,21 @@ teco_state_start_input(teco_machine_main_t *ctx, gunichar chr, GError **error)
 		['T']  = {&teco_state_start, teco_state_start_typeout}
 	};
 
-	switch (chr) {
 	/*
-	 * No-ops (same as TECO_NOOPS):
+	 * Non-operational commands.
 	 * These are explicitly not handled in teco_state_control,
 	 * so that we can potentially reuse the upcaret notations like ^J.
 	 */
-	case ' ':
-	case '\f':
-	case '\r':
-	case '\n':
-	case '\v':
+	if (teco_is_noop(chr)) {
 		if (ctx->flags.modifier_at ||
 		    (ctx->flags.mode == TECO_MODE_NORMAL && ctx->flags.modifier_colon)) {
 			teco_error_modifier_set(error, chr);
 			return NULL;
 		}
 		return &teco_state_start;
+	}
 
+	switch (chr) {
 	/*$ 0 1 2 3 4 5 6 7 8 9 digit number
 	 * [n]0|1|2|3|4|5|6|7|8|9 -> n*Radix+X -- Append digit
 	 *
diff --git a/src/core-commands.h b/src/core-commands.h
index bf73b8c..cb28dce 100644
--- a/src/core-commands.h
+++ b/src/core-commands.h
@@ -22,8 +22,12 @@
 #include "parser.h"
 #include "string-utils.h"
 
-/** non-operational characters in teco_state_start */
-#define TECO_NOOPS " \f\r\n\v"
+/** Check whether c is a non-operational command in teco_state_start */
+static inline gboolean
+teco_is_noop(gunichar c)
+{
+	return c == ' ' || c == '\f' || c == '\r' || c == '\n' || c == '\v';
+}
 
 gboolean teco_get_range_args(const gchar *cmd, gsize *from_ret, gsize *len_ret, GError **error);
 
diff --git a/src/goto-commands.c b/src/goto-commands.c
index 97c58d0..d95886d 100644
--- a/src/goto-commands.c
+++ b/src/goto-commands.c
@@ -218,6 +218,9 @@ teco_state_blockcomment_input(teco_machine_main_t *ctx, gunichar chr, GError **e
 
 TECO_DEFINE_STATE_COMMENT(teco_state_blockcomment);
 
+/*
+ * `!!` line comments are inspired by TECO-64.
+ */
 static teco_state_t *
 teco_state_eolcomment_input(teco_machine_main_t *ctx, gunichar chr, GError **error)
 {
diff --git a/src/parser.c b/src/parser.c
index 347c1a6..6d4cd60 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -996,6 +996,11 @@ teco_machine_stringbuilding_escape(teco_machine_stringbuilding_t *ctx, const gch
 	for (guint i = 0; i < len; ) {
 		gunichar chr = g_utf8_get_char(str+i);
 
+		/*
+		 * NOTE: We support both `[` and `{`, so this works for autocompleting
+		 * long Q-register specifications as well.
+		 * This may therefore insert unnecessary ^Q, but they won't hurt.
+		 */
 		if (g_unichar_toupper(chr) == ctx->escape_char ||
 		    (ctx->escape_char == '[' && chr == ']') ||
 		    (ctx->escape_char == '{' && chr == '}'))
@@ -1032,34 +1037,28 @@ teco_state_expectstring_input(teco_machine_main_t *ctx, gunichar chr, GError **e
 	teco_state_t *current = ctx->parent.current;
 
 	/*
-	 * String termination handling
+	 * Ignore whitespace immediately after @-modified commands.
+	 * This is inspired by TECO-64.
+	 * The alternative would have been to throw an error,
+	 * as allowing whitespace escape_chars is harmful.
 	 */
-	if (ctx->flags.modifier_at) {
-		if (current->expectstring.last)
-			/* also clears the "@" modifier flag */
-			teco_machine_main_eval_at(ctx);
+	if (ctx->flags.modifier_at && teco_is_noop(chr))
+		return current;
 
+	/*
+	 * String termination handling
+	 */
+	if (teco_machine_main_eval_at(ctx)) {
 		/*
-		 * FIXME: Exclude setting at least whitespace characters as the
-		 * new string escape character to avoid accidental errors?
-		 *
 		 * FIXME: Should we perhaps restrict case folding escape characters
 		 * to the ANSI range (teco_ascii_toupper())?
-		 * This would be faster than case folding each and every character
+		 * This would be faster than case folding almost all characters
 		 * of a string argument to check against the escape char.
-		 *
-		 * FIXME: This has undesired effects if you try to use one of
-		 * of these characters with multiple string arguments.
 		 */
-		switch (ctx->expectstring.machine.escape_char) {
-		case TECO_CTL_KEY('A'):
-		case '\e':
-		case '{':
-			if (ctx->parent.must_undo)
-				teco_undo_gunichar(ctx->expectstring.machine.escape_char);
-			ctx->expectstring.machine.escape_char = g_unichar_toupper(chr);
-			return current;
-		}
+		if (ctx->parent.must_undo)
+			teco_undo_gunichar(ctx->expectstring.machine.escape_char);
+		ctx->expectstring.machine.escape_char = g_unichar_toupper(chr);
+		return current;
 	}
 
 	/*
@@ -1113,6 +1112,14 @@ teco_state_expectstring_input(teco_machine_main_t *ctx, gunichar chr, GError **e
 			if (ctx->parent.must_undo)
 				teco_undo_gunichar(ctx->expectstring.machine.escape_char);
 			ctx->expectstring.machine.escape_char = '\e';
+		} else if (ctx->expectstring.machine.escape_char == '{') {
+			/*
+			 * Makes sure that after all but the last string argument,
+			 * the escape character is reset, as in @FR{foo}{bar}.
+			 */
+			if (ctx->parent.must_undo)
+				teco_undo_flags(ctx->flags);
+			ctx->flags.modifier_at = TRUE;
 		}
 		ctx->expectstring.nesting = 1;
 
diff --git a/src/parser.h b/src/parser.h
index a1583d2..095f523 100644
--- a/src/parser.h
+++ b/src/parser.h
@@ -75,7 +75,9 @@ void undo__remove_index__teco_loop_stack(guint);
  * FIXME: Maybe use TECO_DECLARE_VTABLE_METHOD()?
  */
 typedef const struct {
+	/** whether string building characters are enabled by default */
 	guint string_building : 1;
+	/** whether this string argument is the last of the command */
 	guint last : 1;
 
 	/**
diff --git a/tests/testsuite.at b/tests/testsuite.at
index 428757c..15fb810 100644
--- a/tests/testsuite.at
+++ b/tests/testsuite.at
@@ -118,6 +118,9 @@ AT_SETUP([String arguments])
 TE_CHECK([[Ifoo^Q]]TE_ESCAPE[[(0/0)]]TE_ESCAPE, 0, ignore, ignore)
 TE_CHECK([[@I"foo^Q"(0/0)"]], 0, ignore, ignore)
 TE_CHECK([[@I{foo{bar}foo^Q{(0/0)}]], 0, ignore, ignore)
+TE_CHECK([[@^Ua
+           {12345} :Qa-5"N(0/0)']], 0, ignore, ignore)
+TE_CHECK([[@I/X/ H@FR{X}/12345/ Z-5"N(0/0)']], 0, ignore, ignore)
 TE_CHECK([[@Ia^EQa(0/0)a]], 0, ignore, ignore)
 # Video-TECO-like syntax - might change in the future
 TE_CHECK([[@I/^E<65>^E<0x41>^E<0101>/ <-A:; -A-^^A"N(0/0)' R>]], 0, ignore, ignore)
author	Robin Haberkorn <robin.haberkorn@googlemail.com>	2025-08-02 13:16:16 +0300
committer	Robin Haberkorn <robin.haberkorn@googlemail.com>	2025-08-02 13:16:16 +0300
commit	e46352bc614cf9777ca76deb47330fb408bc1a23 (patch)
tree	2e900970b9eebbeb9bab12bef451a51a7f09ed13
parent	963cd2db9b266f7521374adacb664ca8ec43d36b (diff)