terex is now the regular expression engine and replaces PCRE (GRegex)

* terex is based on Henry Spencer's regular expression engine for Tcl.
  It is a hybrid NFA/DFA design which has better worst-case runtimes than
  the backtracking PCRE.
  Memory usage is also limited and can no longer increase catastrophically.
* It should no longer be possible to crash SciTECO with pathological searches.
* Since it reliably supports partial matches (REG_EXPECT) we can now enable
  the new backwards-search algorithm by default.
  This used to be broken because of a glib bug, which I already fixed.
  It would however take a long time until this fix ends up on the majority
  of glib installations.
* Regexp executions can still be quite slow if you are looking for a pattern
  at the end of a huge file, which can hang the editor.
  This can now at least theoretically be solved by adding hooks into terex
  to poll for interruptions.
* We can now also get rid of the TECO-pattern to regexp translation step
  by directly generating terex tokens (TODO).
* Performance-wise, terex appears to be slower than PCRE for simple forward
  searches even when linking everything with optimizations (FIXME).
* Having a stand-alone regular expression engine is also a huge step
  towards getting rid of glib.

See also: https://git.fmsbw.de/terex/about/
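
The block-wise backwards search that this commit enables by default can be illustrated with a minimal, stand-alone sketch. It is not the SciTECO implementation: it swaps terex for a plain literal byte-string matcher, returns only the last match instead of keeping a ring buffer of matches, and the names `search_backwards` and `NOT_FOUND` are hypothetical. A "partial match" is approximated here as a block tail that is a proper prefix of the needle, which is then retried anchored against the rest of the document.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NOT_FOUND ((size_t)-1)

/*
 * Hypothetical helper, not part of SciTECO or terex:
 * Returns the start of the last occurrence of `needle` in buf[0..len),
 * scanning in blocks of `block` bytes from the end, or NOT_FOUND.
 */
size_t
search_backwards(const char *buf, size_t len, const char *needle, size_t block)
{
	assert(buf != NULL && needle != NULL);

	size_t nlen = strlen(needle);
	if (nlen == 0 || nlen > len || block == 0)
		return NOT_FOUND;

	size_t to_block = len;

	while (to_block > 0) {
		size_t from_block = to_block > block ? to_block - block : 0;
		size_t found = NOT_FOUND;

		/* full matches inside the block: scan forward, remember the last */
		for (size_t pos = from_block; pos + nlen <= to_block; pos++)
			if (!memcmp(buf + pos, needle, nlen))
				found = pos;

		/* the last block of the document cannot have boundary matches */
		if (to_block != len) {
			size_t span = nlen - 1;
			size_t lo = to_block - from_block > span
					? to_block - span : from_block;

			/* candidates start later than any full match in the block */
			for (size_t pos = to_block; pos-- > lo; ) {
				/* partial match: block tail is a prefix of needle? */
				if (memcmp(buf + pos, needle, to_block - pos))
					continue;
				/* anchored retry against the rest of the document */
				if (pos + nlen <= len &&
				    !memcmp(buf + pos, needle, nlen)) {
					found = pos;
					break;
				}
			}
		}

		if (found != NOT_FOUND)
			return found;

		/* try previous block */
		to_block = from_block;
	}

	return NOT_FOUND;
}
```

For instance, `search_backwards("ABCD", 4, "BC", 2)` returns 1. This resembles the "Block-wise backwards search" test case: the match straddles the boundary between the blocks `AB` and `CD` and is only found through the partial-match retry. Memory is rescanned only in that boundary case, which is why small block sizes stay cheap.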
author: Robin Haberkorn <rhaberkorn@fmsbw.de> 2026-06-28 00:39:51 +0200
committer: Robin Haberkorn <rhaberkorn@fmsbw.de> 2026-06-28 00:39:51 +0200
commit: 4fe5bc6f3867096965270c90f2e1e5df77b8825f (patch)
tree: 07823673c598cf4289ea0ae769c32924e1fcce10
parent: c5cb45fab6d4a63a4fcff2cf7f6801dae2ac4db2 (diff)
11 files changed, 187 insertions, 175 deletions
diff --git a/.gitmodules b/.gitmodules
index af9fd68..d825212 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -8,3 +8,6 @@
 [submodule "lexilla"]
 	path = contrib/lexilla
 	url = https://github.com/ScintillaOrg/lexilla.git
+[submodule "terex"]
+	path = contrib/terex
+	url = git://git.fmsbw.de/terex
diff --git a/Makefile.am b/Makefile.am
index 8284bc1..e956878 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -5,7 +5,8 @@ ACLOCAL_AMFLAGS = -I m4
 if REPLACE_MALLOC
 MAYBE_DLMALLOC = contrib/dlmalloc
 endif
-SUBDIRS = lib $(MAYBE_DLMALLOC) contrib/rb3ptr src doc tests
+SUBDIRS = lib $(MAYBE_DLMALLOC) contrib/rb3ptr contrib/terex \
+          src doc tests
 
 dist_scitecodata_DATA = fallback.teco_ini
 
diff --git a/TODO b/TODO
index 6588fe7..b2de61c 100644
--- a/TODO
+++ b/TODO
@@ -74,33 +74,6 @@ Known Bugs:
    and b) the file mode and ownership of re-created files can be preserved.
    We should fall back silently to an (inefficient) memory copy or temporary
    file strategy if this is detected.
- * All backward searches from the end of excessively large files can be very
-   slow, especially in UTF mode, since you are always producing
-   all matches over the entire document.
-   Perhaps scan in 4kb blocks from dot upwards, but with partial matches.
-   When getting partial matches, the match falls on a block boundary and
-   we can extended the scanned area downwards until dot.
-   This currently doesn't work with glib's regexp (PCRE) since
-   g_match_info_fetch_pos() handles partial matches like errors.
-   Here's an upstream merge request to fix that:
-   https://gitlab.gnome.org/GNOME/glib/-/merge_requests/5199
- * Crashes on large files: S^EM^X$ (regexp: (?:.)+)
-   Happens because the Glib regex engine is based on a recursive (backtracking)
-   Perl regex library and glib doesn't expose pcre_extra.
-   We could include `(*LIMIT_RECURSION=d)` in the pattern, though.
-   I can provoke the problem only on Ubuntu 20.04.
-   We can try g_regex_match_all_full() which will use a DFA, but
-   it doesn't capture subexpressions.
-   We need something based on a non-backtracking Thompson's NFA with Unicode (UTF-8), see
-   https://swtch.com/~rsc/regexp/
-   Basically only RE2 would check all the boxes.
-   RE2 doesn't have a native C API, so we would also have to import the
-   https://github.com/marcomaggi/cre2/ wrapper.
-   re2 should be an optional dependency, so we can still build against the
-   glib APIs.
-   Optionally, I could build a PCRE-compatible wrapper for Rust's regex crate.
-   It would also be possible to port one of Henry Spencer's engines (hxrex or its
-   PosgreSQL derivation or the version from Vim) to UTF-8 and add it as a submodule.
  * It is still possible to hang searches on huge files since a single match
    could still scan too much memory - e.g. try searching for a word that
    occurs only at the end of the huge file.
diff --git a/configure.ac b/configure.ac
index 43b4d1b..09d0804 100644
--- a/configure.ac
+++ b/configure.ac
@@ -545,6 +545,7 @@ AC_CONFIG_FILES([GNUmakefile:Makefile.in src/GNUmakefile:src/Makefile.in]
                 [src/interface-curses/GNUmakefile:src/interface-curses/Makefile.in]
                 [contrib/dlmalloc/GNUmakefile:contrib/dlmalloc/Makefile.in]
                 [contrib/rb3ptr/GNUmakefile:contrib/rb3ptr/Makefile.in]
+                [contrib/terex/GNUmakefile:contrib/terex/Makefile.in]
                 [lib/GNUmakefile:lib/Makefile.in]
                 [doc/GNUmakefile:doc/Makefile.in doc/Doxyfile]
                 [tests/GNUmakefile:tests/Makefile.in tests/atlocal])
diff --git a/contrib/terex b/contrib/terex
new file mode 160000
+Subproject fa3d463a4cd563f3c5f29331f48a0161bf58686
diff --git a/debian/copyright b/debian/copyright
index b8908ae..442c3a3 100644
--- a/debian/copyright
+++ b/debian/copyright
@@ -33,6 +33,35 @@ License: MIT
  IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
  CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 
+Files: contrib/terex/*.c contrib/terex/*.h
+Copyright: Copyright (c) 1998, 1999 Henry Spencer.  All rights reserved.
+License:
+ Copyright (c) 1998, 1999 Henry Spencer.  All rights reserved.
+ .
+ Development of this software was funded, in part, by Cray Research Inc.,
+ UUNET Communications Services Inc., Sun Microsystems Inc., and Scriptics
+ Corporation, none of whom are responsible for the results.  The author
+ thanks all of them.
+ .
+ Redistribution and use in source and binary forms -- with or without
+ modification -- are permitted for any purpose, provided that
+ redistributions in source form retain this entire copyright notice and
+ indicate the origin and nature of any modifications.
+ .
+ I'd appreciate being given credit for this package in the documentation of
+ software which uses it, but that is not a requirement.
+ .
+ THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
+ INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
+ AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL
+ HENRY SPENCER BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+ OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
 Files: contrib/scintilla/* contrib/lexilla/*
 Copyright: Copyright 1998-2021 Neil Hodgson <neilh@scintilla.org>
 License: MIT-Hodgson
diff --git a/src/Makefile.am b/src/Makefile.am
index ff2e86b..8ac58c7 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -13,7 +13,7 @@ include $(top_srcdir)/contrib/scintilla.am
 
 # FIXME: Common flags should be in configure.ac
 AM_CFLAGS = -std=gnu11 -Wall -Wno-initializer-overrides -Wno-unused-value
-AM_CPPFLAGS += -I$(top_srcdir)/contrib/rb3ptr
+AM_CPPFLAGS += -I$(top_srcdir)/contrib/rb3ptr -I$(top_srcdir)/contrib/terex
 AM_LDFLAGS =
 
 if STATIC_EXECUTABLES
@@ -60,7 +60,8 @@ libsciteco_base_la_SOURCES = main.c sciteco.h list.h \
 # NOTE: We cannot link in Scintilla (static library) into
 # a libtool convenience library
 libsciteco_base_la_LIBADD = $(LIBSCITECO_INTERFACE) \
-                            $(top_builddir)/contrib/rb3ptr/librb3ptr.la
+                            $(top_builddir)/contrib/rb3ptr/librb3ptr.la \
+                            $(top_builddir)/contrib/terex/libterex.la
 if REPLACE_MALLOC
 libsciteco_base_la_LIBADD += $(top_builddir)/contrib/dlmalloc/libdlmalloc.la
 endif
diff --git a/src/core-commands.c b/src/core-commands.c
index 5ca508c..81d5869 100644
--- a/src/core-commands.c
+++ b/src/core-commands.c
@@ -2224,8 +2224,7 @@ teco_state_ecommand_flags(teco_machine_main_t *ctx, GError **error)
  * Only this setting guarantees leftmost longest matches that
  * are entirely symmetric to forward searches, but can be
  * unpractically slow on huge files.
- * The default is 0.
- * \# FIXME: Feature is currently broken!
+ * The default is 4kb.
  * .
  * .IP -1:
  * Type of the last mouse event (\fBread-only\fP).
diff --git a/src/error.h b/src/error.h
index 67de4aa..3d4334f 100644
--- a/src/error.h
+++ b/src/error.h
@@ -57,6 +57,7 @@ typedef enum {
 	TECO_ERROR_CLIPBOARD,
 	TECO_ERROR_WIN32,
 	TECO_ERROR_MODULE,
+	TECO_ERROR_REGEX,
 
 	/** Interrupt current operation */
 	TECO_ERROR_INTERRUPTED,
diff --git a/src/search.c b/src/search.c
index 90975c2..601cc55 100644
--- a/src/search.c
+++ b/src/search.c
@@ -25,6 +25,10 @@
 #include <glib.h>
 #include <glib/gprintf.h>
 
+/* should always be from contrib/terex */
+#include <regalone.h>
+#include <regex.h>
+
 #include "sciteco.h"
 #include "string-utils.h"
 #include "expressions.h"
@@ -57,6 +61,18 @@ TECO_DEFINE_UNDO_SCALAR(teco_search_parameters_t);
  */
 static teco_search_parameters_t teco_search_parameters;
 
+G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(regex_t, tere_free);
+
+/* not in error.h since we don't want to draw in the terex headers */
+static inline void
+teco_error_regex_set(GError **error, gint rc, const regex_t *re)
+{
+	gchar buf[1024];
+	tere_error(rc, re, buf, sizeof(buf));
+	g_set_error(error, TECO_ERROR, TECO_ERROR_REGEX,
+	            "Error executing regular expression: %s", buf);
+}
+
 /*$ "^X" "search mode"
  * mode^X -- Set or get search mode flag
  * -^X
@@ -551,24 +567,21 @@ TECO_DEFINE_UNDO_OBJECT_OWN(ranges, teco_range_t *, g_free);
 /**
  * Extract the ranges of the given GMatchInfo.
  *
- * @param match_info The result of g_regex_match().
+ * @param match_info The result of re_exec().
+ * @param count Number of matches (subpatterns).
  * @param offset The beginning of the match operation in bytes.
  *   Match results will be relative to this offset.
- * @param count Where to store the number of ranges (subpatterns).
  * @returns Ranges (subpatterns) in absolute byte positions.
  *   They \b must still be converted to glyph positions afterwards.
  */
 static teco_range_t *
-teco_get_ranges(const GMatchInfo *match_info, gsize offset, guint *count)
+teco_get_ranges(const regmatch_t *match_info, guint count, gsize offset)
 {
-	*count = g_match_info_get_match_count(match_info);
-	teco_range_t *ranges = g_new(teco_range_t, *count);
-
-	for (gint i = 0; i < *count; i++) {
-		gint from, to;
-		g_match_info_fetch_pos(match_info, i, &from, &to);
-		ranges[i].from = offset+MAX(from, 0);
-		ranges[i].to = offset+MAX(to, 0);
+	teco_range_t *ranges = g_new(teco_range_t, count);
+
+	for (gint i = 0; i < count; i++) {
+		ranges[i].from = offset+match_info[i].rm_so;
+		ranges[i].to = offset+match_info[i].rm_eo;
 	}
 
 	return ranges;
@@ -613,32 +626,26 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(teco_matches_t, teco_matches_free);
  * @return FALSE if an error occurred
  */
 static gboolean
-teco_do_search_forward(GRegex *re, gsize from, gsize to, gint *count, GError **error)
+teco_do_search_forward(regex_t *re, gsize from, gsize to, gint *count, GError **error)
 {
-	g_autoptr(GMatchInfo) info = NULL;
 	/* NOTE: can return NULL pointer for completely new and empty documents */
 	const gchar *buffer = (const gchar *)teco_interface_ssm(SCI_GETRANGEPOINTER, from, to-from) ? : "";
-	GError *tmp_error = NULL;
+
+	g_assert(*count > 0);
 
 	/*
-	 * NOTE: The return boolean does NOT signal whether an error was generated.
+	 * FIXME: Repeated allocation could be avoided when scanning over buffer boundaries.
+	 * If it's worth it...
 	 */
-	g_regex_match_full(re, buffer, to-from, 0, 0, &info, &tmp_error);
-	if (tmp_error) {
-		g_propagate_error(error, tmp_error);
-		return FALSE;
-	}
+	g_autofree regmatch_t *info = g_new(regmatch_t, 1+re->re_nsub);
 
-	g_assert(*count > 0);
-	while (g_match_info_matches(info) && --(*count)) {
-		/*
-		 * NOTE: The return boolean does NOT signal whether an error was generated.
-		 */
-		g_match_info_next(info, &tmp_error);
-		if (tmp_error) {
-			g_propagate_error(error, tmp_error);
-			return FALSE;
-		}
+	static const gint eflags = REG_NOTEOL | REG_NOTBOL;
+
+	gint rc;
+	while ((rc = tere_exec(re, (const chr *)buffer, to-from, NULL,
+	                       1+re->re_nsub, info, eflags)) == REG_OKAY && --(*count)) {
+		buffer += info[0].rm_eo;
+		from += info[0].rm_eo;
 
 		/*
 		 * FIXME: A single pathological match could already be excessively slow.
@@ -649,22 +656,21 @@ teco_do_search_forward(GRegex *re, gsize from, gsize to, gint *count, GError **e
 		}
 	}
 
-	if (!*count) {
+	if (rc == REG_OKAY) {
 		/* successful */
-		teco_undo_guint(teco_ranges_count);
-		teco_undo_ranges_own(teco_ranges) = teco_get_ranges(info, from, &teco_ranges_count);
+		g_assert(*count == 0);
+		teco_undo_guint(teco_ranges_count) = 1+re->re_nsub;
+		teco_undo_ranges_own(teco_ranges) = teco_get_ranges(info, teco_ranges_count, from);
+	} else if (rc != REG_NOMATCH) {
+		teco_error_regex_set(error, rc, re);
+		return FALSE;
 	}
 
 	return TRUE;
 }
 
-/**
- * Block size for backwards scanning or 0
- *
- * @bug Block-wise matching is currently broken,
- * so we disable this by default - see below.
- */
-gsize teco_search_block_size = 0; //4*1024;
+/** block size for backwards scanning or 0 */
+gsize teco_search_block_size = 4*1024;
 
 /**
  * Search backwards, in blocks of teco_search_block_size
@@ -681,7 +687,7 @@ gsize teco_search_block_size = 0; //4*1024;
  * @see teco_do_search_forward
  */
 static gboolean
-teco_do_search_backwards(GRegex *re, gsize from, gsize to, gint *count, GError **error)
+teco_do_search_backwards(regex_t *re, gsize from, gsize to, gint *count, GError **error)
 {
 	/*
 	 * NOTE: can return NULL pointer for completely new and empty documents.
@@ -692,6 +698,7 @@ teco_do_search_backwards(GRegex *re, gsize from, gsize to, gint *count, GError *
 	const gchar *buffer = (const gchar *)teco_interface_ssm(SCI_GETRANGEPOINTER, from, to-from) ? : "";
 
 	g_assert(*count < 0);
+
 	guint matched_num = -*count;
 	gsize total_size = sizeof(teco_matches_t) + sizeof(teco_match_t[matched_num]);
 
@@ -713,98 +720,83 @@ teco_do_search_backwards(GRegex *re, gsize from, gsize to, gint *count, GError *
 	if (!teco_memory_check(total_size, error))
 		return FALSE;
 
+	/*
+	 * FIXME: The `matched` and `info` allocations are repeated when scanning
+	 * over buffer boundaries and could be avoided by sharing them between
+	 * teco_do_search() calls. If it's worth it...
+	 */
 	g_autoptr(teco_matches_t) matched = g_malloc0(total_size);
 	matched->count = matched_num;
 
+	g_autofree regmatch_t *info = g_new(regmatch_t, 1+re->re_nsub);
+
 	gint matched_total = 0;
 	gint i = 0; /* ring buffer pointer into the `matched->matches` array */
 
 	gsize to_block = to-from;
 
 	while (to_block > 0) {
-		g_autoptr(GMatchInfo) info = NULL;
-
 		gsize from_block = teco_search_block_size > 0
 					? MAX(0, to_block - teco_search_block_size) : 0;
-		/*
-		 * FIXME: DEC TECO search semantics could actually demand
-		 * allowing matches to extend beyond the [from,to] range.
-		 */
-		GRegexMatchFlags flags = to_block != to-from ? G_REGEX_MATCH_PARTIAL_HARD : 0;
+		/* how many bytes have been consumed in the current block */
+		gsize offset = 0;
 
-		GError *tmp_error = NULL;
+		static const gint eflags = REG_NOTEOL | REG_NOTBOL;
+		/* for partial matches - mandatory when using REG_EXPECT */
+		rm_detail_t details;
 
-		/*
-		 * NOTE: The return boolean does NOT signal whether an error was generated.
-		 * FIXME: Why isn't it possible to specify a start_position != 0?
-		 */
-		g_regex_match_full(re, buffer+from_block, to_block-from_block, 0,
-		                   flags, &info, &tmp_error);
-		if (tmp_error) {
-			g_propagate_error(error, tmp_error);
-			return FALSE;
-		}
+		gint rc;
 
 		for (;;) {
+			/*
+			 * FIXME: A single pathological match could already be excessively slow.
+			 */
 			if (G_UNLIKELY(teco_interface_is_interrupted())) {
 				teco_error_interrupted_set(error);
 				return FALSE;
 			}
 
-			if (g_match_info_matches(info)) {
-				g_free(matched->matches[i].ranges);
-				matched->matches[i].ranges = teco_get_ranges(info, from+from_block,
-				                                             &matched->matches[i].num_ranges);
-				i = ++matched_total % matched_num;
-			} else if (G_UNLIKELY(g_match_info_is_partial_match(info))) {
-				/*
-				 * Match may fall on the block boundary,
-				 * so retry matching the rest of the document.
-				 * This is the only case where we have to rescan
-				 * the same memory more than once.
-				 *
-				 * FIXME FIXME FIXME: We cannot retrieve the position here
-				 * since g_match_info_fetch_pos() treats partial matches as errors.
-				 * This is a confirmed glib bug and fast backwards searches
-				 * will continue to be broken until we switch to a custom regexp
-				 * engine.
-				 */
-				gint partial_start, partial_end;
-				G_GNUC_UNUSED gboolean rc;
-				rc = g_match_info_fetch_pos(info, 0, &partial_start, &partial_end);
-				//g_assert(rc == TRUE);
-				if (!rc)
-					/* make sure that test case fails */
-					abort();
-				g_assert(partial_end == to_block-from_block);
-
-				g_autoptr(GMatchInfo) partial_info = NULL;
-
-				g_regex_match_full(re, buffer+partial_start, to-from-partial_start, 0,
-				                   G_REGEX_MATCH_ANCHORED, &partial_info, &tmp_error);
-				if (tmp_error) {
-					g_propagate_error(error, tmp_error);
-					return FALSE;
-				}
+			rc = tere_exec(re, (const chr *)buffer+from_block+offset, to_block-from_block-offset,
+			               &details, 1+re->re_nsub, info, eflags);
+			if (rc != REG_OKAY)
+				break;
 
-				if (g_match_info_matches(partial_info)) {
-					g_free(matched->matches[i].ranges);
-					matched->matches[i].ranges = teco_get_ranges(partial_info, from+partial_start,
-					                                             &matched->matches[i].num_ranges);
-					i = ++matched_total % matched_num;
-				}
+			/* normal full match */
+			g_free(matched->matches[i].ranges);
+			matched->matches[i].num_ranges = 1+re->re_nsub;
+			matched->matches[i].ranges = teco_get_ranges(info, matched->matches[i].num_ranges,
+			                                             from+from_block+offset);
+			i = ++matched_total % matched_num;
 
-				/* there might still be other matches within the current block */
-			} else {
-				break;
-			}
+			offset += info[0].rm_eo;
+		}
+
+		if (rc != REG_NOMATCH) {
+			teco_error_regex_set(error, rc, re);
+			return FALSE;
+		}
 
+		if (G_UNLIKELY(to_block != to-from &&
+		               details.rm_extend.rm_eo == to_block-from_block-offset)) {
 			/*
-			 * NOTE: The return boolean does NOT signal whether an error was generated.
+			 * Match may fall on the block boundary,
+			 * so retry matching the rest of the document.
+			 * This is the only case where we have to rescan
+			 * the same memory more than once.
 			 */
-			g_match_info_next(info, &tmp_error);
-			if (tmp_error) {
-				g_propagate_error(error, tmp_error);
+			gsize partial_start = from_block+offset+details.rm_extend.rm_so;
+
+			rc = tere_exec(re, (const chr *)buffer+partial_start, to-from-partial_start,
+			               &details, 1+re->re_nsub, info, eflags | REG_ANCHORED);
+
+			if (rc == REG_OKAY) {
+				g_free(matched->matches[i].ranges);
+				matched->matches[i].num_ranges = 1+re->re_nsub;
+				matched->matches[i].ranges = teco_get_ranges(info, matched->matches[i].num_ranges,
+				                                             from+partial_start);
+				i = ++matched_total % matched_num;
+			} else if (rc != REG_NOMATCH) {
+				teco_error_regex_set(error, rc, re);
 				return FALSE;
 			}
 		}
@@ -831,6 +823,7 @@ teco_do_search_backwards(GRegex *re, gsize from, gsize to, gint *count, GError *
 		matched_num -= matched_total;
 		i = 0;
 
+		/* try previous block */
 		to_block = from_block;
 	}
 
@@ -840,7 +833,7 @@ teco_do_search_backwards(GRegex *re, gsize from, gsize to, gint *count, GError *
 }
 
 static gboolean
-teco_do_search(GRegex *re, gsize from, gsize to, gint *count, GError **error)
+teco_do_search(regex_t *re, gsize from, gsize to, gint *count, GError **error)
 {
 	gboolean rc = *count >= 0 ? teco_do_search_forward(re, from, to, count, error)
 	                          : teco_do_search_backwards(re, from, to, count, error);
@@ -871,8 +864,7 @@ teco_do_search(GRegex *re, gsize from, gsize to, gint *count, GError **error)
 static gboolean
 teco_state_search_process(teco_machine_main_t *ctx, teco_string_t str, gsize new_chars, GError **error)
 {
-	/* FIXME: Should G_REGEX_OPTIMIZE be added under certain circumstances? */
-	GRegexCompileFlags flags = G_REGEX_MULTILINE | G_REGEX_DOTALL;
+	gint cflags = REG_ADVANCED;
 
 	teco_qreg_t *reg = teco_qreg_table_find(ctx->qreg_table_locals, "\x18", 1); /* ^X */
 	g_assert(reg != NULL);
@@ -880,15 +872,24 @@ teco_state_search_process(teco_machine_main_t *ctx, teco_string_t str, gsize new
 	if (!reg->vtable->get_integer(reg, &search_mode, error))
 		return FALSE;
 	if (teco_is_failure(search_mode))
-		flags |= G_REGEX_CASELESS;
+		cflags |= REG_ICASE;
 
 	if (ctx->flags.modifier_colon == 2)
-		flags |= G_REGEX_ANCHORED;
+		cflags |= REG_BOSONLY; /* anchored */
+
+	gint count = teco_search_parameters.count;
+
+	/*
+	 * Backwards searches require partial match information.
+	 * Fortunately, it appears to be almost for free.
+	 */
+	if (count < 0)
+		cflags |= REG_EXPECT;
 
 	/* this is set in teco_state_search_initial() */
 	if (ctx->expectstring.machine.codepage != SC_CP_UTF8) {
 		/* single byte encoding */
-		flags |= G_REGEX_RAW;
+		cflags |= REG_RAW;
 	} else if (!teco_string_validate_utf8(str)) {
 		/*
 		 * While SciTECO code is always guaranteed to be in valid UTF-8,
@@ -913,7 +914,6 @@ teco_state_search_process(teco_machine_main_t *ctx, teco_string_t str, gsize new
 	g_autoptr(teco_machine_qregspec_t) qreg_machine;
 	qreg_machine = teco_machine_qregspec_new(TECO_QREG_REQUIRED, ctx->qreg_table_locals, FALSE);
 
-	g_autoptr(GRegex) re = NULL;
 	g_autofree gchar *re_pattern;
 	/* NOTE: teco_pattern2regexp() modifies str pointer */
 	re_pattern = teco_pattern2regexp(&str, qreg_machine,
@@ -923,13 +923,19 @@ teco_state_search_process(teco_machine_main_t *ctx, teco_string_t str, gsize new
 #ifdef DEBUG
 	g_printf("REGEXP: %s\n", re_pattern);
 #endif
+
+	g_auto(regex_t) re;
+	memset(&re, 0, sizeof(re));
+
 	if (!*re_pattern)
 		goto failure;
+
 	/*
-	 * FIXME: Should we propagate at least some of the errors?
+	 * FIXME: No need to escape null-chars in re_pattern.
+	 * Actually no need to generate a regexp for TECO patterns.
 	 */
-	re = g_regex_new(re_pattern, flags, 0, NULL);
-	if (!re)
+	gint rc = tere_comp(&re, (chr *)re_pattern, strlen(re_pattern), cflags);
+	if (rc != REG_OKAY)
 		goto failure;
 
 	if (!teco_qreg_current &&
@@ -938,9 +944,7 @@ teco_state_search_process(teco_machine_main_t *ctx, teco_string_t str, gsize new
 		teco_buffer_edit(teco_search_parameters.from_buffer);
 	}
 
-	gint count = teco_search_parameters.count;
-
-	if (!teco_do_search(re, teco_search_parameters.from, teco_search_parameters.to, &count, error))
+	if (!teco_do_search(&re, teco_search_parameters.from, teco_search_parameters.to, &count, error))
 		return FALSE;
 
 	if (teco_search_parameters.to_buffer && count) {
@@ -956,12 +960,12 @@ teco_state_search_process(teco_machine_main_t *ctx, teco_string_t str, gsize new
 				teco_buffer_edit(buffer);
 
 				if (buffer == teco_search_parameters.to_buffer) {
-					if (!teco_do_search(re, 0, teco_search_parameters.dot, &count, error))
+					if (!teco_do_search(&re, 0, teco_search_parameters.dot, &count, error))
 						return FALSE;
 					break;
 				}
 
-				if (!teco_do_search(re, 0, teco_interface_ssm(SCI_GETLENGTH, 0, 0),
+				if (!teco_do_search(&re, 0, teco_interface_ssm(SCI_GETLENGTH, 0, 0),
 				                    &count, error))
 					return FALSE;
 			} while (count);
@@ -972,14 +976,14 @@ teco_state_search_process(teco_machine_main_t *ctx, teco_string_t str, gsize new
 				teco_buffer_edit(buffer);
 
 				if (buffer == teco_search_parameters.to_buffer) {
-					if (!teco_do_search(re, teco_search_parameters.dot,
+					if (!teco_do_search(&re, teco_search_parameters.dot,
 					                    teco_interface_ssm(SCI_GETLENGTH, 0, 0),
 					                    &count, error))
 						return FALSE;
 					break;
 				}
 
-				if (!teco_do_search(re, 0, teco_interface_ssm(SCI_GETLENGTH, 0, 0),
+				if (!teco_do_search(&re, 0, teco_interface_ssm(SCI_GETLENGTH, 0, 0),
 				                    &count, error))
 					return FALSE;
 			} while (count);
diff --git a/tests/testsuite.at b/tests/testsuite.at
index fc8ab37..a97e0f8 100644
--- a/tests/testsuite.at
+++ b/tests/testsuite.at
@@ -519,6 +519,24 @@ AT_SETUP([Search accesses wrong Q-Register table])
 TE_CHECK([[@^U.#xx/123/ @^Um{:@S/^EG.#xx/$} :Mm Mm]], 1, ignore, ignore)
 AT_CLEANUP
 
+# NOTE: This used to be a bug in the old GRegex-based implementation,
+# which surfaced only with specific build options of Glib's
+# PCRE which was not predictable.
+# It segfaulted at least on Ubuntu 20.04 (libpcre3 v2:8.39).
+# It could fail because the memory limit is exceeed,
+# but not in this case since the match string isn't too large.
+AT_SETUP([Pattern matching overflow])
+# NOTE: Creating very long lines would currently be ineffective
+# at least in UTF-8 mode.
+TE_CHECK([[100000<@I"^J">J @S"^EM^X"]], 0, ignore, ignore)
+AT_CLEANUP
+
+AT_SETUP([Block-wise backwards search])
+# Failed when using GRegex (PCRE), which had broken support for partial matches.
+# This is not an issue with terex.
+TE_CHECK([[2,8EJ @I/ABCD/ -:@S/BC/"F(0/0)' .-3"N(0/0)' ^S+2"N(0/0)']], 0, ignore, ignore)
+AT_CLEANUP
+
 AT_SETUP([Invalid buffer ids])
 TE_CHECK([[42@EB//]], 1, ignore, ignore)
 TE_CHECK([[23@EW//]], 1, ignore, ignore)
@@ -659,24 +677,6 @@ TE_CHECK([[| (0/0) ']], 1, ignore, ignore)
 AT_XFAIL_IF(true)
 AT_CLEANUP
 
-# NOTE: This bug depends on specific build options of Glib's
-# PCRE which is not predictable.
-# It segfaults at least on Ubuntu 20.04 (libpcre3 v2:8.39).
-#AT_SETUP([Pattern matching overflow])
-## Should no longer dump core.
-## It could fail because the memory limit is exceeed,
-## but not in this case since the match string isn't too large.
-#TE_CHECK([[100000<@I"X">J @S"^EM^X"]], 0, ignore, ignore)
-#AT_XFAIL_IF(true)
-#AT_CLEANUP
-
-AT_SETUP([Block-wise backwards search])
-# Crashes are caused by a glib bug when a match falls on block boundaries.
-# See teco_do_search_backwards()
-TE_CHECK([[2,8EJ @I/ABCD/ -:@S/BC/"F(0/0)' .-3"N(0/0)' ^S+2"N(0/0)']], 0, ignore, ignore)
-AT_XFAIL_IF(true)
-AT_CLEANUP
-
 AT_SETUP([Backtracking in patterns])
 # ^ES should be greedy and posessive
 TE_CHECK([[@I/   /J :@S/^ES^X/"S(0/0)']], 0, ignore, ignore)