From d03667b609c91a18fe975686b8519a2599138dc3 Mon Sep 17 00:00:00 2001
From: Robin Haberkorn <rhaberkorn@fmsbw.de>
Date: Sun, 31 May 2026 22:02:18 +0200
Subject: updated TODO

---
 TODO | 39 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 34 insertions(+), 5 deletions(-)

(limited to 'TODO')

diff --git a/TODO b/TODO
index 9846c07..6588fe7 100644
--- a/TODO
+++ b/TODO
@@ -74,12 +74,23 @@ Known Bugs:
    and b) the file mode and ownership of re-created files can be preserved.
    We should fall back silently to an (inefficient) memory copy or temporary
    file strategy if this is detected.
- * Crashes on large files: S^EM^X$ (regexp: (.)+)
+ * All backward searches from the end of excessively large files can be very
+   slow, especially in UTF mode, since we always produce
+   all matches over the entire document.
+   Perhaps scan in 4 KiB blocks from dot upwards, using partial matching.
+   When getting partial matches, the match falls on a block boundary and
+   we can extend the scanned area downwards until dot.
+   This currently doesn't work with glib's regexp (PCRE) since
+   g_match_info_fetch_pos() handles partial matches like errors.
+   Here's an upstream merge request to fix that:
+   https://gitlab.gnome.org/GNOME/glib/-/merge_requests/5199
+ * Crashes on large files: S^EM^X$ (regexp: (?:.)+)
    Happens because the Glib regex engine is based on a recursive (backtracking)
-   Perl regex library.
+   Perl regex library and glib doesn't expose pcre_extra.
+   We could include `(*LIMIT_RECURSION=d)` in the pattern, though.
    I can provoke the problem only on Ubuntu 20.04.
-   This is apparently impossible to fix as long as we do not
-   have control over the regex engine build.
+   We can try g_regex_match_all_full(), which uses a DFA, but
+   it doesn't capture subexpressions.
    We need something based on a non-backtracking Thompson's NFA with Unicode (UTF-8), see
    https://swtch.com/~rsc/regexp/
    Basically only RE2 would check all the boxes.
@@ -88,7 +99,14 @@ Known Bugs:
    re2 should be an optional dependency, so we can still build against the
    glib APIs.
    Optionally, I could build a PCRE-compatible wrapper for Rust's regex crate.
-   It would also be possible to port hxrex to UTF-8 and add it as a submodule.
+   It would also be possible to port one of Henry Spencer's engines (hxrex or its
+   PostgreSQL derivative or the version from Vim) to UTF-8 and add it as a submodule.
+ * It is still possible to hang searches on huge files since a single match
+   could still scan too much memory - e.g. try searching for a word that
+   occurs only at the end of the huge file.
+   Can probably be avoided by including `(*MATCH_LIMIT=d)` in the pattern.
+   A new regexp engine should also allow interruptions within a single match,
+   so we don't have to invent limits like that.
  * It is still possible to crash SciTECO using recursive functions,
    since they map to the C program's call stack.
    It is perhaps best to use another stack of
@@ -220,6 +238,10 @@ Known Bugs:
    All blocking operations must be within an event loop and call into
    teco_interface_is_interrupted() to potentially drive the UI and
    detect CTRL+C presses.
+ * When adding the OBS repo to Ubuntu, Synaptic showed
+   sciteco-curses:s390x packages on an amd64 system.
+   The package could be installed without problems, though.
+   Probably a Synaptic bug.
 
 Features:
  * Should we support *.sgml files with the HTML lexer?
@@ -241,6 +263,7 @@ Features:
  * opener.tes should try to center the opened line
    (SCI_SETFIRSTVISIBLELINE).
    However, this would require a new ED hook, so we can query SCI_LINESONSCREEN.
+ * opener.tes: support *not* opening the .teco_session.
  * Rubout of SCI_GOTOPOS could also restore the vertical
    scrolling position (SCI_SETFIRSTVISIBLELINE).
    So e.g. rubbing out ZJ restores the exact view.
@@ -823,6 +846,8 @@ Features:
  * :^W should perhaps inhibit the caret scrolling.
    When using ^W purely as a wait command, this could be undesirable.
    Also, it is undesirable for some animations (e.g. shooting in tank.tes).
+ * New ED option: use fsync() when writing files.
+   Could be useful on systems that crash frequently.
 
 Optimizations:
  * Use SC_DOCUMENTOPTION_STYLES_NONE in batch mode.
@@ -946,6 +971,10 @@ Optimizations:
    co take care of it.
    However insertion commands would also have to take care of expanding
    LF to the buffers EOL sequence.
+ * Perhaps some operations can be sped up with mmap() instead of loading
+   files into the heap. glib has an appropriate function with fallback for
+   platforms without mmap().
+   Perhaps file munging and <EI> could make use of it.
 
 Documentation:
  * Doxygen docs could be deployed on Github pages
-- 
cgit v1.2.3