From d03667b609c91a18fe975686b8519a2599138dc3 Mon Sep 17 00:00:00 2001
From: Robin Haberkorn
Date: Sun, 31 May 2026 22:02:18 +0200
Subject: updated TODO

---
 TODO | 39 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 34 insertions(+), 5 deletions(-)

(limited to 'TODO')

diff --git a/TODO b/TODO
index 9846c07..6588fe7 100644
--- a/TODO
+++ b/TODO
@@ -74,12 +74,23 @@ Known Bugs:
     and b) the file mode and ownership of re-created files can be preserved.
     We should fall back silently to an (inefficient) memory copy or temporary
     file strategy if this is detected.
-  * Crashes on large files: S^EM^X$ (regexp: (.)+)
+  * All backward searches from the end of excessively large files can be very
+    slow, especially in UTF mode, since you are always producing
+    all matches over the entire document.
+    Perhaps scan in 4kb blocks from dot upwards, but with partial matches.
+    When getting partial matches, the match falls on a block boundary and
+    we can extend the scanned area downwards until dot.
+    This currently doesn't work with glib's regexp (PCRE) since
+    g_match_info_fetch_pos() handles partial matches like errors.
+    Here's an upstream merge request to fix that:
+    https://gitlab.gnome.org/GNOME/glib/-/merge_requests/5199
+  * Crashes on large files: S^EM^X$ (regexp: (?:.)+)
     Happens because the Glib regex engine is based on a recursive (backtracking)
-    Perl regex library.
+    Perl regex library and glib doesn't expose pcre_extra.
+    We could include `(*LIMIT_RECURSION=d)` in the pattern, though.
     I can provoke the problem only on Ubuntu 20.04.
-    This is apparently impossible to fix as long as we do not
-    have control over the regex engine build.
+    We can try g_regex_match_all_full() which will use a DFA, but
+    it doesn't capture subexpressions.
     We need something based on a non-backtracking Thompson's NFA with Unicode (UTF-8), see
     https://swtch.com/~rsc/regexp/
     Basically only RE2 would check all the boxes.
@@ -88,7 +99,14 @@ Known Bugs:
     re2 should be an optional dependency, so we can still build
     against the glib APIs.
     Optionally, I could build a PCRE-compatible wrapper for Rust's regex crate.
-    It would also be possible to port hxrex to UTF-8 and add it as a submodule.
+    It would also be possible to port one of Henry Spencer's engines (hxrex or its
+    PostgreSQL derivation or the version from Vim) to UTF-8 and add it as a submodule.
+  * It is still possible to hang searches on huge files since a single match
+    could still scan too much memory - e.g. try searching for a word that
+    occurs only at the end of the huge file.
+    Can probably be avoided by including `(*LIMIT_MATCH=d)` in the pattern.
+    A new regexp engine should also allow interruptions within a single match,
+    so we don't have to invent limits like that.
   * It is still possible to crash SciTECO using recursive functions,
     since they map to the C program's call stack.
     It is perhaps best to use another stack of
@@ -220,6 +238,10 @@ Known Bugs:
     All blocking operations must be within an event loop and call into
     teco_interface_is_interrupted() to potentially drive the UI and detect
     CTRL+C presses.
+  * When adding the OBS repo to Ubuntu, Synaptic showed
+    sciteco-curses:s390x packages on an amd64.
+    The package could be installed without problems, though.
+    Probably a Synaptic bug.

 Features:
   * Should we support *.sgml files with the HTML lexer?
@@ -241,6 +263,7 @@ Features:
   * opener.tes should try to center the opened line (SCI_SETFIRSTVISIBLELINE).
     However, this would require a new ED hook, so we can query
     SCI_LINESONSCREEN.
+  * opener.tes: support *not* opening the .teco_session.
   * Rubout of SCI_GOTOPOS could also restore the vertical scrolling position
     (SCI_SETFIRSTVISIBLELINE).
     So e.g. rubbing out ZJ restores the exact view.
@@ -823,6 +846,8 @@ Features:
   * :^W should perhaps inhibit the caret scrolling.
     When using ^W purely as a wait command, this could be undesirable.
     Also, it is undesirable for some animations (e.g. shooting in tank.tes).
+  * New ED option: use fsync() when writing files.
+    Could be useful on systems that crash frequently.

 Optimizations:
   * Use SC_DOCUMENTOPTION_STYLES_NONE in batch mode.
@@ -946,6 +971,10 @@ Optimizations:
     co take care of it.
     However insertion commands would also have to take care of
     expanding LF to the buffers EOL sequence.
+  * Perhaps some operations can be sped up with mmap() instead of loading
+    files into the heap. glib has an appropriate function with fallback for
+    platforms without mmap().
+    Perhaps file munging could make use of it.

 Documentation:
   * Doxygen docs could be deployed on Github pages
-- 
cgit v1.2.3
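
Note on the "New ED option: use fsync() when writing files" item: the following
is only a minimal sketch of what such an option might boil down to, assuming a
POSIX system. The helper name write_file_synced() is hypothetical and is not
taken from SciTECO, which may structure its file writing quite differently
(e.g. through glib I/O channels).

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration, not SciTECO code):
 * write `len` bytes to `path` and report success only after fsync() has
 * asked the kernel to flush them to stable storage.
 * Returns 0 on success, -1 on error with errno set by the failing call.
 */
int
write_file_synced(const char *path, const void *data, size_t len)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	const char *p = data;
	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			int saved = errno;
			close(fd);
			errno = saved;
			return -1;
		}
		/* write() may be partial, so advance and loop */
		p += n;
		len -= (size_t)n;
	}

	/* The step the TODO item is about: do not rely on the page cache. */
	if (fsync(fd) < 0) {
		int saved = errno;
		close(fd);
		errno = saved;
		return -1;
	}

	return close(fd);
}
```

Making this optional seems reasonable since fsync() can be slow on some
filesystems. Whether the containing directory should be synced as well after
re-creating a file is left open here.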