Diffstat (limited to 'TODO')
-rw-r--r-- TODO 27
1 files changed, 0 insertions, 27 deletions
diff --git a/TODO b/TODO
index 6588fe7..b2de61c 100644
--- a/TODO
+++ b/TODO
@@ -74,33 +74,6 @@ Known Bugs:
and b) the file mode and ownership of re-created files can be preserved.
We should fall back silently to an (inefficient) memory copy or temporary
file strategy if this is detected.
- * All backward searches from the end of excessively large files can be very
- slow, especially in UTF mode, since you are always producing
- all matches over the entire document.
- Perhaps scan in 4 KiB blocks from dot upwards, with partial matching enabled.
- A partial match means the match falls on a block boundary, so
- we can extend the scanned area downwards until dot.
- This currently doesn't work with glib's regexp (PCRE) since
- g_match_info_fetch_pos() handles partial matches like errors.
- Here's an upstream merge request to fix that:
- https://gitlab.gnome.org/GNOME/glib/-/merge_requests/5199
- * Crashes on large files: S^EM^X$ (regexp: (?:.)+)
- Happens because the Glib regex engine is based on a recursive (backtracking)
- Perl regex library and glib doesn't expose pcre_extra.
- We could include `(*LIMIT_RECURSION=d)` in the pattern, though.
- I can provoke the problem only on Ubuntu 20.04.
- We can try g_regex_match_all_full() which will use a DFA, but
- it doesn't capture subexpressions.
- We need something based on a non-backtracking Thompson's NFA with Unicode (UTF-8), see
- https://swtch.com/~rsc/regexp/
- Basically only RE2 would check all the boxes.
- RE2 doesn't have a native C API, so we would also have to import the
- https://github.com/marcomaggi/cre2/ wrapper.
- re2 should be an optional dependency, so we can still build against the
- glib APIs.
- Optionally, I could build a PCRE-compatible wrapper for Rust's regex crate.
- It would also be possible to port one of Henry Spencer's engines (hxrex or its
- PostgreSQL derivative or the version from Vim) to UTF-8 and add it as a submodule.
* It is still possible to hang searches on huge files since a single match
could still scan too much memory - e.g. try searching for a word that
occurs only at the end of the huge file.
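The block-wise backward scan proposed in the first removed item can be sketched for the simplest case, a literal needle instead of a regex (an assumption: the TODO is about GRegex/PCRE, which would additionally need partial-match support). `find_last_before()` is a hypothetical helper, not part of GLib. Consecutive windows overlap by `needle_len - 1` bytes, which plays the role the TODO gives to partial matches: a match straddling a block boundary is still found. The scan stops in the first window (nearest to dot) that contains a hit, so it never enumerates matches over the whole document.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the start offset of the last occurrence of
 * `needle` lying entirely inside buf[0..dot), or -1 if there is none.
 * Candidate start positions are examined in windows of `block` positions,
 * moving from dot towards the beginning of the buffer. Each window reads
 * bytes [lo, hi + needle_len), so adjacent windows overlap by
 * needle_len - 1 bytes and boundary-straddling matches are not lost.
 */
long
find_last_before(const char *buf, size_t dot,
                 const char *needle, size_t needle_len, size_t block)
{
	if (needle_len == 0 || needle_len > dot)
		return -1;
	if (block == 0)
		block = 1;

	size_t hi = dot - needle_len;   /* last possible start, inclusive */
	for (;;) {
		size_t lo = hi >= block - 1 ? hi - (block - 1) : 0;

		/* Right to left, so the first hit is the one closest to dot. */
		for (size_t i = hi + 1; i-- > lo; )
			if (memcmp(buf + i, needle, needle_len) == 0)
				return (long)i;

		if (lo == 0)
			return -1;      /* reached the start of the buffer */
		hi = lo - 1;            /* next window further upwards */
	}
}
```

With real 4 KiB blocks one would call it as `find_last_before(buf, dot, needle, len, 4096)`.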
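To illustrate why a non-backtracking Thompson-style engine cannot blow the stack on `(?:.)+`, here is a deliberately tiny state-set simulation. It is a toy under stated assumptions, not RE2, cre2 or GLib: `compile_toy()` and `toy_match()` are hypothetical names, only literal bytes, `.`, `*` and `+` are supported, matching is anchored at both ends, it works on bytes rather than UTF-8, and it does not capture subexpressions (which is exactly what RE2's machinery adds on top). Time is O(text length × pattern atoms), memory is bounded by the pattern size, and there is no recursion.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ATOMS 64

/* One pattern atom: a literal byte, or any byte when `any` is set. */
struct atom {
	char c;
	bool any;
	bool star;      /* repeat zero or more times */
};

/* Parse literals, '.', and postfix '*' / '+' ("x+" becomes "x x*").
 * Returns the number of atoms, or -1 on a syntax error or overflow. */
static int
compile_toy(const char *pat, struct atom *out)
{
	int n = 0;
	for (const char *p = pat; *p; p++) {
		if (*p == '*' || *p == '+')
			return -1;      /* quantifier without an atom */
		struct atom a = { .c = *p, .any = (*p == '.'), .star = false };
		if (p[1] == '*') {
			a.star = true;
			p++;
		} else if (p[1] == '+') {
			if (n >= MAX_ATOMS)
				return -1;
			out[n++] = a;   /* mandatory first occurrence */
			a.star = true;
			p++;
		}
		if (n >= MAX_ATOMS)
			return -1;
		out[n++] = a;
	}
	return n;
}

/* Epsilon closure: a starred atom may be skipped. Skips only go forward,
 * so one left-to-right pass suffices. State n is the accepting state. */
static void
closure(bool *set, const struct atom *atoms, int n)
{
	for (int i = 0; i < n; i++)
		if (set[i] && atoms[i].star)
			set[i + 1] = true;
}

/* Anchored full match of text[0..len) against the toy pattern. */
bool
toy_match(const char *pat, const char *text, size_t len)
{
	struct atom atoms[MAX_ATOMS];
	bool cur[MAX_ATOMS + 1], next[MAX_ATOMS + 1];
	int n = compile_toy(pat, atoms);
	if (n < 0)
		return false;

	memset(cur, 0, sizeof(cur));
	cur[0] = true;
	closure(cur, atoms, n);

	for (size_t k = 0; k < len; k++) {
		bool alive = false;
		memset(next, 0, sizeof(next));
		for (int i = 0; i < n; i++) {
			if (!cur[i])
				continue;
			if (!atoms[i].any && atoms[i].c != text[k])
				continue;
			/* A starred atom loops on itself, others advance. */
			next[atoms[i].star ? i : i + 1] = true;
			alive = true;
		}
		if (!alive)
			return false;   /* no thread survives: fail early */
		closure(next, atoms, n);
		memcpy(cur, next, sizeof(cur));
	}
	return cur[n];
}
```

For example, `toy_match(".+", buf, len)` runs in a single linear pass over `buf` no matter how large it is.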