sciteco - Scintilla-based Text Editor and COrrector

Age	Commit message	Author	Files	Lines
2024-11-10	Win32: fixed Unicode commandlines with newer MinGW runtimes	Robin Haberkorn	2	-1/+18
	* should also fix Win32 nightly builds * Even though we weren't using main's argv and were instead using the glib API for retrieving the command line in UTF-8, newer MinGW runtimes would fail whenever converting the Unicode command line into the system codepage would be lossy. * Most people seem to compile in a "manifest" to work around this issue. But this requires newer Windows versions and using some Microsoft tool which isn't even in $PATH. Instead, we now link with -municode and define wmain(), even though we still ignore argv. wmain() probably gets the command line in UTF-16 and we'd have to convert it anyway. * See https://github.com/msys2/MINGW-packages/issues/22462
2024-11-07	if a macro ends without finding a goto label, always throw a 'Label "..." ↵	Robin Haberkorn	1	-7/+7
	not found' error * This is important with gotos in loops as in <@O/x/>, where we would otherwise get a confusing "Unterminated loop" error. * This in particular fixes the error thrown in grosciteco.tes when encountering a new unknown command.
2024-11-07	test suite: fixed failure detection in the commandline-editing test cases	Robin Haberkorn	1	-0/+4
	* The program exit code will usually not signal failures since they are caught earlier. * Therefore, we always have to capture and check stderr.
2024-11-06	fixed the Q-Reg spec machine used for implementing S^EGq$ (match one of ↵	Robin Haberkorn	3	-21/+29
	characters in Q-Register) * It was initialized only once, so it could inherit the wrong local Q-Register table. A test case has been added for this particular bug. * Also, if starting from the profile (batch mode), the state machine could be initialized without undo, which would later cause problems on rubout in interactive mode. For instance, if S^EG[a] fails and you would repeatedly type `]`, the Q-Reg name could grow indefinitely. There were probably other issues as well. Even crashes should have been possible, although I couldn't reproduce them. * Since the state machine is required only for the pattern to regexp translation, which is performed anew for every character in interactive mode, we now create a fresh state machine for every call and don't attempt any undo. There might be more efficient ways, like reusing the string building's Q-Reg parser state machine.
2024-11-06	fixed possible crashes during --fake-cmdline	Robin Haberkorn	1	-4/+2
	* A test case has been added, although it might have been accidental that it caused crashes.
2024-11-05	fixup: fixed Windows builds	Robin Haberkorn	1	-1/+1

2024-11-05	fully support relocatable binaries, improving AppImages	Robin Haberkorn	4	-34/+118
	* You can now specify `--with-scitecodatadir` as a relative path, which will be interpreted relative to the binary's location. * Win32 binaries already were relocatable, but this was a Windows-specific hack. Win32 binaries are now built with `--with-scitecodatadir=.` since everything is in a single directory. * Ubuntu packages are now also built with `--with-scitecodatadir=../share/sciteco`. This is not crucial for ordinary installations, but is meant for AppImage creation. * Since AppImages are now built from relocatable packages, we no longer need the unionfs-workaround from pkg2appimage. This should fix the strange root contents when autocompleting in AppImage builds. * This might also fix the appimage.github.io CI issues. I assume so because I could reproduce the issue on FreeBSD's Linuxulator depending on pkg2appimage's "union"-setting. See https://github.com/AppImage/appimage.github.io/pull/3402 * Determining the binary location actually turned out to be hard and very platform-dependent. There are now implementations for Windows (which could also read argv[0]), Linux and generic UNIX (which works on FreeBSD, but I am not sure about the others). I believe this could also be useful on Mac OS to create app bundles, but this needs to be tested - currently the Mac OS binaries are installed into fixed locations and don't use relocation.
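The relative data directory described above could be resolved roughly as follows. This is only a sketch under assumptions: `resolve_datadir()` is a hypothetical helper, not SciTECO's actual function. It handles only POSIX-style `/` separators and does not normalize `..` components. It also takes the binary's path as an argument, since obtaining that path is the platform-dependent part.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not SciTECO's real code): resolve a data directory
 * that may be given relative to the directory containing the binary.
 * Returns 0 on success or -1 if the result does not fit into `out`.
 */
static int resolve_datadir(const char *exe_path, const char *datadir,
                           char *out, size_t out_size)
{
    int n;

    assert(exe_path != NULL && datadir != NULL && out != NULL);

    if (datadir[0] == '/') {
        /* Absolute paths are used unchanged. */
        n = snprintf(out, out_size, "%s", datadir);
    } else {
        /* Strip the file name to get the binary's directory. */
        const char *slash = strrchr(exe_path, '/');

        if (slash != NULL)
            n = snprintf(out, out_size, "%.*s/%s",
                         (int)(slash - exe_path), exe_path, datadir);
        else
            n = snprintf(out, out_size, "%s", datadir);
    }

    /* snprintf() reports the untruncated length, so detect truncation. */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With a binary at `/opt/sciteco/bin/sciteco` and a data directory of `../share/sciteco`, the sketch produces `/opt/sciteco/bin/../share/sciteco`, which the operating system resolves when the path is opened.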
2024-11-03	fixed assertions in ^EGq search construct for Q-Registers with uninitialized ↵	Robin Haberkorn	1	-1/+1
	string cells Found thanks to the "infinite monkey" test.
2024-11-03	Added "infinite monkey"-style test (refs #26)	Robin Haberkorn	2	-0/+36
	Supposing that any monkey hitting keys on a typewriter, serving as a hardcopy SciTECO terminal, will sooner or later trigger bugs and crash the application, the new monkey-test.apl script emulates such a monkey. In fact it's a bit more elaborate as the generated macro follows the frequency distribution extracted from the corpus of SciTECO macro files (via monkey-parse.apl). This, it is hoped, increases the chance to get into "interesting" parser states. This also adds a new hidden --sandbox argument, but it works only on FreeBSD (via Capsicum) so far. In sandbox mode, we cannot open any file or execute external commands. It is ensured that SciTECO cannot assert in sandbox mode for scripts that would run without --sandbox, since assertions are the kind of things we would like to detect. SciTECO must be sandboxed during "infinite monkey" tests, so it cannot accidentally do any harm on the system running the tests. All macros in sandbox mode must currently be passed via --eval. Alternatively, we could add a test compilation unit and generate the test data directly in memory via C code. The new scripts are written in GNU APL 1.9 and will probably work only under FreeBSD. These scripts are not meant to be run by everyone.
2024-10-30	fixup: make sure the correct PCs, pointing directly at the command that ↵	Robin Haberkorn	1	-2/+7
	failed, get assigned to error frames
2024-10-30	fixed invalid memory access when executing the F< command (but only when ↵	Robin Haberkorn	3	-6/+6
	jumping to the beginning of the macro) * I am not sure whether this feature is really that useful... * teco_machine_main_t::macro_pc is now pointing to the __next__ character to execute, therefore it's easier to manipulate by flow control commands. Also, it can now be unsigned (gsize) like all other program counters. * Detected thanks to running the testsuite under Valgrind.
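The benefit of a program counter that points at the next character can be seen in a toy interpreter. This is a sketch only: `run_macro()` and its `R` ("restart") command are hypothetical and unrelated to SciTECO's real parser. Because the counter is advanced right after fetching, jumping to the beginning is simply `pc = 0`, and an unsigned type suffices.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical toy interpreter: every character is one command and 'R'
 * jumps back to the start of the macro while restarts remain.
 * Returns the number of characters executed.
 */
static size_t run_macro(const char *macro, size_t len, unsigned int restarts)
{
    size_t pc = 0;       /* index of the NEXT character to execute */
    size_t executed = 0;

    assert(macro != NULL || len == 0);

    while (pc < len) {
        char cmd = macro[pc++];  /* fetch, then advance immediately */

        executed++;
        if (cmd == 'R' && restarts > 0) {
            restarts--;
            /* No "-1 before the increment" trick is needed here. */
            pc = 0;
        }
    }

    return executed;
}
```

If the counter instead pointed at the current character and were incremented at the end of the loop, a jump to the beginning would have to store -1, which is what forces a signed type.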
2024-10-29	fixed <N> (search all) crashes before invocations of <S> (closes #26)	Robin Haberkorn	1	-0/+7
	* There was some boilerplate code missing in teco_state_search_all_initial(), that is present in teco_state_search_initial(). * Perhaps there should be a common function to avoid redundancies? * This will also fix the initialization of the string argument codepage for <N>.
2024-10-29	PDCurses: filter out bogus double keypresses in combination with CTRL (refs #20)	Robin Haberkorn	1	-0/+12
	* Has been observed on PDCursesMod/WinGUI when pressing CTRL+Shift+6 on a US layout. I would expect code 30 (^^) to be inserted, instead PDCurses reports two keypresses (6^^). The first one is now filtered out since this will not be fixed upstream. See also https://github.com/Bill-Gray/PDCursesMod/issues/323 * Since AltGr on German layouts is reported as CTRL+ALT, we must be careful not to filter those out as well. * This is active on all PDCurses variants - who knows which other platforms will behave similarly. * You still cannot insert code 0 via CTRL+@ since PDCurses doesn't report it, but ncurses does not allow that either. This _could_ be synthesized by evaluating the modifier flags, though.
2024-10-29	teco_interface_cmdline_update() now protects against batch mode (--fake-cmdline)	Robin Haberkorn	1	-0/+7
	* Fixes the test suite on PDcurses/Win32 and therefore CI builds. * Should be necessary on UNIX as well since later on, we would access cmdline_window, which is not yet initialized. I didn't see any errors in Valgrind, though.
2024-10-28	added hidden --fake-cmdline parameter for testing command-line editing	Robin Haberkorn	1	-0/+13
	* Supports all immediate editing commands. Naturally it cannot emulate arbitrary key presses since there is no canonical ASCII-encoding of function keys. Key macros are consequently also not testable. The --fake-cmdline parameter is instead treated very similarly to a key macro expansion. * Most importantly this allows adding test cases for rubout behavior and bugs that are quite common. * Added regression test cases for the last two rubout bugs. * It's not easy to pass control codes in command line arguments in a portable manner, so the test cases will often use { and }. Control codes could be used e.g. by defining variables like RUBOUT=`printf '\b'` and referencing them with ${RUBOUT}.
2024-10-28	fixed rubbing out <:Xq>, <:^Uq> and other append-to-register operations	Robin Haberkorn	4	-31/+29
	* This was a regression introduced in 41ab5cf0289dab60ac1ddc97cf9680ee2468ea6c, which changed the semantics of teco_doc_undo_set_string(). * Removed undo_append_string() Q-Reg virtual method. append_string() now does its own undo token emission, so that we can defer the teco_doc_undo_edit() after the point that the document was initialized. This is important, so that we can configure the default encoding on new registers.
2024-10-24	<EC>: fixed hangs on UNIX	Robin Haberkorn	1	-3/+13
	* detected under FreeBSD * It turns out that it's unsafe to make the GIOChannel blocking even though the application has already terminated and the channel should be closed automatically. The channel does not report EOF, but instead we have to look for zero reads - in complete contrast to the behavior on Windows. Apparently, it's very tricky to use this API correctly (ie. it sucks). * fixup to e9bef20a8ad89d304fe3e8fafa00056d22de2326
2024-10-21	fixed some interruptions of <EC> on Win32	Robin Haberkorn	1	-1/+28
	* We now recreate the event loop with every call since it turned out that the idle watcher wouldn't be invoked after the event loop has been quit once. This at least fixes interruption of ECbash -c 'while true; do true; done'$. * Unfortunately, ECping -t 8.8.8.8$ still cannot be interrupted (unless you manually kill the process from the task manager).
2024-10-21	GTK/Win32: include trailing null byte in gtk_selection_data_set_text()	Robin Haberkorn	1	-1/+6
	* This API behaves very strangely and differently compared to UNIX/X11. When getting, it returns a trailing null for all clipboard contents (unless the clipboard is empty) and when setting, we apparently have to include it as well. At least since we cut it off when getting. Even more strangely, setting without the trailing null did work when pasting in external apps. (How they know when it's safe to throw away the trailing null is mysterious.) * In other words, this fixes X~G~.
2024-10-21	<EC>: fixed insertion of data garbage (invalid reads) and omissions	Robin Haberkorn	2	-8/+18
	* This has only ever been observed on Win32, probably because the spawning behaves very differently. * The stdout watcher could be invoked even after removing the source, so we must be secured against it - this was causing some overflows and invalid reads. * Also, teco_eol_reader_convert() could return 0 even after process termination, which would sometimes result in too few bytes being inserted. This could be provoked relatively easily by invoking ECdir$ repeatedly.
2024-10-21	GTK/Win32: fixed clipboard retrieval (trailing nulls)	Robin Haberkorn	1	-0/+5
	* Contrary to the Gtk documentation, the gtk_selection_data_get_length() already includes a trailing null, so we always inserted a bogus null char when using G~ or ^EQ~.
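A guard against the bogus trailing null could look like the sketch below. `clipboard_payload_length()` is a hypothetical helper and not the actual fix. It assumes the reported length may or may not include one terminating null byte, and it removes at most that single byte so that embedded nulls survive.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the number of payload bytes in a clipboard
 * buffer whose reported length may already include one trailing null.
 * Embedded null bytes are left untouched.
 */
static size_t clipboard_payload_length(const char *data, size_t reported_len)
{
    assert(data != NULL || reported_len == 0);

    if (reported_len > 0 && data[reported_len - 1] == '\0')
        return reported_len - 1;

    return reported_len;
}
```

One limitation of this sketch is that a clipboard whose real contents end in a null byte cannot be told apart from one with a bogus terminator.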
2024-10-21	fixed EOL conversion on UTF-8 texts	Robin Haberkorn	1	-1/+1
	* This was the old bug of saving gchars in gints, so teco_eol_reader_t::last_char could become negative. * When converting from a UTF-8 text with CRLF linebreaks, we could have data loss and corruptions. * On strings ending in UTF-8 characters, teco_eol_reader_t::offset would overflow, resulting in invalid reads and potentially insertion of data garbage. I observed this with G~ on Gtk. * Test cases updated. Couldn't reproduce the bug with the test suite, though.
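The sign-extension pitfall behind this class of bug can be shown in isolation. In this sketch both helper names are hypothetical, and the cast to `signed char` stands in for a plain `char` on platforms where `char` is signed. Bytes of 0x80 and above, which occur in every multi-byte UTF-8 sequence, turn negative when widened that way.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helpers: widen the last byte of a buffer to int.
 * Both return -1 for an empty buffer.
 */

/* Buggy variant: bytes >= 0x80 become negative (on the usual
 * two's-complement platforms) and can collide with the -1 sentinel. */
static int last_byte_sign_extended(const char *s, size_t len)
{
    assert(s != NULL || len == 0);
    return len > 0 ? (int)(signed char)s[len - 1] : -1;
}

/* Fixed variant: go through unsigned char, so the result is 0..255. */
static int last_byte_as_unsigned(const char *s, size_t len)
{
    assert(s != NULL || len == 0);
    return len > 0 ? (int)(unsigned char)s[len - 1] : -1;
}
```

For the UTF-8 encoding of "ä" (bytes 0xC3 0xA4), the fixed variant yields 0xA4, while the sign-extending one yields a negative number that breaks any later comparison or offset arithmetic.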
2024-10-19	<EC>: perhaps fixed race conditions and problems when creating and ↵	Robin Haberkorn	2	-4/+34
	terminating process groups on Win32 * Sometimes already the job assignment failed in CI builds. We now check whether the process is still alive before throwing an error. * We now set the JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE flag. This theoretically shouldn't be necessary when using TerminateJobObject(), but who knows.
2024-10-18	fixed the "Editing local registers in macro calls" check	Robin Haberkorn	4	-6/+12
	* The previous check could result in false positives if you are editing a local Q-Register, that will be destroyed at the end of the current macro frame, and call another non-colon modified macro. * It must instead be invalid to keep the register edited only if it belongs to the local Q-Registers that are about to be freed. In other words, the table that the currently edited Q-Register belongs to, must be the one we're about to destroy. * This fixes the solarized.toggle (F5) macro when using the Solarized color scheme.
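The corrected condition boils down to an ownership comparison. In the sketch below all types and names are hypothetical simplifications and do not reflect SciTECO's real data structures. Keeping a register edited is only an error if the table that owns it is the one about to be destroyed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the real structures. */
typedef struct qreg_table { int unused; } qreg_table_t;
typedef struct qreg { qreg_table_t *owner; } qreg_t;

/*
 * Returns true only if the currently edited register lives in the table
 * that is about to be freed. A register from any other table (global or
 * an outer macro frame's locals) may stay edited.
 */
static bool edit_blocks_table_destruction(const qreg_t *edited,
                                          const qreg_table_t *doomed)
{
    assert(doomed != NULL);
    return edited != NULL && edited->owner == doomed;
}
```

A caller that edits one of its own local registers and then invokes another macro is fine under this check, because the callee's table, not the caller's, is the one freed when the callee returns.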
2024-10-16	Revert "replaced bool completely with gboolean"	Robin Haberkorn	2	-7/+10
	This reverts commit 024d26ac0cd869826801889f1299df34676fdf57. This was re-introducing Clang warnings since gboolean is signed. I should have read the git blame before re-introducing gboolean...
2024-10-16	fixup: use teco_machine_t::must_undo instead of trying to identify the ↵	Robin Haberkorn	6	-43/+30
	current state machine * The previous solution was not wrong, but unnecessarily complex. We already have a flag for exactly this purpose. * Avoid redundancies by introducing teco_machine_stringbuilding_set_codepage().
2024-10-15	fixed memory corruptions due to undoing the ↵	Robin Haberkorn	5	-8/+42
	teco_machine_stringbuilding_t::codepage * It's contained in teco_machine_main_t which is created per macro call frame. So after macro calls, the machine no longer exists. It is therefore unsafe to undo its members indiscriminately. * On the other hand, we must undo the codepage setting when run interactively, so it is now only undone when belonging to the commandline macro frame. * This was actually causing memory corruptions on every fnkeys cursor movement, but never caused crashes - probably because the invalid pointers are always pointing to unused parts of the C call stack. * Initially broken in b31b8871.
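The idea of undoing a member only when its owner outlives the undo stack can be sketched as follows. Everything here is a hypothetical simplification: the struct layout, the fixed-size undo stack and the function names are assumptions, and only the `must_undo` flag name is borrowed from the later fixup commit. An undo token stores a pointer into the machine, so it must never be recorded for a machine that dies with its macro call frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified state machine and undo token. */
typedef struct { int codepage; bool must_undo; } machine_t;
typedef struct { int *target; int old_value; } undo_token_t;

#define UNDO_MAX 16
static undo_token_t undo_stack[UNDO_MAX];
static size_t undo_len = 0;

/*
 * Record an undo token only for machines flagged as undoable (e.g. the
 * one belonging to the command-line frame). This toy stack silently
 * skips recording when it is full.
 */
static void machine_set_codepage(machine_t *m, int codepage)
{
    assert(m != NULL);

    if (m->must_undo && undo_len < UNDO_MAX) {
        undo_stack[undo_len].target = &m->codepage;
        undo_stack[undo_len].old_value = m->codepage;
        undo_len++;
    }
    m->codepage = codepage;
}

/* Revert the most recent recorded change, if any. */
static void undo_pop(void)
{
    if (undo_len > 0) {
        undo_len--;
        *undo_stack[undo_len].target = undo_stack[undo_len].old_value;
    }
}
```

A machine created for a nested macro call would have `must_undo` set to false, so no token ever points into its soon-to-be-freed storage.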
2024-10-15	fixed memory leak when replacing command lines	Robin Haberkorn	1	-3/+2
	* this would also leak a few bytes on every one of fnkeys.tes' movement commands
2024-10-15	netbsd-curses: fixed the default escape delay	Robin Haberkorn	1	-22/+22
	* Apparently, netbsd-curses overwrites the escdelay on initscr() (if $ESCDELAY is not set), so we have to apply the default 25ms after screen initialization. * The info line is not drawn correctly on netbsd-curses, but only on st/simpleterm. I assume this is just a shortcoming of the included terminfo entry.
2024-10-15	improved support for braces within loops: warn about unclosed braces and ↵	Robin Haberkorn	2	-3/+14
	allow breaking from within braces For instance, you can now write <23(1;)> without leaving anything on the stack.
2024-10-06	fixup b36ff2502ae3b0e18fa862a01fba9cc2c9067e31: fixes pattern match ↵	Robin Haberkorn	1	-0/+1
	characters after escaped characters
2024-10-05	Gtk UI: support setting and getting clipboards containing null bytes	Robin Haberkorn	3	-13/+57
	* added TECO_ERROR_CLIPBOARD for all clipboard-related errors
2024-10-04	pattern match characters support ^Q/^R now as well	Robin Haberkorn	2	-11/+41
	* makes it possible, albeit cumbersome, to escape pattern match characters * For instance, to search for ^Q, you now have to type S^Q^Q^Q^Q$. To search for ^E you have to type S^Q^Q^Q^E$. But the last character cannot be typed with carets currently (FIXME?). For pattern-only characters, two ^Q should be sufficient as in S^Q^Q^X$. * Perhaps it would be more elegant to abolish the difference between string building and pattern matching characters to avoid double quoting. But then all string building constructs like ^EQq should operate at the pattern level as well (ie. match the contents of register q verbatim instead of being interpreted as a pattern). TECOC and TECO-64 don't do that either. If we leave everything as it is, at least a new string building construct should be added for auto-quoting patterns (analogous to ^EN and ^E@).
2024-09-28	replaced bool completely with gboolean	Robin Haberkorn	2	-10/+7

2024-09-28	fixed compilation on most compilers due to goto beyond g_auto() declaration	Robin Haberkorn	1	-3/+3

2024-09-28	some minor fixes to help (?) command	Robin Haberkorn	1	-2/+2

2024-09-28	check the memory limit and allow interruptions when loading files	Robin Haberkorn	1	-5/+23
	* Previously you could open files of arbitrary size and the limit would be checked only afterwards. * Many, but not all, cases should now be detected earlier. Since Scintilla allocates lots of memory as part of rendering, you can still run into memory limits even after successfully loading the file. * Loading extremely large files can also be potentially slow. Therefore, it is now possible to interrupt via CTRL+C. Again, if the UI is blocking because of stuff done as part of rendering, you still may not be able to interrupt the "blocking" operation.
2024-09-28	FreeBSD/jemalloc: fixed recovery after hitting memory limit	Robin Haberkorn	1	-0/+10
	* We now set opt.retain=false for the process, so jemalloc returns freed memory and the RSS decreases when recovering from memory limit hits. This should be safe at least on FreeBSD. * Either the opt.retain option is new or I was previously testing this only on 32-bit systems.
2024-09-28	fixed memory limiting if the process' memory usage is larger than 2GB and ↵	Robin Haberkorn	1	-7/+8
	overflow checking * teco_memory_usage is now an unsigned integer. * Unfortunately we currently rely on the variable being int-sized since we use atomic operations. This means on 64-bit systems, limiting will not work as expected if you set the limit larger than 4GB. Not sure whether this should be fixed. * Calling teco_memory_check() with a non-null request-size was totally broken and could result in bogus failures. This is currently used exclusively for checking backwards searches.
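An overflow-safe limit check with unsigned integers might look like this sketch. `would_exceed_limit()` is a hypothetical function, not the real teco_memory_check(). The naive test `usage + request > limit` can wrap around for large values, so the comparison is rearranged to avoid the addition.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper: would allocating `request` more bytes push the
 * current `usage` above `limit`? All values are int-sized and unsigned,
 * so `usage + request` could wrap; subtract instead.
 */
static bool would_exceed_limit(unsigned int usage, unsigned int request,
                               unsigned int limit)
{
    if (usage > limit)
        return true;              /* already over the limit */

    /* limit - usage cannot underflow here. */
    return request > limit - usage;
}
```

Because the counters are only int-sized, this sketch cannot express limits above `UINT_MAX` bytes either, in line with the 4GB restriction mentioned above.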
2024-09-26	Git lexer: added support for TAG_EDITMSG and MERGE_MSG	Robin Haberkorn	1	-1/+2
	* Curses: "icons" have also been added
2024-09-25	Curses: added "Git" icons for COMMIT_EDITMSG and git-rebase-todo	Robin Haberkorn	1	-0/+2

2024-09-25	fixed rubbing out (some) string building constructs at the beginning of the ↵	Robin Haberkorn	1	-7/+2
	command line argument * For instance, you can now rub out ^Q^W at the beginning of a string argument. Otherwise, pressing Ctrl+W after ^Q^W would rub out only the ^W. The next Ctrl+W would then insert ^W, due to special immediate editing inhibition after ^Q. * This still only works if the string building construct expanded to at least one byte. Suppose you have ^EQq, expanding to nothing, pressing Ctrl+W would chain to the default teco_state_process_edit_cmd() and the entire command would be rubbed out. This is probably tolerable.
2024-09-25	inhibit some immediate editing commands after ^Q/^R string building constructs	Robin Haberkorn	2	-1/+29
	* This allows you to type ^Q^U (which would otherwise rub out the entire argument) and ^Q^W (which would otherwise rub out the ^Q). * ^Q^U coincidentally worked previously since the teco_state_stringbuilding_escaped state would default to teco_state_process_edit_cmd(). But it's better to make this feature explicit. * This finally makes it possible to insert the ^W (23) char into a buffer. In interactive mode, you can still only type Caret+W as a string building construct. * ^G could also be inhibited after ^Q, but the control char is not used anywhere yet, so there is no point in doing that.
2024-09-23	allow OSC-52 clipboards on all terminal emulators	Robin Haberkorn	3	-19/+31
	* The XTerm version is still checked if we detect running under XTerm. * Actually, the XTerm implementation is broken for Unicode clipboard contents. * Kitty supports OSC-52, but you __must__ enable read-clipboard. With read-clipboard-ask, there will be a timeout. But we cannot read without a timeout since otherwise we would hang indefinitely if the escape sequence turns out to not work. * For urxvt, I have hacked an existing extension: https://gist.github.com/rhaberkorn/d7406420b69841ebbcab97548e38b37d * st currently supports only setting the clipboard, but not querying it.
2024-09-22	Curses: always wgetch() on a dummy pad, avoiding unnecessary wrefresh()	Robin Haberkorn	1	-35/+45
	* This is especially important on platforms requiring the wgetch() poll workaround to detect CTRL+C (PDCurses/WinGUI). wgetch(cmdline_window) would implicitly wrefresh(cmdline_window), which resulted in additional flickering when pressing function keys. This is no longer so important since key macros are processed as a unit and the cmdline will be updated only after processing all of the characters contained in them, ie. only once after the key press. Still, there could still have been unwanted side effects. At the very least, wgetch(input_pad) should be faster. * The XTerm clipboard implementation was getch()ing on stdscr, so potentially suffered from the same problem. It should be tested again. * keypad() is now always enabled, even on netbsd-curses. I assume that the function key processing bug in netbsd-curses has been fixed by now. We are not building any releases with netbsd-curses. But it should be retested. * It does not resolve all flickering issues on PDCurses/WinGUI. Both the command line and the Scintilla view still flicker near the cursor. See https://github.com/Bill-Gray/PDCursesMod/issues/322
2024-09-21	disable shared libraries by default	Robin Haberkorn	1	-0/+5
	* This is necessary to fix the Unicode test suite on Win32, so I was always passing in --disable-shared manually. It's easy to forget though when building from scratch. * We don't currently install any (shared) library, so this is safe on all platforms. In fact on all other platforms, libtool detects that and doesn't generate wrapper binaries anyway. Only on win32 it's apparently buggy.
2024-09-21	PDCurses/WinGUI: fixed Unicode icons on win32	Robin Haberkorn	3	-8/+27
	* Turns out that "%C" in wprintw() does not work with non-ANSI chars. * We still don't want to introduce the Curses widechar API, so I added teco_curses_add_wc() as a replacement for wadd_wch().
2024-09-21	syntax errors are reported with "echoed" characters, ie. as purely printable ↵	Robin Haberkorn	1	-1/+3
	characters * Some characters like LF wouldn't be displayed in the message line correctly. * In fact the Gtk UI cannot display any of the control characters correctly. * I was considering deferring all echoing/formatting to the UIs, so they can use TecoGtkLabel or teco_curses_format_str(). This is not possible since messages transmitted via GError must not contain null-bytes, so these need to be sorted out earlier anyway. * This should also fix syntax errors in PDCurses for Windows where "%C" apparently doesn't work with non-ANSI codepoints.
2024-09-20	^W^W and ^V^V can be typed completely with upcarets now and they case fold ↵	Robin Haberkorn	2	-25/+99
	all expansions of ^EQq, ^EUq and so on * Previously, there was no way to enter upper-case mode in interactive commands since the Ctrl+W immediate editing command is interpreted everywhere. * Without the case folding of ^EQq/^EUq results, the upper and lower case modes are actually pretty useless considering that modern keyboards have caps lock. So it was clear we need this, regardless of what the classic TECOs did. The TECO-11 manual is not very clear on this. tecoc apparently does not case-fold ^EQq results. * This opens up new idioms, for instance `EUq^W^W^EQq$` in order to upper case register q. It's also the only way you can currently upper-case Unicode codepoints.
2024-09-19	Ctrl+^ is no longer translated to a single caret in string building (refs #20)	Robin Haberkorn	1	-4/+21
	* Ctrl+^ (30) and Caret+caret (^^) were both translated to a single caret. While there might be some reason to keep this behavior for double-caret, it is certainly pointless for Ctrl+^. * That gives you an easy way to insert Ctrl+^ (code 30) into documents with <I>. Previously, you either had to insert a double-caret, typing 4 carets in a row, or you had to use <EI> or 30I$. * The special handling of double-caret could perhaps be abolished altogether, as we also have ^Q^ to escape plain carets. The double-caret syntax is very archaic, from the time that there was no proper ^Q, if I recall correctly.