|
lengths (refs #27)
* Allows storing pattern matches into Q-Registers (^YXq).
* You can also refer to subpatterns marked by ^E[...] by passing a number > 0.
This is equivalent to \0-9 references in many programming languages.
* It's especially useful for supporting TECO's equivalent of structural regular expressions.
This will be done with additional macros.
* You can also simply back up to the beginning of an insertion or search.
So I...$^SC leaves dot at the beginning of the insertion.
S...$^SC leaves dot before the found pattern.
This has been previously requested by users.
* Perhaps there should be ^Y string building characters as well to backreference
in search-replacement commands (TODO).
This means that the search commands would have to store the matched text itself
in teco_range_t structures since FR deletes the matched text before
processing the replacement string.
It could also be made into a FR/FS-specific construct,
so we don't fetch the substrings unnecessarily.
* This differs from DEC TECO in always returning the same range even after dot movements,
since we are storing start/end byte positions instead of only the length.
Also DEC TECO does not support fetching subpattern ranges.
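The start/end representation can be sketched with a minimal struct; this is only an illustrative stand-in for the idea, not the actual teco_range_t definition:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for a teco_range_t-like type:
 * absolute start/end byte positions instead of a single length. */
typedef struct {
    size_t from; /* start byte position of the match */
    size_t to;   /* end byte position (exclusive) */
} range_t;

/* Because both positions are absolute, the stored range is
 * independent of the current position ("dot"); moving dot
 * afterwards does not invalidate it. */
static size_t range_length(range_t r)
{
    return r.to - r.from;
}
```

A length-only representation would have to be anchored at dot and would therefore change meaning whenever dot moves.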
|
|
5597bc72671d0128e6f0dba446c4dc8d47bf37d0)
* Using teco_expressions_eval() is wrong since it does not pay attention to precedences.
If you have multiple higher precedence operators in a row, as in 2+3*4*5,
the lower precedence operators would be resolved prematurely.
* Instead we now call teco_expressions_calc() repeatedly but only for lower precedence
operators on the stack top.
This makes sure that as much of the expression as possible is evaluated at any given moment.
|
|
* It was possible to provoke operator right-associativity when placing a high-precedence
operator between two low-precedence operators.
1-6*5-1 evaluated to -28 instead of the expected -30.
* The reason is that SciTECO relies on operators to be resolved from left-to-right as soon as possible.
The higher precedence operator prevents that and pushing the 2nd "-" only evaluated 6*5.
At the end 1-30-1 would be left on the stack.
teco_expressions_eval() however evaluates from right-to-left which is wrong in this case.
* Instead, we now do a full eval on every operator with a lower precedence, making sure that 1-30 is
evaluated first.
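The fix can be illustrated with a minimal shift-reduce evaluator: before pushing a new operator, all stacked operators of equal or higher precedence are reduced, which is what forces left-to-right (left-associative) evaluation. This is a sketch of the principle with made-up names and single-digit operands, not SciTECO's actual teco_expressions_*() code:

```c
#include <assert.h>
#include <ctype.h>

/* Precedence: * and / bind tighter than + and - */
static int prec(char op)
{
    return (op == '*' || op == '/') ? 2 : 1;
}

static long apply(long a, char op, long b)
{
    switch (op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    default:  return a / b;
    }
}

/* Evaluate an expression of single-digit numbers and + - * /. */
static long eval(const char *s)
{
    long nums[32]; int nn = 0;  /* operand stack */
    char ops[32];  int no = 0;  /* operator stack */

    for (; *s; s++) {
        if (isdigit((unsigned char)*s)) {
            nums[nn++] = *s - '0';
            continue;
        }
        /* Reduce stacked operators of equal or higher precedence
         * first: this evaluates as much as possible as early as
         * possible and enforces left-associativity. */
        while (no > 0 && prec(ops[no-1]) >= prec(*s)) {
            long b = nums[--nn], a = nums[--nn];
            nums[nn++] = apply(a, ops[--no], b);
        }
        ops[no++] = *s;
    }
    while (no > 0) {
        long b = nums[--nn], a = nums[--nn];
        nums[nn++] = apply(a, ops[--no], b);
    }
    return nums[0];
}
```

With this rule, pushing the second "-" of 1-6*5-1 first reduces 6*5 and then 1-30, so the final result is -30 rather than -28.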
|
|
* We cannot call it "." since that introduces a local register
and we don't want to add an unnecessary syntactic exception.
* Allows the idiom [: ... ]: to temporarily move around.
Also, you can now write ^E\: without having to store dot in a register first.
* In the future we might add an ^E register as well for byte offsets.
However, there are much fewer useful applications.
* Of course, you can now also write nU: instead of nJ, Q: instead of "." and
n%: instead of "nC.". However, none of it is really useful.
|
|
* This could actually cause crashes when trying to format numbers.
* The ^R local register has a custom set_integer() method now,
so that the check is performed also when using nU.^X.
|
|
^R now (refs #17)
* This way the search mode and radix are local to the current macro frame,
unless the macro was invoked with :Mq.
If colon-modified, you can reproduce the same effect by calling
[.^X 0^X ... ].^X
* The radix register is cached in the Q-Reg table as an optimization.
This could be done with the other "special" registers as well, but at the
cost of larger stack frames.
* In order to allow constructs like [.^X typed with upcarets,
the Q-Register specification syntax has been extended:
^c is the corresponding control code instead of the register "^".
|
|
* Usually you will only want -^X for enabling case sensitive searches
and 0^X for case-insensitive searches (which is also the default).
* An open question is what happens if the user sets -^X and then calls
a macro. The search mode flag should probably be stacked away along
with the search-string. This means we'd need a ^X special Q-Reg as well,
so you can write [^X[_ 0^X S...$ ]_]^X.
Alternatively, the search mode flag should be a property of the
macro frame, along with the radix.
|
|
* The program exit code will usually not signal failures since they are caught earlier.
* Therefore, we always have to capture and check stderr.
|
|
characters in Q-Register)
* It was initialized only once, so it could inherit the wrong local Q-Register table.
A test case has been added for this particular bug.
* Also, if starting from the profile (batch mode), the state machine could be initialized
without undo, which would later cause problems on rubout in interactive mode.
For instance, if S^EG[a] fails and you would repeatedly type `]`, the Q-Reg name could
grow indefinitely. There were probably other issues as well.
Even crashes should have been possible, although I couldn't reproduce them.
* Since the state machine is required only for the pattern to regexp translation
and is performed anew for every character in interactive mode,
we now create a fresh state machine for every call and don't attempt
any undo.
There might be more efficient ways, like reusing the string building's
Q-Reg parser state machine.
|
|
* A test case has been added, although it might have been accidental
that it caused crashes at all.
|
|
the pclose() return value
|
|
string cells
Found thanks to the "infinite monkey" test.
|
|
Supposing that any monkey hitting keys on a typewriter, serving as a hardcopy
SciTECO terminal, will sooner or later trigger bugs and crash the application,
the new monkey-test.apl script emulates such a monkey.
In fact it's a bit more elaborate as the generated macro follows the frequency
distribution extracted from the corpus of SciTECO macro files (via monkey-parse.apl).
This, it is hoped, increases the chance of getting into "interesting" parser states.
This also adds a new hidden --sandbox argument, but it works only on FreeBSD (via Capsicum)
so far. In sandbox mode, we cannot open any file or execute external commands.
It is ensured that SciTECO cannot assert in sandbox mode for scripts that would
run without --sandbox, since assertions are exactly the kind of thing we would like to detect.
SciTECO must be sandboxed during "infinite monkey" tests, so it cannot accidentally
do any harm on the system running the tests.
All macros in sandbox mode must currently be passed via --eval.
Alternatively, we could add a test compilation unit and generate the test data
directly in memory via C code.
The new scripts are written in GNU APL 1.9 and will probably work only under FreeBSD.
These scripts are not meant to be run by everyone.
|
|
* Any memory error will let the test case fail with code 66.
* You can also call
make check TESTSUITEFLAGS="--valgrind"
* There is no program test for Valgrind in configure.ac for the time being.
`valgrind` must be in $PATH.
* All CI testsuite runs under Ubuntu now use Valgrind.
|
|
* There was some boilerplate code missing in teco_state_search_all_initial()
that is present in teco_state_search_initial().
* Perhaps there should be a common function to avoid redundancies?
* This will also fix the initialization of the string argument codepage for <N>.
|
|
* Supports all immediate editing commands.
Naturally it cannot emulate arbitrary key presses since there is no
canonical ASCII encoding of function keys.
Consequently, key macros are also not testable.
The --fake-cmdline parameter is instead treated very similarly to
a key macro expansion.
* Most importantly this allows adding test cases for rubout behavior
and bugs that are quite common.
* Added regression test cases for the last two rubout bugs.
* It's not easy to pass control codes in command line arguments in
a portable manner, so the test cases will often use { and }.
Control codes could be used e.g. by defining variables like
RUBOUT=`printf '\b'`
and referencing them with ${RUBOUT}.
|
|
* The old bug of saving gchar in gints, so teco_eol_reader_t::last_char could become negative.
* When converting from a UTF-8 text with CRLF linebreaks, we could have data loss and corruption.
* On strings ending in UTF-8 characters, teco_eol_reader_t::offset would overflow, resulting
in invalid reads and potentially insertion of data garbage.
I observed this with G~ on Gtk.
* Test cases updated. Couldn't reproduce the bug with the test suite, though.
|
|
* The previous check could result in false positives if you are editing
a local Q-Register that will be destroyed at the end of the current macro frame
and call another non-colon-modified macro.
* It must instead be invalid to keep the register edited only if it belongs to the
local Q-Registers that are about to be freed.
In other words, the table that the currently edited Q-Register belongs to, must be
the one we're about to destroy.
* This fixes the solarized.toggle (F5) macro when using the Solarized color scheme.
|
|
allow breaking from within braces
For instance, you can now write <23(1;)> without leaving anything on the stack.
|
|
* makes it possible, albeit cumbersome, to escape pattern match characters
* For instance, to search for ^Q, you now have to type
S^Q^Q^Q^Q$.
To search for ^E you have to type
S^Q^Q^Q^E$.
But the last character cannot be typed with carets currently (FIXME?).
For pattern-only characters, two ^Q should be sufficient as in
S^Q^Q^X$.
* Perhaps it would be more elegant to abolish the difference between string building
and pattern matching characters to avoid double quoting.
But then all string building constructs like ^EQq should operate at the pattern level
as well (ie. match the contents of register q verbatim instead of being interpreted as a pattern).
TECOC and TECO-64 don't do that either.
If we leave everything as it is, at least a new string building construct should be added for
auto-quoting patterns (analogous to ^EN and ^E@).
|
|
* Previously you could open files of arbitrary size and the limit would be checked only afterwards.
* Many, but not all, cases should now be detected earlier.
Since Scintilla allocates lots of memory as part of rendering,
you can still run into memory limits even after successfully loading the file.
* Loading extremely large files can also be potentially slow.
Therefore, it is now possible to interrupt via CTRL+C.
Again, if the UI is blocking because of stuff done as part of rendering,
you still may not be able to interrupt the "blocking" operation.
|
|
|
|
* @EQ$/.../ sets the current directory from the contents of the given file.
@E%$/.../ stores the current directory in the given file.
* @EQ*/.../ will fail, just like ^U*...$.
@E%*/.../ stores the current buffer's name in the given file.
* It's especially useful with the clipboard registers.
There could still be a minor bug in @E%~/.../ with regard to EOL normalization
as teco_view_save() will use the EOL style of the current document, which
may not be the style of the Q-Reg contents.
Conversions can generally be avoided for these particular commands.
But without teco_view_save() we'd have to care about save point creation.
|
|
* This was unsafe and could easily result in crashes, since teco_qreg_current
would afterwards point to an already freed Q-Register.
* Since automatically editing another register or buffer is not easy to do right,
we throw an error instead.
|
|
This was throwing glib assertions.
|
|
* It wasn't failing on FreeBSD because there are different default
stacksize limits.
We now set it to 8MB everywhere.
|
|
* This fixes F< to the beginning of the macro, which was broken in 73d574b71a10d4661ada20275cafde75aff6c1ba.
teco_machine_main_t::macro_pc actually has to be signed as it is sometimes set to -1.
|
|
* while code is guaranteed to be in valid UTF-8, this cannot be
said about the result of string building.
* The search pattern can end up with invalid Unicode bytes even when
searching on UTF-8 buffers, e.g. if ^EQq inserts garbage.
There are currently no checks.
* When searching on a raw buffer, it must be possible to
search for arbitrary bytes (^EUq).
Since teco_pattern2regexp() was always expecting clean UTF-8 input,
this would sometimes skip over too many bytes and could even crash.
* Instead, teco_pattern2regexp() now takes the <S> target codepage
into account.
|
|
The following rules apply:
* All SciTECO macros __must__ be in valid UTF-8, regardless of the
register's configured encoding.
This is checked against before execution, so we can use glib's non-validating
UTF-8 API afterwards.
* Things will inevitably get slower as we have to validate all macros first
and convert to gunichar for each and every character passed into the parser.
As an optimization, it may make sense to have our own inlineable version of
g_utf8_get_char() (TODO).
Also, Unicode glyphs in syntactically significant positions may be case-folded -
just like ASCII chars were. This is of course slower than case folding
ASCII. The impact of this should be measured and perhaps we should restrict
case folding to a-z via teco_ascii_toupper().
* The language itself does not use any non-ASCII characters, so you don't have to
use UTF-8 characters.
* Wherever the parser expects a single character, it will now accept an arbitrary
Unicode/UTF-8 glyph as well.
In other words, you can call macros like M§ instead of having to write M[§].
You can also get the codepoint of any Unicode character with ^^x.
Pressing a Unicode character in the start state or in Ex and Fx will now
give a sane error message.
* When pressing a key which produces a multi-byte UTF-8 sequence, the character
gets translated back and forth multiple times:
1. It's converted to a UTF-8 string, either buffered or by IME methods (Gtk).
On Curses we could directly get a wide char using wget_wch(), but it's
not currently used, so we don't depend on widechar curses.
2. Parsed into gunichar for passing into the edit command callbacks.
This also validates the codepoint - everything later on can assume valid
codepoints and valid UTF-8 strings.
3. Once the edit command handling decides to insert the key into the command line,
it is serialized back into a UTF-8 string as the command line macro has
to be in UTF-8 (like all other macros).
4. The parser reads back gunichars without validation for passing into
the parser callbacks.
* Flickering in the Curses UI and Pango warnings in Gtk, due to incompletely
inserted and displayed UTF-8 sequences, are now fixed.
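As a sketch of the "own inlineable version of g_utf8_get_char()" idea mentioned above, here is a minimal non-validating UTF-8 decoder. It assumes already-validated input, as the rules above guarantee; it is not glib's actual implementation:

```c
#include <stdint.h>

/* Minimal NON-validating UTF-8 decoder, in the spirit of an
 * inlineable g_utf8_get_char().  Because all macros are validated
 * up front, no error handling is needed here.  Illustrative only. */
static inline uint32_t utf8_get_char(const char *p)
{
    const unsigned char *s = (const unsigned char *)p;

    if (s[0] < 0x80)               /* 1-byte (ASCII) sequence */
        return s[0];
    if (s[0] < 0xE0)               /* 2-byte sequence */
        return ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    if (s[0] < 0xF0)               /* 3-byte sequence */
        return ((uint32_t)(s[0] & 0x0F) << 12) |
               ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    /* 4-byte sequence */
    return ((uint32_t)(s[0] & 0x07) << 18) |
           ((uint32_t)(s[1] & 0x3F) << 12) |
           ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
}
```

Skipping validation is exactly what makes such a decoder cheap enough to call for every character passed into the parser.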
|
|
* The libtool wrapper binaries do not pass down UTF-8 strings correctly,
so the Unicode tests failed under some circumstances.
* As we aren't actually linking against any locally-built shared libraries,
we are passing --disable-shared to libtool which inhibits wrapper generation
on win32 and fixes the test suite.
* Also use up to date autotools. This didn't fix anything, though.
* test suite: try writing a Unicode filename as well
* There have been problems doing that on Win32 where UTF-8 was not
correctly passed down from the command line and some Windows API
calls were only working with ANSI filenames etc.
|
|
(refs #5)
|
|
hopefully fixes the Unicode test cases on Mac OS
|
|
|
|
unsigned integer
* SCI_GETCHARAT is internally cast to `char`, which may be signed.
Characters > 127 therefore become negative and stay so when cast to sptr_t.
We therefore cast it back to guchar (unsigned char).
* The same is true whenever returning a string's character to SciTECO (teco_int_t)
as our string type is `gchar *`.
* <^^x> now also works for those characters.
Eventually, the parser will probably become UTF8-aware and this will
have to be done differently.
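The pitfall can be demonstrated in a few lines of plain C (glib-free; the function names are made up for illustration):

```c
/* Demonstrates the sign-extension pitfall fixed above: a byte > 127
 * stored in a (possibly signed) char becomes negative when widened
 * to a larger integer type, unless it is first cast back through
 * unsigned char. */
static long char_to_int_buggy(char c)
{
    return (long)c;            /* sign-extends if char is signed */
}

static long char_to_int_fixed(char c)
{
    return (unsigned char)c;   /* always yields 0..255 */
}
```

Whether the buggy variant misbehaves depends on the platform's char signedness, which is exactly why the explicit cast is needed.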
|
|
* You no longer have to copy contrib/scintilla, contrib/scinterm and contrib/lexilla
manually to the build directory.
* It turns out that Scintilla/Lexilla has supported this since 2016.
Scintilla allows pointing to a source directory (srcdir) and Lexilla to a binary directory (DIR_O).
* For Scinterm I opened a pull request in order to add srcdir/basedir variables:
https://github.com/orbitalquark/scinterm/pull/21
* `make distcheck` is therefore now also fixed.
* The FreeBSD package is now allowed to build out of source.
I haven't tested it yet.
* See also https://github.com/ScintillaOrg/lexilla/issues/266
|
|
numbers now
* Instead of TECO_OP_NEW, there should perhaps simply be a flag indicating whether `,` was used.
|
|
* in fact, with a negative exponent the previous naive implementation would even hang indefinitely!
* Now uses the squaring algorithm.
This is slightly longer but significantly more efficient.
* added test cases
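A sketch of the squaring algorithm; the name and the convention of returning 0 for negative exponents (rather than hanging, as the naive loop did) are illustrative, not necessarily SciTECO's exact semantics:

```c
/* Integer exponentiation by squaring: O(log n) multiplications
 * instead of the naive O(n) loop. */
static long ipow(long base, long exp)
{
    long result = 1;

    /* A negative exponent yields a fractional result, which
     * truncates to 0 for any |base| > 1.  The naive loop would
     * never terminate here. */
    if (exp < 0)
        return base == 1 ? 1 : base == -1 ? (exp & 1 ? -1 : 1) : 0;

    while (exp > 0) {
        if (exp & 1)            /* odd exponent: multiply result in */
            result *= base;
        base *= base;           /* square the base each round */
        exp >>= 1;              /* halve the exponent */
    }
    return result;
}
```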
|
|
* passing an empty command string down to the shell would always do nothing,
so it doesn't make sense to support that.
* for the time being, we generate a proper error
* in the future, it might make sense to define some special behavior like repeating
the last command - but EC does not currently save the command line anywhere.
* The generated documentation is currently ugly (FIXME).
Mandatory parameters are not properly detected by tedoc and we cannot tell
Q-Registers apart from mandatory parameters either.
Also, we should allow <param> markup in command summaries.
|
|
* This was setting only the teco_doc but wasn't calling the necessary
set_string() methods.
* The idiom [$ FG...$ ]$ to change the working directory temporarily
now works.
* Similarly you can now write [~ ^U~...$ ]~ to change the clipboard
temporarily.
* Added test suite cases. The clipboard is not tested since
it's not supported everywhere and would interfere with the host system.
* Resolved lots of redundancies in qreg.c.
The clipboard and workingdir Q-Regs have lots in common.
This is now abstracted in the "external" Q-Reg base "class"
(ie. via initializer TECO_INIT_QREG_EXTERNAL()).
It uses vtable calls, which is slightly less efficient than per-register
implementations, but avoiding redundancies is probably more important.
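The vtable approach can be sketched like this; all types and names are illustrative toys, not the actual qreg.c definitions:

```c
#include <string.h>

typedef struct ext_qreg ext_qreg_t;

/* Each "external" register kind only supplies its getter/setter;
 * everything else is shared code calling through the vtable. */
typedef struct {
    int (*set_string)(ext_qreg_t *q, const char *str);
    const char *(*get_string)(ext_qreg_t *q);
} ext_qreg_vtable_t;

struct ext_qreg {
    const ext_qreg_vtable_t *vtable;
    char buf[64]; /* backing store for this toy example */
};

/* A toy register kind backed by an in-memory buffer: */
static int toy_set_string(ext_qreg_t *q, const char *str)
{
    strncpy(q->buf, str, sizeof(q->buf) - 1);
    q->buf[sizeof(q->buf) - 1] = '\0';
    return 0;
}

static const char *toy_get_string(ext_qreg_t *q)
{
    return q->buf;
}

static const ext_qreg_vtable_t toy_vtable = {
    toy_set_string, toy_get_string
};

/* Generic logic shared by all register kinds -- this is the code
 * that would otherwise be duplicated per register: */
static int ext_qreg_copy(ext_qreg_t *dst, ext_qreg_t *src)
{
    return dst->vtable->set_string(dst, src->vtable->get_string(src));
}
```

The indirect call costs one pointer dereference per operation, which is the "slightly less efficient" trade-off mentioned above.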
|
|
* It appears to behave similarly to Mac OS with regard to recursions.
|
|
* fixes test cases like 3<%a:>
* you can now use :F< in pass-through loops as well
* F> outside of loops will now exit the current macro level.
This is analogous to what TECO-11 did.
In interactive mode, F> is currently also equivalent to $$
(terminates command line).
|
|
* This is not easy to fix (show errors when encountering these constructs without
preceding if <"> statements) and would require complicating the parser only to
detect this.
On the other hand, keeping things as they are does not really harm anybody.
|
|
registers
* An empty but valid teco_string_t can contain NULL pointers.
More precisely, a state's done_cb() can be invoked with such empty strings
in case of empty string arguments.
Also, a register's get_string() can return a NULL pointer
for existing registers with uninitialized string parts.
* In all of these cases, the language should treat "uninitialized" strings
exactly like empty strings.
* Not doing so resulted in a number of vulnerabilities.
* EN$$ crashed if "_" was uninitialized
* The ^E@q and ^ENq string building constructs would crash for existing but
uninitialized registers q.
* ?$ would crash
* ESSETILEXER$$ would crash
* This is now fixed.
Test cases have been added.
* I cannot guarantee that I have found all such cases.
Generally, it might be wise to change our definitions and make sure that
every teco_string_t must have an associated heap object to be valid.
All functions returning pointer+length pairs should consequently also never
return NULL pointers.
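The "uninitialized behaves like empty" rule can be sketched as follows; str_t is a hypothetical stand-in for teco_string_t:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer+length string, modelled on teco_string_t:
 * the data pointer may be NULL for "uninitialized" strings, which
 * must behave exactly like empty strings. */
typedef struct {
    char *data;  /* may be NULL */
    size_t len;
} str_t;

static int str_is_empty(const str_t *s)
{
    return s->data == NULL || s->len == 0;
}

/* Safe comparison that never dereferences a NULL data pointer: */
static int str_equal(const str_t *a, const str_t *b)
{
    if (a->len != b->len)
        return 0;
    if (a->len == 0)
        return 1; /* also avoids memcmp(NULL, ..., 0) */
    return memcmp(a->data, b->data, a->len) == 0;
}
```

Every consumer of such strings has to short-circuit the empty case before touching the data pointer; forgetting that is precisely what caused the crashes listed above.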
|
|
* This test case no longer fails on MacOS and MinGW builds, probably
because the settings of the underlying libpcre library changed.
* Since these settings are not predictable, cannot be queried and may even change on some
flavors of Linux, it has been completely disabled for the time being.
* Should fix CI and nightly builds on MacOS and Win32
|
|
* The C standard actually forbids this (undefined behaviour) even though
it seems intuitive that something like `memcpy(foo, NULL, 0)` does no harm.
* It turned out there were actual bugs related to this.
If memchr() was called with a variable that can be NULL,
the compiler could assume that the variable is actually always non-NULL
(since glibc declares memchr() with nonnull), consequently eliminating
checks for NULL afterwards.
The same could theoretically happen with memcpy().
This manifested itself in the empty search crashing when building with -O3.
Test case:
sciteco -e '@S//'
* Consequently, the nightly builds (at least for Ubuntu) also had this bug.
* In some cases, the passed in pointers are passed down from the caller but
should not be NULL, so I added runtime assertions to guard against it.
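A sketch of the guarding described above (the wrapper names are made up; the point is only that the zero-size case must be short-circuited before the library call):

```c
#include <stddef.h>
#include <string.h>

/* Calling memcpy()/memchr() with a NULL pointer is undefined
 * behaviour even for size 0; since glibc declares them nonnull,
 * the optimizer may also delete later NULL checks on the same
 * pointer.  These wrappers avoid the library call entirely when
 * there is nothing to do. */
static void *safe_memcpy(void *dst, const void *src, size_t n)
{
    if (n > 0)
        memcpy(dst, src, n);
    return dst;
}

static void *safe_memchr(const void *s, int c, size_t n)
{
    return n > 0 ? memchr(s, c, n) : NULL;
}
```

With such wrappers, an empty search string (`@S//`) never reaches memchr() with a NULL haystack in the first place.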
|
|
* Turns out we cannot assume that the test case never crashes on Mac OS,
so we instead now skip the entire test case on Mac OS.
It apparently crashes even on Mac OS when building with --enable-debug (-O0).
* Should fix Continuous Integration for Mac OS.
|
|
* Test case: sciteco -e '[a'
[aEX$$ in interactive mode would also crash.
* No longer use a destructor - it was executed after the Q-Reg view was
destroyed.
* Instead, we now explicitly call teco_qreg_stack_clear() in main().
* Added a regression test case.
|
|
test escaping of braces in Q-Register specifications
|
|
$SHELL they execute the testsuite in
* instead, we now use `dd`.
|
|
* Turned out to be useful in debugging the "Memory limiting during spawning" test case
on Windows.
* Use UNIX shell emulation (0,128ED) in all test cases.
Should be necessary in order to run the testsuite on Windows, but
it is currently broken anyway.
* avoid <EG> when preprocessing files - use GNU Make's $(shell) instead
* Fixes builds on MinGW where there are still problems with <EC> and <EG>
at least in the virtual build environment.
* Results in another automake warning about non-POSIX Make constructs.
This is not critical since we depend on GNU Make anyway.
|