|
Analogous to :EX, but always saves the file like EW$, not only if it's dirty.
|
|
* This is not simply determined at compile-time but queries the concrete path
at least on Windows and OS X.
* The Windows implementation is kind of hacky and relies on undocumented behavior.
It's also not even tested yet!
* On Linux and FreeBSD completions will always be case-sensitive as they used to be.
There does not appear to be any API to query case sensitivity of a given path or even
the entire file system.
At most, we could white-list a number of case-insensitive file systems.
|
|
|
|
* This is done via the new opener.tes in the standard library.
* Some programs that use $EDITOR expect the +line syntax to work.
* You can copy filename:line:column directly from GCC error messages
and filename:line from grep output.
* Since there may be safe file names beginning with "+" or containing colons,
there needs to be a way to turn this off, especially for scripts that don't
know anything about the filenames to open.
This is done with "--".
Unfortunately, the first "--", that stops parameter processing,
is always removed from the command line and not passed down into TECO land.
This is not a problem for stand-alone scripts,
since the script filename is already stopping option processing, so "--"
would get passed down.
But when calling the profile via `sciteco -- ...`, you can prevent leading
minus signs from causing problems; but since the `--` is removed, opener.tes cannot
use it as a hint.
Therefore, we introduced `-S` as a new alternative to `--`, which is always passed
down as `--` (i.e. it is equivalent to "-- --").
In other words, `sciteco -S *` will always open exactly the specified files
without any danger of misinterpreting certain file names.
Should we ever switch to a custom option parsing algorithm, we might preserve
"--" (unless after --mung) and thus get rid of "-S".
* This advanced behavior can be tweaked by the user relatively easily.
In the easiest case, we could replace M[opener] with
<:L;R 0X.f [* @EB/^EN.f/ ]* L>
in ~/.teco_ini to completely disable the special syntax.
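The filename:line:column splitting described above can be sketched in C. This is a hypothetical illustration only - the real logic lives in the SciTECO macro opener.tes, and strip_num()/parse_spec() are invented names:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* If the suffix after the last colon consists only of digits,
 * strip it and parse it as a number. Returns 1 on success. */
static int strip_num(char *spec, int *out)
{
    char *colon = strrchr(spec, ':');
    if (!colon || !colon[1])
        return 0;
    for (const char *p = colon+1; *p; p++)
        if (!isdigit((unsigned char)*p))
            return 0;
    *out = atoi(colon+1);
    *colon = '\0';
    return 1;
}

/* Splits "filename:line:column" or "filename:line" in place,
 * leaving only the filename in spec. Missing parts become 0. */
static void parse_spec(char *spec, int *line, int *column)
{
    int a = 0, b = 0;
    if (strip_num(spec, &a) && strip_num(spec, &b)) {
        *line = b;      /* filename:line:column */
        *column = a;
    } else {
        *line = a;      /* filename:line or plain filename */
        *column = 0;
    }
}
```

Parsing from the right is what makes safe file names containing colons ambiguous in the first place - hence the need for "--"/-S to disable the special syntax.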
|
|
|
|
* The previously conventional !* ... *! comments are now true block comments,
i.e. they are parsed faster, don't spam the goto table and allow
embedding of exclamation marks - only "*!" terminates the comment.
* It is therefore now forbidden to have goto labels beginning with "*".
* Also support "!!" to introduce EOL comments (like C++'s //).
This disallows empty labels, but they weren't useful anyway.
This is the shortest way to begin a comment.
* All comment labels have been converted to true comments, to ensure
that syntax highlighting works correctly.
EOL comments are used for single line commented-out code, since it's
easiest to uncomment - you don't have to jump to the line end.
This is a pure convention / coding style.
Other people might do it differently.
* It's of course still possible to abuse goto labels as comments
as TECO did for ages.
* In lexing / syntax highlighting, labels and comments are highlighted differently.
* When syntax highlighting, a single "!" will first be highlighted as a label
since it's not yet unambiguous. Once you type the second character (* or !),
the first character is retroactively styled as a comment as well.
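For illustration, the comment forms described above look like this in SciTECO code (the label name is made up):

```
!* A block comment: it may span lines and contain single
   exclamation marks - only *! terminates it. *!
!! An EOL comment, like C++'s //
!loop! !a goto label, still abusable as an old-style comment!
```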
|
|
* g_utf8_get_char_validated() returns -2 for null-bytes (sometimes!?)
|
|
* Apparently g_utf8_get_char_validated() sometimes(!) returns -2 for null-characters,
so it was considered an invalid byte sequence.
* What's strange and inexplicable is that other uses of the function, as are behind nA and nQq,
did not cause problems and returned 0 for null-bytes.
* This also fixes syntax highlighting of .teco_session files which use the null-byte as the
string terminator.
(.teco_session files are not highlighted automatically, though.)
|
|
This is a purely cosmetic change.
|
|
characters)
* The teco_qreg_vtable_t::get_string() method should support returning the
length optionally (may be NULL).
This already worked with teco_doc_get_string(), even though it wasn't documented,
and therefore didn't cause problems with regular Q-Registers.
|
|
* @ES/SCI_SETILEXER/lib^@name/ now opens the lexer <name> in library <lib>.
* You need to define the environment variable $SCITECO_SCINTILLUA_LEXERS to point
to the lexers/ subdirectory (containing the *.lua files).
Perhaps this should default to the dirname of <lib>?
* The semantics of SCI_NAMEOFSTYLE have been changed:
It now returns style ids when given style names, so you can actually write
Scintillua lexer *.tes files.
This would be superfluous if we had a way to return strings from Scintilla messages into
Q-Registers, e.g. 23@EPq/SCI_NAMEOFSTYLE/.
* We now depend on gmodule as well, but it should always be part of glib.
It does not change the library dependencies of any package.
It might result in gmodule shared libraries being bundled in the Win32 and Mac OS
packages if they weren't already.
|
|
line-character index
* Checks for character consistency (of UTF-8 byte sequences) were significantly slowing things down in Scintilla.
* It got even worse if the file indeed contained non-ANSI codepoints, as reading in chunks of 1024 bytes
would sometimes mean that incomplete byte sequences were read.
Some large 160 MB test files wouldn't load even after minutes.
They now load in seconds.
* This does NOT yet solve the slowdowns when operating on very long lines.
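One way to avoid handing Scintilla incomplete byte sequences when reading in fixed-size chunks is to detect the incomplete tail and carry it over into the next read. A minimal sketch - utf8_incomplete_tail() is an invented helper, not the actual SciTECO code:

```c
#include <stddef.h>

/* Returns how many trailing bytes of buf form an incomplete UTF-8
 * sequence that must be carried over into the next chunk;
 * 0 if the chunk ends on a sequence boundary (or is malformed). */
static size_t utf8_incomplete_tail(const unsigned char *buf, size_t len)
{
    size_t i = len;
    /* walk back over up to 3 continuation bytes (10xxxxxx) */
    while (i > 0 && len - i < 3 && (buf[i-1] & 0xC0) == 0x80)
        i--;
    if (i == 0)
        return 0;
    unsigned char lead = buf[i-1];
    /* sequence length encoded in the lead byte */
    size_t need = (lead & 0xE0) == 0xC0 ? 2 :
                  (lead & 0xF0) == 0xE0 ? 3 :
                  (lead & 0xF8) == 0xF0 ? 4 : 1;
    size_t have = len - i + 1;
    return have < need ? have : 0;
}
```

The tail bytes would then be copied to the front of the buffer before the next read() fills the rest.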
|
|
* I.e. you can now write FK...$^YD to delete up to AND including the matched pattern.
|
|
as a separate command)
|
|
"identifier" to enable lexing in the container
* SCI_SETILEXER(NULL) is not a reliable way to do that since
that's the default for all views.
* This was breaking the git.tes lexer for instance and was unnecessarily
driving teco_lexer_style() on plain-text documents.
* Since we currently do not implement the ILexer5 C++ interface
and teco_view_t is just a pointer alias, we are abusing the view's "identifier" instead.
This is probably sufficient, as long as there is only one lexer "in the container".
Otherwise, there should perhaps be a single C++ class that does nothing but
wrap a callback into an ILexer5 object with a C ABI.
|
|
* We don't actually have to negate ^S results after FK.
For deleting the matched pattern, you can use ^YD or -^SD.
|
|
* this works by embedding the SciTECO parser and driving it always (exclusively)
in parse-only mode.
* A new teco_state_t::style determines the Scintilla style for any character
accepted in the given state.
* Therefore, the SciTECO lexer is always 100% exact and corresponds to the current
SciTECO grammar - it does not have to be maintained separately.
There are a few exceptions and tweaks, though.
* The contents of curly-brace escapes (`@^Uq{...}`) are rendered as ordinary
code using a separate parser instance.
This can be disabled with the lexer.sciteco.macrodef property.
Unfortunately, SciTECO does not currently allow setting lexer properties (FIXME).
* Labels and comments are currently styled the same.
This could change in the future once we introduce real comments.
* Lexers are usually implemented in C++, but I did not want to draw in C++.
Especially not since we'd have to include parser.h and other SciTECO headers,
which we really do not want to keep C++-compatible.
Instead, the lexer is implemented "in the container".
@ES/SCI_SETILEXER/sciteco/ is internally translated to SCI_SETILEXER(NULL)
and we get Scintilla notifications when styling the view becomes necessary.
This is then centrally forwarded to the teco_lexer_style() which
uses the ordinary teco_view_ssm() API for styling.
* Once the command line becomes a Scintilla view even on Curses,
we can enable syntax highlighting of the command line macro.
|
|
* g_assert() apparently does not reference the expression when assertions are disabled,
in contrast to glibc's assert()
|
|
* This would crash if <EB> opened more than one file, e.g. EB*.c$.
The reason is that teco_current_doc_undo_edit() must be called before every teco_ring_edit().
* Unfortunately, this is not reproducible with
sciteco --no-profile --fake-cmdline '@EB"foo*.txt"{HK}'
since the crashes actually happen when printing messages in interactive mode.
That's why no test case has been added.
|
|
* As known from DEC TECO, but extended to convert absolute positions to line numbers as well.
:^Q returns the current line.
* Especially useful in macros that accept line arguments,
as it is much shorter than something like
^E@ES/LINEFROMPOSITION//+Q.l@ES/POSITIONFROMLINE//:^E-.
* On the other hand, the fact that ^Q checks the line range means we cannot
easily replace lexer.checkheader with something like
[:J 0,^Q::S...$ ]:
Using SCI_POSITIONFROMLINE still has the advantage that it returns `Z` for out-of-bounds ranges
which would be cumbersome to write with the current ^Q.
* Perhaps there should be a separate command for converting between absolute lines and positions
and :^Q should be repurposed to return a failure boolean for out-of-range values?
* fnkeys.tes could be simplified.
|
|
::FS as well)
* The colon modifier can now occur 2 times.
Specifying `@` more than once or `:` more than twice is an error now.
* Commands do not check for excess colon modifiers - almost every command would have
to check it. Instead, a double colon will simply behave like a single colon on most
commands.
* All search commands inherit the anchored semantics, but it's not very useful in some combinations
like -::S, ::N or ::FK.
That's why the `::` variants are not documented everywhere.
* The lexer.checkheader macro could be simplified and should also be faster now,
speeding up startup.
Eventually this macro can be made superfluous, e.g. by using 1:FB or 0,1^Q::S.
|
|
* Can be freely combined with the colon-modifier as well.
:@Xq cut-appends to register q.
* This simply deletes the given buffer range after the copy or append operation
as if followed by another <K> command.
* This has indeed been a very annoying missing feature, as you often have to retype the
range for a K or D command.
At the same time, this cannot be reasonably solved with a macro since macros
do not accept Q-Register arguments -- so we would have to restrict ourselves to one or a few
selected registers.
I was also considering solving this with a special stack operation that duplicates the
top values, so that Xq leaves arguments for K, but this couldn't work for cutting lines
and would also be longer to type.
* It's the first non-string command that accepts @.
Others may follow in the future.
We're approaching ITS TECO madness levels.
|
|
lengths (refs #27)
* Allows storing pattern matches into Q-Registers (^YXq).
* You can also refer to subpatterns marked by ^E[...] by passing a number > 0.
This is equivalent to \0-9 references in many programming languages.
* It's especially useful for supporting TECO's equivalent of structural regular expressions.
This will be done with additional macros.
* You can also simply back up to the beginning of an insertion or search.
So I...$^SC leaves dot at the beginning of the insertion.
S...$^SC leaves dot before the found pattern.
This has been previously requested by users.
* Perhaps there should be ^Y string building characters as well to backreference
in search-replacement commands (TODO).
This means that the search commands would have to store the matched text itself
in teco_range_t structures since FR deletes the matched text before
processing the replacement string.
It could also be made into a FR/FS-specific construct,
so we don't fetch the substrings unnecessarily.
* This differs from DEC TECO in always returning the same range even after dot movements,
since we are storing start/end byte positions instead of only the length.
Also DEC TECO does not support fetching subpattern ranges.
|
|
5597bc72671d0128e6f0dba446c4dc8d47bf37d0)
* Using teco_expressions_eval() is wrong since it does not pay attention to precedences.
If you have multiple higher precedence operators in a row, as in 2+3*4*5,
the lower precedence operators would be resolved prematurely.
* Instead we now call teco_expressions_calc() repeatedly but only for lower precedence
operators on the stack top.
This makes sure that as much of the expression as possible is evaluated at any given moment.
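The left-to-right reduction strategy can be illustrated with a toy calculator - a sketch with invented names, not the actual teco_expressions_* code: before a new operator is pushed, everything on the stack with higher or equal precedence is reduced first, so as much of the expression as possible is evaluated at any given moment.

```c
#include <ctype.h>

static int prec(char op)
{
    return op == '*' ? 2 : 1;
}

/* Evaluates single-digit expressions with + - * strictly left-to-right.
 * Reducing on ">=" (not ">") keeps operators left-associative. */
static int eval(const char *s)
{
    int nums[16]; char ops[16];
    int n = 0, o = 0;
    for (; *s; s++) {
        if (isdigit((unsigned char)*s)) {
            nums[n++] = *s - '0';
            continue;
        }
        /* reduce the stack top while it has higher or equal precedence */
        while (o > 0 && prec(ops[o-1]) >= prec(*s)) {
            int b = nums[--n], a = nums[--n];
            nums[n++] = ops[o-1] == '*' ? a*b :
                        ops[o-1] == '+' ? a+b : a-b;
            o--;
        }
        ops[o++] = *s;
    }
    /* reduce whatever remains */
    while (o > 0) {
        int b = nums[--n], a = nums[--n];
        nums[n++] = ops[o-1] == '*' ? a*b :
                    ops[o-1] == '+' ? a+b : a-b;
        o--;
    }
    return nums[0];
}
```

With this scheme 2+3*4*5 correctly defers the "+" until both "*" have been reduced, and 1-6*5-1 reduces 6*5 and then 1-30 before the second "-" is pushed.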
|
|
* This is not safe since the size of the stack object comes from the "outside" world,
so stack overflows can theoretically be provoked by macros.
|
|
* It was possible to provoke operator right-associativity when placing a high-precedence
operator between two low-precedence operators.
1-6*5-1 evaluated to -28 instead of the expected -30.
* The reason is that SciTECO relies on operators to be resolved from left-to-right as soon as possible.
The higher precedence operator prevents that and pushing the 2nd "-" only evaluated 6*5.
At the end 1-30-1 would be left on the stack.
teco_expressions_eval() however evaluates from right-to-left which is wrong in this case.
* Instead, we now do a full eval on every operator with a lower precedence, making sure that 1-30 is
evaluated first.
|
|
* We cannot call it "." since that introduces a local register
and we don't want to add an unnecessary syntactic exception.
* Allows the idiom [: ... ]: to temporarily move around.
Also, you can now write ^E\: without having to store dot in a register first.
* In the future we might add an ^E register as well for byte offsets.
However, there are much fewer useful applications.
* Of course, you can now also write nU: instead of nJ, Q: instead of "." and
n%: instead of "nC.". However, none of this is really useful.
|
|
|
|
non-control characters, but to the literal caret, followed by c
* For instance `^$` would insert two characters.
* The alternative would have been to throw an error.
|
|
* This would actually cause crashes when trying to format numbers.
* The ^R local register has a custom set_integer() method now,
so that the check is performed also when using nU.^X.
|
|
^R now (refs #17)
* This way the search mode and radix are local to the current macro frame,
unless the macro was invoked with :Mq.
If colon-modified, you can reproduce the same effect by calling
[.^X 0^X ... ].^X
* The radix register is cached in the Q-Reg table as an optimization.
This could be done with the other "special" registers as well, but at the
cost of larger stack frames.
* In order to allow constructs like [.^X typed with upcarets,
the Q-Register specification syntax has been extended:
^c is the corresponding control code instead of the register "^".
|
|
* Usually you will only want -^X for enabling case-sensitive searches
and 0^X for case-insensitive searches (which is also the default).
* An open question is what happens if the user sets -^X and then calls
a macro. The search mode flag should probably be stacked away along
with the search-string. This means we'd need a ^X special Q-Reg as well,
so you can write [^X[_ 0^X S...$ ]_]^X.
Alternatively, the search mode flag should be a property of the
macro frame, along with the radix.
|
|
* also explicitly mention -%q
|
|
|
|
* should also fix Win32 nightly builds
* Even though we weren't using main's argv, but were using glib
API for retrieving the command line in UTF-8, newer MinGW runtimes
would fail whenever converting the Unicode command line into the system codepage
would be lossy.
* Most people seem to compile in a "manifest" to work around this issue.
But this requires newer Windows versions and using some Microsoft tool which isn't
even in $PATH.
Instead, we now link with -municode and define wmain(), even though we still
ignore argv. wmain() probably gets the command line in UTF-16 and we'd have to
convert it anyway.
* See https://github.com/msys2/MINGW-packages/issues/22462
|
|
not found' error
* This is important with gotos in loops as in <@O/x/>, where we would otherwise
get a confusing "Unterminated loop" error.
* This in particular fixes the error thrown in grosciteco.tes when encountering
a new unknown command.
|
|
* The program exit code will usually not signal failures since they are caught earlier.
* Therefore, we always have to capture and check stderr.
|
|
characters in Q-Register)
* It was initialized only once, so it could inherit the wrong local Q-Register table.
A test case has been added for this particular bug.
* Also, if starting from the profile (batch mode), the state machine could be initialized
without undo, which then later caused problems on rubout in interactive mode.
For instance, if S^EG[a] fails and you would repeatedly type `]`, the Q-Reg name could
grow indefinitely. There were probably other issues as well.
Even crashes should have been possible, although I couldn't reproduce them.
* Since the state machine is required only for the pattern to regexp translation
and is performed anew for every character in interactive mode,
we now create a fresh state machine for every call and don't attempt
any undo.
There might be more efficient ways, like reusing the string building's
Q-Reg parser state machine.
|
|
* A test case has been added, although it might have been accidental
that it caused crashes.
|
|
|
|
* You can now specify `--with-scitecodatadir` as a relative path,
which will be interpreted relative to the binary's location.
* Win32 binaries already were relocatable, but this was a Windows-specific
hack. Win32 binaries are now built with `--with-scitecodatadir=.`
since everything is in a single directory.
* Ubuntu packages are now also built with `--with-scitecodatadir=../share/sciteco`.
This is not crucial for ordinary installations, but is meant for AppImage creation.
* Since AppImages are now built from relocatable packages,
we no longer need the unionfs-workaround from pkg2appimage.
This should fix the strange root contents when autocompleting in
AppImage builds.
* This might also fix the appimage.github.io CI issues.
I assume that because I could reproduce the issue on FreeBSD's
Linuxulator depending on pkg2appimage's "union" setting.
See https://github.com/AppImage/appimage.github.io/pull/3402
* Determining the binary location actually turned out to be hard and
very platform-dependent. There are now implementations for Windows
(which could also read argv[0]), Linux and generic UNIX (which
works on FreeBSD, but I am not sure about the others).
I believe this could also be useful on Mac OS to create app bundles,
but this needs to be tested - currently the Mac OS binaries are
installed into fixed locations and don't use relocation.
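On Linux, for instance, the binary location can be read from /proc/self/exe. A sketch of just that variant - get_exe_path() is an invented name, not the actual implementation:

```c
#include <stddef.h>
#include <unistd.h>     /* readlink() */

/* Resolve the running binary's absolute path (Linux only).
 * Returns the path length on success or -1 on failure. */
static ssize_t get_exe_path(char *buf, size_t len)
{
    ssize_t n = readlink("/proc/self/exe", buf, len - 1);
    if (n >= 0)
        buf[n] = '\0';  /* readlink() does not NUL-terminate */
    return n;
}
```

On Windows one would use GetModuleFileName() instead; generic UNIX has no portable equivalent, which is presumably why separate per-platform implementations were needed.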
|
|
string cells
Found thanks to the "infinite monkey" test.
|
|
Supposing that any monkey hitting keys on a typewriter, serving as a hardcopy
SciTECO terminal, will sooner or later trigger bugs and crash the application,
the new monkey-test.apl script emulates such a monkey.
In fact it's a bit more elaborate, as the generated macro follows the frequency
distribution extracted from the corpus of SciTECO macro files (via monkey-parse.apl).
This, it is hoped, increases the chance of getting into "interesting" parser states.
This also adds a new hidden --sandbox argument, but it works only on FreeBSD (via Capsicum)
so far. In sandbox mode, we cannot open any file or execute external commands.
We make sure that SciTECO cannot fail assertions in sandbox mode for scripts that would
run without --sandbox, since assertions are exactly the kind of thing we would like to detect.
SciTECO must be sandboxed during "infinite monkey" tests, so it cannot accidentally
do any harm on the system running the tests.
All macros in sandbox mode must currently be passed via --eval.
Alternatively, we could add a test compilation unit and generate the test data
directly in memory via C code.
The new scripts are written in GNU APL 1.9 and will probably work only under FreeBSD.
These scripts are not meant to be run by everyone.
|
|
failed, get assigned to error frames
|
|
jumping to the beginning of the macro)
* I am not sure whether this feature is really that useful...
* teco_machine_main_t::macro_pc now points to the __next__ character to execute,
which makes it easier to manipulate by flow control commands.
Also, it can now be unsigned (gsize) like all other program counters.
* Detected thanks to running the testsuite under Valgrind.
|
|
* There was some boilerplate code missing in teco_state_search_all_initial(),
that is present in teco_state_search_initial().
* Perhaps there should be a common function to avoid redundancies?
* This will also fix the initialization of the string argument codepage for <N>.
|
|
* Has been observed on PDCursesMod/WinGUI when pressing CTRL+Shift+6 on an US layout.
I would expect code 30 (^^) to be inserted, instead PDCurses reports two keypresses (6^^).
The first one is now filtered out since this will not be fixed upstream.
See also https://github.com/Bill-Gray/PDCursesMod/issues/323
* Since AltGr on German layouts is reported as CTRL+ALT, we must be careful not to filter those
out as well.
* This is active on all PDCurses variants - who knows which other platforms will behave similarly.
* You still cannot insert code 0 via CTRL+@ since PDCurses doesn't report it, but ncurses does not allow that either.
This _could_ be synthesized by evaluating the modifier flags, though.
|
|
* Fixes the test suite on PDCurses/Win32 and therefore CI builds.
* Should be necessary on UNIX as well since later on, we would access
cmdline_window, which is not yet initialized.
I didn't see any errors in Valgrind, though.
|
|
* Supports all immediate editing commands.
Naturally it cannot emulate arbitrary key presses since there is no
canonic ASCII-encoding of function keys.
Consequently, key macros are also not testable.
The --fake-cmdline parameter is instead treated very similar to
a key macro expansion.
* Most importantly this allows adding test cases for rubout behavior
and bugs that are quite common.
* Added regression test cases for the last two rubout bugs.
* It's not easy to pass control codes in command line arguments in
a portable manner, so the test cases will often use { and }.
Control codes could be used e.g. by defining variables like
RUBOUT=`printf '\b'`
and referencing them with ${RUBOUT}.
|
|
* This was a regression introduced in 41ab5cf0289dab60ac1ddc97cf9680ee2468ea6c,
which changed the semantics of teco_doc_undo_set_string().
* Removed undo_append_string() Q-Reg virtual method.
append_string() now does its own undo token emission, so that we can
defer the teco_doc_undo_edit() after the point that the document
was initialized. This is important, so that we can configure the
default encoding on new registers.
|