sciteco - Scintilla-based Text Editor and COrrector

Age	Commit message (Collapse)	Author	Files	Lines
2025-07-26	properly document some functions in expressions.c and simplified code	Robin Haberkorn	1	-5/+1
	* Practically all calls to teco_expressions_args() must be preceded by teco_expressions_eval(). * In code paths where we know that teco_expressions_args() > 0, it is safe to call teco_expressions_pop_num(0) instead of teco_expressions_pop_num_calc(). This is both easier and faster. * teco_expressions_pop_num_calc() is for simple applications where you just want to get a command argument with default (implied) values. Since it includes teco_expressions_eval(), we can avoid superfluous calls. * -EC...$ turned out to be broken and is fixed now. A test case has been added.
2025-04-13	fixed undoing bitfields on Windows	Robin Haberkorn	1	-2/+2
	* It turns out that `bool` (_Bool) in bitfields may cause padding to the next 32-bit word. This was only observed on MinGW. I am not entirely sure why, although the C standard does not guarantee much with regard to bitfield memory layout and there are 64-bit available due to passing anyway. Actually, they could also be layed out in a different order. * I am now consistently using guint instead of `bool` in bitfields to prevent any potential surprises. * The way that guint was aliased with bitfield structs for undoing teco_machine_main_t and teco_machine_qregspec_t flags was therefore insecure. It was not guaranteed that the __flags field really "captures" all of the bit field. Even with `guint v : 1` fields, this was not guaranteed. We would have required a static assertion for robustness. Alternatively, we could have declared a `gsize __flags` variable as well. This __should__ be safe since gsize should always be pointer sized and correspond to the platform's alignment. However, it's also not 100% guaranteed. Using classic ANSI C enums with bit operations to encode multiple fields and flags into a single integer also doesn't look very attractive. * Instead, we now define scalar types with their own teco_undo_push() shortcuts for the bitfield structs. This is in one way simpler and much more robust, but on the other hand complicates access to the flag variables. * It's a good question whether a `struct __attribute__((packed))` bitfield with guint fields would be a reliable replacement for flag enums, that are communicated with the "outside" (TECO) world. I am not going to risk it until GCC gives any guarantees, though. For the time being, bitfields are only used internally where the concrete memory layout (bit positions) is not crucial. * This fixes the test suite and therefore probably CI and nightly builds on Windows. * Test case: Rub out `@I//` or `@Xq` until before the `@`. The parser doesn't know that `@` is still set and allows all sorts of commands where `@` should be forbidden. * It's unknown how long this has been broken on Windows - quite possibly since v2.0.
2025-03-23	the ES command (send Scintilla message) now supports passing both wParam and ↵	Robin Haberkorn	1	-15/+41
	lParam as null-terminated strings * Being able to embed null bytes into the lParam string is practically useless - there aren't any messages where this is useful and where there are no native SciTECO counterparts - so this case is now catched and the null-byte separates wParam from lParam. * wParam can be the empty string, but it is not supported to pass wParam as a string and lParam as the empty string. If the second string argument ends in ^@, lParam is popped from the stack instead. * This is a temporary workaround until we can properly parse the Scintilla.iface and generate more elegant per-message wrappers. * It in particular unlocks the SCI_SETREPRESENTATION and SCI_SETPROPERTY messages. The former allows us to write a special hex-editor macro which sets hexadecimal character representations, while the latter allows you to set lexer properties. * The C-based lexers ("cpp" in Lexilla) can now take preprocessor definitions into account. This is disabled by default, unless you set lexer.c.defines before opening a file. You can also set it interactively and re-set the lexer. For instance: ^U[lexer.c.defines]NDEBUG$ M[lexer.set.c]
2025-03-23	refactored Lexilla/Scintillua support: it's now in teco_create_lexer()	Robin Haberkorn	1	-59/+80

2025-03-21	don't use TECO_DEFINE_UNDO_OBJECT_OWN() for what are essentially scalars	Robin Haberkorn	1	-5/+6
	* The "own" objects are tricky to work with and have special requirements, so try to avoid them. * Also, wrap the push functions in macros like all other scalars. * This is a purely cosmetic change, but avoids some confusion.
2025-02-23	support mouse interaction with popup windows	Robin Haberkorn	1	-2/+6
	* Curses allows scrolling with the scroll wheel at least if mouse support is enabled via ED flags. Gtk always supported that. * Allow clicking on popup entries to fully autocomplete them. Since this behavior - just like auto completions - is parser state-dependant, I introduced a new state method (insert_completion_cb). All the implementations are currently in cmdline.c since there is some overlap with the process_edit_cmd_cb implementations. * Fixed pressing undefined function keys while showing the popup. The popup area is no longer redrawn/replaced with the Scintilla view. Instead, continue to show the popup.
2025-01-13	updated copyright to 2025	Robin Haberkorn	1	-1/+1

2024-12-22	support external Scintilla lexer libraries and Scintillua in particular	Robin Haberkorn	1	-8/+100
	* @ES/SCI_SETILEXER/lib^@name/ now opens the lexer <name> in library <lib>. * You need to define the environment variable $SCITECO_SCINTILLUA_LEXERS to point to the lexers/ subdirectory (containing the .lua files). Perhaps this should default to the dirname of <lib>? The semantics of SCI_NAMEOFSTYLE have been changed: It now returns style ids when given style names, so you can actually write Scintillua lexer .tes files. This will be superfluous if we had a way to return strings from Scintilla messages into Q-Registers, e.g. 23@EPq/SCI_NAMEOFSTYLE/. We now depend on gmodule as well, but it should always be part of glib. It does not change the library dependencies of any package. It might result in gmodule shared libraries to be bundled in the Win32 and Mac OS packages if they weren't already.
2024-12-13	fixup 244a54a18b7db6af177c9d10f3224772f08d7484: abuse the Scintilla view's ↵	Robin Haberkorn	1	-8/+3
	"identifier" to enable lexing in the container * SCI_SETILEXER(NULL) is not a reliable way to do that since that's the default for all views. * This was breaking the git.tes lexer for instance and was unnecessarily driving teco_lexer_style() on plain-text documents. * Since we currently do not implement the ILexer5 C++ interface and teco_view_t is just a pointer alias, we are abusing the view's "identifier" instead. This is probably sufficient, as long as there is only one lexer "in the container". Otherwise, there should perhaps be a single C++ class that does nothing but wrapping a callback into an ILexer5 object with a C ABI.
2024-12-13	implemented Scintilla lexer for SciTECO code, i.e. TECO syntax highlighting	Robin Haberkorn	1	-3/+8
	* this works by embedding the SciTECO parser and driving it always (exclusively) in parse-only mode. * A new teco_state_t::style determines the Scintilla style for any character accepted in the given state. * Therefore, the SciTECO lexer is always 100% exact and corresponds to the current SciTECO grammer - it does not have to be maintained separately. There are a few exceptions and tweaks, though. * The contents of curly-brace escapes (`@^Uq{...}`) are rendered as ordinary code using a separate parser instance. This can be disabled with the lexer.sciteco.macrodef property. Unfortunately, SciTECO does not currently allow setting lexer properties (FIXME). * Labels and comments are currently styled the same. This could change in the future once we introduce real comments. * Lexers are usually implemented in C++, but I did not want to draw in C++. Especially not since we'd have to include parser.h and other SciTECO headers, that really do not want to keep C++-compatible. Instead, the lexer is implemented "in the container". @ES/SCI_SETILEXER/sciteco/ is internally translated to SCI_SETILEXER(NULL) and we get Scintilla notifications when styling the view becomes necessary. This is then centrally forwarded to the teco_lexer_style() which uses the ordinary teco_view_ssm() API for styling. * Once the command line becomes a Scintilla view even on Curses, we can enabled syntax highlighting of the command line macro.
2024-11-18	fixed some common typos: "ie." and "eg.", "ocur" instead of "occur"	Robin Haberkorn	1	-1/+1

2024-09-11	the SciTECO parser is Unicode-based now (refs #5)	Robin Haberkorn	1	-1/+1
	The following rules apply: * All SciTECO macros __must__ be in valid UTF-8, regardless of the the register's configured encoding. This is checked against before execution, so we can use glib's non-validating UTF-8 API afterwards. * Things will inevitably get slower as we have to validate all macros first and convert to gunichar for each and every character passed into the parser. As an optimization, it may make sense to have our own inlineable version of g_utf8_get_char() (TODO). Also, Unicode glyphs in syntactically significant positions may be case-folded - just like ASCII chars were. This is is of course slower than case folding ASCII. The impact of this should be measured and perhaps we should restrict case folding to a-z via teco_ascii_toupper(). * The language itself does not use any non-ANSI characters, so you don't have to use UTF-8 characters. * Wherever the parser expects a single character, it will now accept an arbitrary Unicode/UTF-8 glyph as well. In other words, you can call macros like M§ instead of having to write M[§]. You can also get the codepoint of any Unicode character with ^^x. Pressing an Unicode character in the start state or in Ex and Fx will now give a sane error message. * When pressing a key which produces a multi-byte UTF-8 sequence, the character gets translated back and forth multiple times: 1. It's converted to an UTF-8 string, either buffered or by IME methods (Gtk). On Curses we could directly get a wide char using wget_wch(), but it's not currently used, so we don't depend on widechar curses. 2. Parsed into gunichar for passing into the edit command callbacks. This also validates the codepoint - everything later on can assume valid codepoints and valid UTF-8 strings. 3. Once the edit command handling decides to insert the key into the command line, it is serialized back into an UTF-8 string as the command line macro has to be in UTF-8 (like all other macros). 4. The parser reads back gunichars without validation for passing into the parser callbacks. * Flickering in the Curses UI and Pango warnings in Gtk, due to incompletely inserted and displayed UTF-8 sequences, are now fixed.
2024-01-21	updated copyright to 2024	Robin Haberkorn	1	-1/+1

2023-04-14	allow disabling Lexilla (Lexer) support by specifying --without-lexilla	Robin Haberkorn	1	-1/+6
	* This does not make sense for most SciTECO builds, but only when you want to optimize for size as the lexers take up 50% of the compressed binary size. Without Lexilla, it should be possible get it compiled in about 500kb. * It can be useful for instance when building for embedded distributions. * When Lexilla is disabled, symbols-scilexer.c is also not generated (we assume that the Lexilla sources are not available and it also doesn't serve any purpose). * Consequently, most of the lexer configuration scripts are also not installed under --without-lexilla.
2023-04-05	updated copyright to 2023	Robin Haberkorn	1	-1/+1

2022-11-28	fixed a number of crashes due to empty string arguments or uninitialized ↵	Robin Haberkorn	1	-2/+3
	registers * An empty but valid teco_string_t can contain NULL pointers. More precisely, a state's done_cb() can be invoked with such empty strings in case of empty string arguments. Also a registers get_string() can return the NULL pointer for existing registers with uninitialized string parts. * In all of these cases, the language should treat "uninitialized" strings exactly like empty strings. * Not doing so, resulted in a number of vulnerabilities. * EN$$ crashed if "_" was uninitialized * The ^E@q and ^ENq string building constructs would crash for existing but uninitialized registers q. * ?$ would crash * ESSETILEXER$$ would crash * This is now fixed. Test cases have been added. * I cannot guarantee that I have found all such cases. Generally, it might be wise to change our definitions and make sure that every teco_string_t must have an associated heap object to be valid. All functions returning pointer+length pairs should consequently also never return NULL pointers.
2022-06-21	updated copyright to 2022 and updated TODO	Robin Haberkorn	1	-1/+1

2021-10-11	upgraded to Scintilla 5.1.3 and Scinterm 3.1	Robin Haberkorn	1	-40/+45
	* Previous Scintilla version was 3.6.4 and Scinterm was 1.7 (with lots of custom patches). All of the patches are now either irrelevant or have been merged upstream. * Since Scintilla 5 requires C++17, this increases the minimum GCC version at least to 5.0. We may actually require even newer versions. * I could not upgrade the scintilla-mirror (which was imported from Mercurial), so the old sciteco-dev branch was renamed to sciteco-dev-pre-v2.0.0, master was deleted and I reimported the entire Scintilla repo using git-remote-hg. This means that scintilla-mirror now contains two entirely separate trees. But it is still possible to clone old SciTECO repos. * The strategy/workflow of maintaining hotfix branches on scintilla-mirror has been changed. Instead of having one sciteco-dev branch that is rebased onto new Scintilla upstream releases and tagging SciTECO releases in scintilla-mirror (to keep the commits referenced), we now create a branch for every Scintilla version we are based on (eg. sciteco-rel-5-1-3). This branch is never rebased or deleted. Therefore, we are guaranteed to be able to clone arbitrary SciTECO repo commits - not only releases. Releases no longer have to be tagged in scintilla-mirror. On the downside, fixup commits may accumulate in these new branches. They can only be squashed once a new branch for a new Scintilla release is created (e.g. by cherry-picking followed by rebase). * Scinterm does no longer have to reside in the Scintilla subdirectory, so we added it as a regular submodule. There are no more recursive submodules. The Scinterm build system has not been improved at all, but we use a trick based on VPATH to build Scinterm in scintilla/bin/. * Scinterm is now in Git and we reference the upstream repo for the time being. We might mirror it and apply the same branching workflow as with Scintilla if necessary. The scinterm-mirror repository still exists but has not been touched. We will also have to rewrite its master branch as it was a non-reproducible Mercurial import. * Scinterm now also comes with patches for Scintilla which we simply applied on our sciteco-rel-5-1-3 branch. * Scintilla 5 outsourced its lexers into the Lexilla project. We added it as yet another submodule. * All submodules have been moved into contrib/. * The Scintilla API for setting lexers has consequently changed. We now have to call SCI_SETILEXER(0, CreateLexer(name)). As I did not want to introduce a separate command for setting lexers, <ES> has been extended to allow setting lexers by name with the SCI_SETILEXER message which effectively replaces SCI_SETLEXERLANGUAGE. * The lexer macros (SCLEX_...) no longer serve any purpose - they weren't used in the SciTECO standard library anyway - and have consequently been removed from symbols-scilexer.c. The style macros from SciLexer.h (SCE_...) are theoretically still useful - even though they are not used by our current color schemes - and have therefore been retained. They can be specified as wParam in <ES>. * <ES> no longer allows symbolic constants for lParam. This never made any sense since all supported symbols were always wParam. * Scinterm supports new native cursor modes. They are not used for the time being and the previous CARETSTYLE_BLOCK_AFTER caret style is configured by default. It makes no sense to enable native cursor modes now since the command line should have a native cursor but is not yet a Scintilla view. * The Scintilla upgrade performed much worse than before, so some optimizations will be necessary.
2021-06-02	renamed scintilla.[ch] to symbols.[ch]: fixes builds on case-insensitive ↵	Robin Haberkorn	1	-0/+349
	file systems * There is a "Scintilla.h" as well. * should fix macOS and builds on native Windows hosts * It wasn't practical to refer to the Scintilla includes using paths since the Scintilla location is configurable (--with-scintilla). So we'd have to write something like #include <include/Scintilla.h>. For Scinterm we cannot avoid collisions neither as its path is also configurable (--with-scinterm). Effectively, we must prevent name clashes across SciTECO and all of Scintilla and Scinterm.