sciteco - Scintilla-based Text Editor and COrrector

Age	Commit message (Collapse)	Author	Files	Lines
2025-08-27	avoid g_prefix_error_literal(), which requires glib 2.70	Robin Haberkorn	1	-2/+2
	This broke builds e.g. on Ubuntu 20.04. Regression was introduced in 51bd183f064d0c0ea5e0184d9f6b6b62e5c01e50.
2025-08-03	added --quiet, --stdin and --stdout for easier integration into UNIX pipelines	Robin Haberkorn	1	-0/+63
	* In principle --stdin and --stdout could have been done in pure TECO code using the <^T> command. Having built-in command-line arguments however has several advantages: * Significantly faster than reading byte-wise with ^T. * Performs EOL normalization unless specifying --8bit of course. * Significantly shortens command-lines. `sciteco -qio` and `sciteco -qi` can be real replacements for sed and awk. * You can even place SciTECO into the middle of a pipeline while editing interactively: foo \| sciteco -qio --no-profile \| bar Unfortunately, this will not currently work when munging the profile as command-line parameters are also transmitted via the unnamed buffer. This should be changed to use special Q-registers (FIXME). * --quiet can help to improve the test suite (TODO). Should probably be the default in TE_CHECK(). * --stdin and --stdout allow to simplify many SciTECO scripts, avoiding temporary files, especially for womenpage generation (TODO). * For processing potentially infinite streams, you will still have to read using ^T.
2025-07-18	make some array declarations real constants	Robin Haberkorn	1	-1/+1
	* `static const char p = "FOO"` is not a true constant since the variable p can still be changed. It has to be declared as `static const char const p = "FOO"`, so that the pointer itself is constant. * In case of string constants, it's easier however to use `static const char p[] = "FOO"`.
2025-07-18	fixed minor memory leaks of per-state data in teco_machine_main_t	Robin Haberkorn	1	-1/+1
	* These were leaked e.g. in case of end-of-macro errors, but also in case of syntax highlighting (teco_lexer_style()). I considered to solve this by overwriting more of the end_of_macro_cb, but it didn't turn out to be trivial always. * Considering that the union in teco_machine_main_t saved only 3 machine words of memory, I decided to sacrifice those for more robust memory management. * teco_machine_qregspec_t cannot be directly embedded into teco_machine_main_t due to recursive dependencies with teco_machine_stringbuilding_t. It could now and should perhaps be allocated only once in teco_machine_main_init(), but it would require more refactoring.
2025-07-13	implemented <ER> command for reading a file into the current buffer	Robin Haberkorn	1	-21/+41
	* This command exists in Video TECO. In Video TECO it also supports reading multiple files with a glob pattern -- we do not support that as I am not convinced of its usefulness. * teco_view_load() has been extended, so it can read into dot without discarding the existing document.
2025-06-01	<nA> and <nQq> now return -1 in case the index n is out of range	Robin Haberkorn	1	-3/+6
	* The old behavior of throwing an error was inherited from Video TECO. * The command is now more similar to TECO-11. * Since -1 is taken, invalid and incomplete UTF-8 byte sequences are now reported as -2/-3. I wasn't really able to provoke -3, though.
2025-04-10	fixed formatting of the smallest possible integer	Robin Haberkorn	1	-0/+1
	* In other words, fixed `-9223372036854775808\` on --with-teco-integer=64 (which is the default). * The reason is that ABS(G_MININT64) == G_MININT64 since -G_MININT64 == G_MININT64. It is therefore important not to call ABS() on arbitrary teco_int_t's.
2025-01-13	updated copyright to 2025	Robin Haberkorn	1	-1/+1

2024-12-22	fixed lexing (syntax highlighting) of the null-character (^@) in SciTECO code	Robin Haberkorn	1	-1/+1
	* Apparently g_utf8_get_char_validated() sometimes(!) returns -2 for null-characters, so it was considered an invalid byte sequence. * What's strange and unexplainable is that other uses of the function, as are behind nA and nQq, did not cause problems and returned 0 for null-bytes. * This also fixes syntax higlighting of .teco_session files which use the null-byte as the string terminator. (.teco_session files are not highlighted automatically, though.)
2024-12-17	sped up opening very large UTF-8 files by temporarily disabling the ↵	Robin Haberkorn	1	-11/+33
	line-character index * checks for character consistency (of UTF-8 byte sequences) were slowing down things significantly in Scintilla * It got even worse if the file indeed contained non-ANSI codepoints as reading in chunks of 1024 would sometimes mean that incomplete byte sequences would be read. Some large 160mb test files wouldn't load even after minutes. They now load in seconds. * This does NOT yet solve the slowdowns when operating on very long lines.
2024-12-13	fixup 244a54a18b7db6af177c9d10f3224772f08d7484: abuse the Scintilla view's ↵	Robin Haberkorn	1	-2/+9
	"identifier" to enable lexing in the container * SCI_SETILEXER(NULL) is not a reliable way to do that since that's the default for all views. * This was breaking the git.tes lexer for instance and was unnecessarily driving teco_lexer_style() on plain-text documents. * Since we currently do not implement the ILexer5 C++ interface and teco_view_t is just a pointer alias, we are abusing the view's "identifier" instead. This is probably sufficient, as long as there is only one lexer "in the container". Otherwise, there should perhaps be a single C++ class that does nothing but wrapping a callback into an ILexer5 object with a C ABI.
2024-12-13	implemented Scintilla lexer for SciTECO code, i.e. TECO syntax highlighting	Robin Haberkorn	1	-0/+13
	* this works by embedding the SciTECO parser and driving it always (exclusively) in parse-only mode. * A new teco_state_t::style determines the Scintilla style for any character accepted in the given state. * Therefore, the SciTECO lexer is always 100% exact and corresponds to the current SciTECO grammer - it does not have to be maintained separately. There are a few exceptions and tweaks, though. * The contents of curly-brace escapes (`@^Uq{...}`) are rendered as ordinary code using a separate parser instance. This can be disabled with the lexer.sciteco.macrodef property. Unfortunately, SciTECO does not currently allow setting lexer properties (FIXME). * Labels and comments are currently styled the same. This could change in the future once we introduce real comments. * Lexers are usually implemented in C++, but I did not want to draw in C++. Especially not since we'd have to include parser.h and other SciTECO headers, that really do not want to keep C++-compatible. Instead, the lexer is implemented "in the container". @ES/SCI_SETILEXER/sciteco/ is internally translated to SCI_SETILEXER(NULL) and we get Scintilla notifications when styling the view becomes necessary. This is then centrally forwarded to the teco_lexer_style() which uses the ordinary teco_view_ssm() API for styling. * Once the command line becomes a Scintilla view even on Curses, we can enabled syntax highlighting of the command line macro.
2024-09-28	fixed compilation on most compilers due to goto beyond g_auto() declaration	Robin Haberkorn	1	-3/+3

2024-09-28	check the memory limit and allow interruptions when loading files	Robin Haberkorn	1	-5/+23
	* Previously you could open files of arbitrary size and the limit would be checked only afterwards. * Many, but not all, cases should now be detected earlier. Since Scintilla allocates lots of memory as part of rendering, you can still run into memory limits even after successfully loading the file. * Loading extremely large files can also be potentially slow. Therefore, it is now possible to interrupt via CTRL+C. Again, if the UI is blocking because of stuff done as part of rendering, you still may not be able to interrupt the "blocking" operation.
2024-09-09	added raw ANSI mode to facilitate 8-bit clean editing (refs #5)	Robin Haberkorn	1	-8/+43
	* When enabled with bit 2 in the ED flags (0,4ED), all registers and buffers will get the raw ANSI encoding (as if 0EE had been called on them). You can still manually change the encoding, eg. by calling 65001EE afterwards. * Also the ANSI mode sets up character representations for all bytes >= 0x80. This is currently done only depending on the ED flag, not when setting 0EE. * Since setting 16,4ED for 8-bit clean editing in a macro can be tricky - the default unnamed buffer will still be at UTF-8 and at least a bunch of environment registers as well - we added the command line option `--8bit` (short `-8`) which configures the ED flags very early on. As another advantage you can mung the profile in 8-bit mode as well when using SciTECO as a sort of interactive hex editor. * Disable UTF-8 checks in 8-bit clean mode (sample.teco_ini).
2024-09-09	avoid redunancies between teco_qreg_plain_get_character() and ↵	Robin Haberkorn	1	-0/+40
	teco_state_start_get() (refs #5)
2024-09-09	Unicode support for the Q-Register commands (refs #5)	Robin Haberkorn	1	-6/+94
	* this required adding several Q-Register vtable methods * it should still be investigated whether the repeated calling of SCI_ALLOCATELINECHARACTERINDEX causes any overhead.
2024-09-09	Glyph to byte offset mapping is now using the line character index (refs #5)	Robin Haberkorn	1	-0/+6
	* This works reasonably well unless lines are exceedingly long (as on a line we always count characters). The following test case is still slow (on Unicode buffers): 10000<@I/XX/> <%a-1:J;> While the following is now also fast: 10000<@I/X^J/> <%a-1:J;> * Commands with relative character offsets (C, R, A, D) have a special optimization where they always count characters beginning at dot, as long as the argument is now exceedingly large. This means they are fast even on exceedingly long lines. * The remaining commands (search, EC/EG, Xq) now accept glyph indexes.
2024-01-21	updated copyright to 2024	Robin Haberkorn	1	-1/+1

2023-04-05	updated copyright to 2023	Robin Haberkorn	1	-1/+1

2023-04-05	default font is now "Monospace" instead of Courier	Robin Haberkorn	1	-1/+1
	* Courier has the quirk that letter sequences like "fi" are turned into ligatures which breaks the monospaced nature of the display. * We assume that "Monospace" is also more portable, although it hasn't yet been tested on Windows. * only relevant for the Gtk UI of course * It might be a good idea to set SCI_STYLESETCHECKMONOSPACED as well (FIXME?)
2022-06-21	updated copyright to 2022 and updated TODO	Robin Haberkorn	1	-1/+1

2021-10-13	improved default selection colors and made them configurable via color.tes	Robin Haberkorn	1	-0/+3
	* NOTE: Selections are currently only used to highlight search results. * The default selection colors were not always visible well with default settings (--no-profile) and they were not uniform across platforms. On Curses, the selection would be reversed, while on Gtk it had a lighter foreground color. They are now always reversed (black on white background). The default styles do not assume any color support - they use only black and white. * Since these defaults cannot possibly work on every color scheme, color.selfore and color.selback has been added to color.tes. All existing color schemes have been updated to configure selections as reversed to the default colors. This especially fixes selection colors on Gtk. * On solarized.tes, the caret style was already distinct from inversed default colors. On terminal.tes, the color of the caret is now bright white, so it stands out from the selection colors. * In Curses, the caret color is currently __not__ applied to the command line where it is continued to be drawn reversed. The command line drawing code is considered deprecated and will eventually be replaced with a Scintilla minibuffer. * In Gtk, we now apply the caret style to the commandline view as well. * Fixed the comment color in solarized.light.
2021-10-11	optimized character representation setting	Robin Haberkorn	1	-2/+8
	* Esp. with the new Scintilla version, the representation setting as part of every SCI_SETDOCPOINTER has turned out to be a performance bottleneck. * The new Scintilla has a custom tweak/patch that disables any automatic representation setting in Scintilla itself. It is now sufficient to initialize the SciTECO-style representations only once in the lifetime of any view.
2021-10-11	upgraded to Scintilla 5.1.3 and Scinterm 3.1	Robin Haberkorn	1	-1/+3
	* Previous Scintilla version was 3.6.4 and Scinterm was 1.7 (with lots of custom patches). All of the patches are now either irrelevant or have been merged upstream. * Since Scintilla 5 requires C++17, this increases the minimum GCC version at least to 5.0. We may actually require even newer versions. * I could not upgrade the scintilla-mirror (which was imported from Mercurial), so the old sciteco-dev branch was renamed to sciteco-dev-pre-v2.0.0, master was deleted and I reimported the entire Scintilla repo using git-remote-hg. This means that scintilla-mirror now contains two entirely separate trees. But it is still possible to clone old SciTECO repos. * The strategy/workflow of maintaining hotfix branches on scintilla-mirror has been changed. Instead of having one sciteco-dev branch that is rebased onto new Scintilla upstream releases and tagging SciTECO releases in scintilla-mirror (to keep the commits referenced), we now create a branch for every Scintilla version we are based on (eg. sciteco-rel-5-1-3). This branch is never rebased or deleted. Therefore, we are guaranteed to be able to clone arbitrary SciTECO repo commits - not only releases. Releases no longer have to be tagged in scintilla-mirror. On the downside, fixup commits may accumulate in these new branches. They can only be squashed once a new branch for a new Scintilla release is created (e.g. by cherry-picking followed by rebase). * Scinterm does no longer have to reside in the Scintilla subdirectory, so we added it as a regular submodule. There are no more recursive submodules. The Scinterm build system has not been improved at all, but we use a trick based on VPATH to build Scinterm in scintilla/bin/. * Scinterm is now in Git and we reference the upstream repo for the time being. We might mirror it and apply the same branching workflow as with Scintilla if necessary. The scinterm-mirror repository still exists but has not been touched. We will also have to rewrite its master branch as it was a non-reproducible Mercurial import. * Scinterm now also comes with patches for Scintilla which we simply applied on our sciteco-rel-5-1-3 branch. * Scintilla 5 outsourced its lexers into the Lexilla project. We added it as yet another submodule. * All submodules have been moved into contrib/. * The Scintilla API for setting lexers has consequently changed. We now have to call SCI_SETILEXER(0, CreateLexer(name)). As I did not want to introduce a separate command for setting lexers, <ES> has been extended to allow setting lexers by name with the SCI_SETILEXER message which effectively replaces SCI_SETLEXERLANGUAGE. * The lexer macros (SCLEX_...) no longer serve any purpose - they weren't used in the SciTECO standard library anyway - and have consequently been removed from symbols-scilexer.c. The style macros from SciLexer.h (SCE_...) are theoretically still useful - even though they are not used by our current color schemes - and have therefore been retained. They can be specified as wParam in <ES>. * <ES> no longer allows symbolic constants for lParam. This never made any sense since all supported symbols were always wParam. * Scinterm supports new native cursor modes. They are not used for the time being and the previous CARETSTYLE_BLOCK_AFTER caret style is configured by default. It makes no sense to enable native cursor modes now since the command line should have a native cursor but is not yet a Scintilla view. * The Scintilla upgrade performed much worse than before, so some optimizations will be necessary.
2021-10-08	fixed hiding savepoint files on Win32	Robin Haberkorn	1	-1/+2
	* This was an ancient bug apparently broken since d503c3b07c2157658f699294c44ad5be244727a5 (year 2014) and was therefore broken even in v0.6.4.
2021-05-30	THE GREAT CEEIFICATION EVENT	Robin Haberkorn	1	-0/+439
	This is a total conversion of SciTECO to plain C (GNU C11). The chance was taken to improve a lot of internal datastructures, fix fundamental bugs and lay the foundations of future features. The GTK user interface is now in an useable state! All changes have been squashed together. The language itself has almost not changed at all, except for: * Detection of string terminators (usually Escape) now takes the string building characters into account. A string is only terminated outside of string building characters. In other words, you can now for instance write I^EQ[Hello$world]$ This removes one of the last bits of shellisms which is out of place in SciTECO where no tokenization/lexing is performed. Consequently, the current termination character can also be escaped using ^Q/^R. This is used by auto completions to make sure that strings are inserted verbatim and without unwanted sideeffects. * All strings can now safely contain null-characters (see also: 8-bit cleanliness). The null-character itself (^@) is not (yet) a valid SciTECO command, though. An incomplete list of changes: * We got rid of the BSD headers for RB trees and lists/queues. The problem with them was that they used a form of metaprogramming only to gain a bit of type safety. It also resulted in less readble code. This was a C++ desease. The new code avoids metaprogramming only to gain type safety. The BSD tree.h has been replaced by rb3ptr by Jens Stimpfle (https://github.com/jstimpfle/rb3ptr). This implementation is also more memory efficient than BSD's. The BSD list.h and queue.h has been replaced with a custom src/list.h. * Fixed crashes, performance issues and compatibility issues with the Gtk 3 User Interface. It is now more or less ready for general use. The GDK lock is no longer used to avoid using deprecated functions. On the downside, the new implementation (driving the Gtk event loop stepwise) is even slower than the old one. A few glitches remain (see TODO), but it is hoped that they will be resolved by the Scintilla update which will be performed soon. * A lot of program units have been split up, so they are shorter and easier to maintain: core-commands.c, qreg-commands.c, goto-commands.c, file-utils.h. * Parser states are simply structs of callbacks now. They still use a kind of polymorphy using a preprocessor trick. TECO_DEFINE_STATE() takes an initializer list that will be merged with the default list of field initializers. To "subclass" states, you can simply define new macros that add initializers to existing macros. * Parsers no longer have a "transitions" table but the input_cb() may use switch-case statements. There are also teco_machine_main_transition_t now which can be used to implement simple transitions. Additionally, you can specify functions to execute during transitions. This largely avoids long switch-case-statements. * Parsers are embeddable/reusable now, at least in parse-only mode. This does not currently bring any advantages but may later be used to write a Scintilla lexer for TECO syntax highlighting. Once parsers are fully embeddable, it will also be possible to run TECO macros in a kind of coroutine which would allow them to process string arguments in real time. * undo.[ch] still uses metaprogramming extensively but via the C preprocessor of course. On the downside, most undo token generators must be initiated explicitly (theoretically we could have used embedded functions / trampolines to instantiate automatically but this has turned out to be dangereous). There is a TECO_DEFINE_UNDO_CALL() to generate closures for arbitrary functions now (ie. to call an arbitrary function at undo-time). This simplified a lot of code and is much shorter than manually pushing undo tokens in many cases. * Instead of the ridiculous C++ Curiously Recurring Template Pattern to achieve static polymorphy for user interface implementations, we now simply declare all functions to implement in interface.h and link in the implementations. This is possible since we no longer hace to define interface subclasses (all state is static variables in the interface's .c files). Headers are now significantly shorter than in C++ since we can often hide more of our "class" implementations. * Memory counting is based on dlmalloc for most platforms now. Unfortunately, there is no malloc implementation that provides an efficient constant-time memory counter that is guaranteed to decrease when freeing memory. But since we use a defined malloc implementation now, malloc_usable_size() can be used safely for tracking memory use. malloc() replacement is very tricky on Windows, so we use a poll thread on Windows. This can also be enabled on other supported platforms using --disable-malloc-replacement. All in all, I'm still not pleased with the state of memory limiting. It is a mess. * Error handling uses GError now. This has the advantage that the GError codes can be reused once we support error catching in the SciTECO language. * Added a few more test suite cases. * Haiku is no longer supported as builds are instable and I did not manage to debug them - quite possibly Haiku bugs were responsible. * Glib v2.44 or later are now required. The GTK UI requires Gtk+ v3.12 or later now. The GtkFlowBox fallback and sciteco-wrapper workaround are no longer required. * We now extensively use the GCC/Clang-specific g_auto feature (automatic deallocations when leaving the current code block). * Updated copyright to 2021. SciTECO has been in continuous development, even though there have been no commits since 2018. * Since these changes are so significant, the target release has been set to v2.0. It is planned that beginning with v3.0, the language will be kept stable.