sciteco - Scintilla-based Text Editor and COrrector

Age	Commit message	Author	Files	Lines
2024-09-09	define G_DISABLE_ASSERT unless --enable-debug is specified	Robin Haberkorn	1	-1/+1
	* turns out that glib's g_assert() does not depend on NDEBUG like Standard C's assert() * this disables assertions in release builds and should speed up things slightly
2024-09-09	<f,tXq>: fixed for very large character ranges	Robin Haberkorn	1	-3/+7
	* use SCI_GETTEXTRANGEFULL instead of deprecated SCI_GETTEXTRANGE
2024-09-09	symbols-extract.tes works in 8-bit mode now (refs #5)	Robin Haberkorn	2	-3/+3
	* significantly speeds up build time * Scintilla and Lexilla headers and symbols are all-ASCII anyway. * We should probably have a look at the quicksort implementation in string.tes, as it can probably be optimized in UTF-8 documents as well.
2024-09-09	teco_glyphs2bytes() and teco_bytes2glyphs() renamed to ↵	Robin Haberkorn	5	-26/+26
	teco_interface_glyphs2bytes() and teco_interface_bytes2glyphs() (refs #5) * for consistency with all the other teco_view wrappers in interface.h
2024-09-09	added raw ANSI mode to facilitate 8-bit clean editing (refs #5)	Robin Haberkorn	13	-104/+143
	* When enabled with bit 2 in the ED flags (0,4ED), all registers and buffers will get the raw ANSI encoding (as if 0EE had been called on them). You can still manually change the encoding, eg. by calling 65001EE afterwards. * Also the ANSI mode sets up character representations for all bytes >= 0x80. This is currently done only depending on the ED flag, not when setting 0EE. * Since setting 16,4ED for 8-bit clean editing in a macro can be tricky - the default unnamed buffer will still be at UTF-8 and at least a bunch of environment registers as well - we added the command line option `--8bit` (short `-8`) which configures the ED flags very early on. As another advantage you can mung the profile in 8-bit mode as well when using SciTECO as a sort of interactive hex editor. * Disable UTF-8 checks in 8-bit clean mode (sample.teco_ini).
2024-09-09	Xq and ]q inherit the document encoding from the source document (refs #5)	Robin Haberkorn	14	-112/+177
	* ^Uq however always sets a UTF-8 register as the source is supposed to be a SciTECO macro which is always UTF-8. * :^Uq preserves the register's encoding * teco_doc_set_string() now also sets the encoding * instead of trying to restore the encoding in teco_doc_undo_set_string(), we now swap out the document in a teco_doc_t and pass it to an undo token. * The get_codepage() Q-Reg method has been removed as the same can now be done with teco_doc_get_string() and the get_string() method.
2024-09-09	n^Uq now checks the input codepoints for validity (refs #5)	Robin Haberkorn	1	-1/+5
	* <nI> and ^EUq do the same
2024-09-09	Gtk: ignore the keyboard layout wherever possible (refs #5)	Robin Haberkorn	2	-23/+89
	* Eg. when typing with a Russian layout, CTRL+I will always insert ^I. * Works with all start-state commands, the Ex, Fx and ^x commands, and string building constructs. This is exactly where process_edit_cmd_cb() case folds case-insensitive characters. The corresponding state therefore sets an is_case_insensitive flag now. * Does not yet work with anything embedded into Q-Register specifications. This could only be realized with a new state callback (is_case_insensitive()?) that chains to the Q-Register and string building states recursively. * Also it doesn't work with Ё on my Russian phonetic layout, probably because the ANSI caret on that same key is considered dead and not returned by gdk_keyval_to_unicode(). Perhaps we should directly check the keyval values? * Whenever a non-ANSI key is pressed in an allowed state, we try to check all other keyvals that could be produced by the same hardware keycode, ie. we check all groups (keyboard layouts).
2024-09-09	leave some comments on what to do when converting the parser to Unicode ↵	Robin Haberkorn	2	-1/+21
	(refs #5)
2024-09-09	search patterns are now expected to be in UTF-8 and the document's encoding ↵	Robin Haberkorn	1	-21/+31
	is taken into account (refs #5) * ^Nx and ^EMx constructs work with Unicode glyphs now, even though the main SciTECO parser is still not Unicode-based. (We translate only complete patterns, although they could have incomplete Unicode sequences at their end.) * case-insensitive searching now works with Unicode glyphs
2024-09-09	the ^EUq string building escape now respects the encoding (can insert bytes ↵	Robin Haberkorn	10	-16/+98
	or codepoints) (refs #5) * This is trickier than it sounds because there isn't one single place to consult. It depends on the context. If the string argument relates to buffer contents - as in <I>, <S>, <FR> etc. - the buffer's encoding is consulted. If it goes into a register (EU), the register's encoding is consulted. Everything else (O, EN, EC, ES...) expects only Unicode codepoints. * This is communicated through a new field teco_machine_stringbuilding_t::codepage which must be set in the states' initial callback. * Seems overkill just for ^EUq, but it can be used for context-sensitive processing of all the other string building constructs as well. * ^V and ^W cannot be supported for Unicode characters for the time being without a Unicode-aware parser
2024-09-09	<I> command evaluates input codepoints (refs #5)	Robin Haberkorn	1	-10/+18

2024-09-09	conditionals now check for Unicode codepoints (refs #5)	Robin Haberkorn	1	-7/+7
	* This will naturally work with both ASCII characters and various non-English scripts. * Unfortunately, it cannot work with the other non-ANSI single-byte codepages. * If we'd like to support scripts working with all sorts of codepoints, we'd have to introduce a new command for translating individual codepoints from the current codepage (as reported by EE) to Unicode.
2024-09-09	glob patterns fully support Unicode now (refs #5)	Robin Haberkorn	1	-13/+16
	* The ASCII compiler would try to escape ("\") all bytes of a multibyte UTF-8 glyph. * The new implementation escapes only metacharacters and passes down all non-ANSI glyphs unchanged. On the downside, this will work only with PCREs.
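The escaping rule described above can be sketched in C as follows. This is only an illustration under stated assumptions: regex_escape() and its metacharacter list are hypothetical and not taken from the SciTECO sources, and the sketch covers only the escaping step, not the translation of glob wildcards.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not SciTECO's API): copy `in` to `out`, prefixing
 * only regex metacharacters with a backslash. Bytes >= 0x80, ie. the parts
 * of multibyte UTF-8 glyphs, are never in the metacharacter list and are
 * therefore passed through unchanged.
 * Returns the number of bytes written (without the NUL terminator) or
 * (size_t)-1 if `out` is too small.
 */
size_t regex_escape(const char *in, char *out, size_t out_size)
{
	/* assumed metacharacter set for a PCRE-style engine */
	static const char meta[] = "\\^$.|?*+()[]{}";
	size_t n = 0;

	if (out_size == 0)
		return (size_t)-1;

	for (; *in; in++) {
		size_t is_meta = strchr(meta, *in) != NULL;

		/* room for an optional backslash, the byte itself and the NUL */
		if (n + is_meta + 2 > out_size)
			return (size_t)-1;
		if (is_meta)
			out[n++] = '\\';
		out[n++] = *in;
	}
	out[n] = '\0';
	return n;
}
```

Escaping every byte of a multibyte glyph would instead put a backslash in front of each lead and continuation byte, which is the behaviour the commit message attributes to the old ASCII compiler.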
2024-09-09	:EL can be used to perform codepage conversions now (refs #5)	Robin Haberkorn	2	-35/+231
	* I decoded the Scintilla charset values into codepages, at least those used on Gtk. * make sure that the line character index is not allocated or released too often, as it is actually internally reference counted, which could result in it missing when we really need it. * The line character index still appears to be released whenever the document pointer changes, which will happen after using a different Q-Register. This could be a performance bottleneck (FIXME).
2024-09-09	avoid redundancies between teco_qreg_plain_get_character() and ↵	Robin Haberkorn	6	-48/+54
	teco_state_start_get() (refs #5)
2024-09-09	reserve at most 4 bytes for UTF-8 encoded characters (refs #5)	Robin Haberkorn	3	-3/+4
	There is a widespread myth that they could take up to 6 bytes.
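For reference, UTF-8 as restricted by RFC 3629 stops at U+10FFFF, which is why 4 bytes suffice; the 5- and 6-byte forms belong to the original, wider design. A minimal encoder sketch, where utf8_encode() is a hypothetical name and not a SciTECO function:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode the Unicode scalar value `cp` into `buf`.
 * Returns the number of bytes written (1 to 4), or 0 for surrogates and
 * for values beyond U+10FFFF, which are not valid in UTF-8.
 */
size_t utf8_encode(uint32_t cp, unsigned char buf[4])
{
	if (cp < 0x80) {
		buf[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		buf[0] = (unsigned char)(0xC0 | (cp >> 6));
		buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0; /* UTF-16 surrogates are not scalar values */
	if (cp < 0x10000) {
		buf[0] = (unsigned char)(0xE0 | (cp >> 12));
		buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {
		buf[0] = (unsigned char)(0xF0 | (cp >> 18));
		buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0; /* out of range */
}
```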
2024-09-09	implemented <EE> and <^E> commands for configuring encodings and translating ↵	Robin Haberkorn	2	-1/+129
	between glyph and byte offsets (refs #5) * ^E is heavily overloaded and can also be used to check whether a given index is valid (as it is the same that most movement commands do internally). Besides that, it is mainly useful for interfacing with Scintilla messages. * EE takes a code page or 0 for ANSI/ASCII. Currently all documents and new registers are UTF-8. There will have to be some kind of codepage inheritance and a single-byte-only mode.
2024-09-09	Unicode support for the Q-Register commands (refs #5)	Robin Haberkorn	10	-145/+274
	* this required adding several Q-Register vtable methods * it should still be investigated whether the repeated calling of SCI_ALLOCATELINECHARACTERINDEX causes any overhead.
2024-09-09	allow Unicode characters in command line arguments (refs #5)	Robin Haberkorn	2	-4/+8
	* the locale must be initialized very early before g_option_context_parse() * will allow UTF-8 characters in the test suite
2024-09-09	Glyph to byte offset mapping is now using the line character index (refs #5)	Robin Haberkorn	7	-68/+130
	* This works reasonably well unless lines are exceedingly long (as on a line we always count characters). The following test case is still slow (on Unicode buffers): 10000<@I/XX/> <%a-1:J;> While the following is now also fast: 10000<@I/X^J/> <%a-1:J;> * Commands with relative character offsets (C, R, A, D) have a special optimization where they always count characters beginning at dot, as long as the argument is not exceedingly large. This means they are fast even on exceedingly long lines. * The remaining commands (search, EC/EG, Xq) now accept glyph indexes.
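Counting characters relative to a known byte position, as the relative movement commands are described to do, could look roughly like the sketch below. utf8_advance() is a hypothetical helper working on a plain byte buffer; the actual implementation goes through Scintilla and is not reproduced here.

```c
#include <stddef.h>

/*
 * Hypothetical helper: starting at byte offset `pos` (assumed to be on a
 * glyph boundary), skip `n` UTF-8 glyphs forward in `buf` of `len` bytes.
 * Returns the new byte offset or (size_t)-1 if the buffer ends first.
 * The cost depends on `n` only, not on the length of the line.
 */
size_t utf8_advance(const char *buf, size_t len, size_t pos, size_t n)
{
	while (n > 0) {
		if (pos >= len)
			return (size_t)-1;
		pos++; /* step over the lead byte */
		/* step over continuation bytes (10xxxxxx) */
		while (pos < len && ((unsigned char)buf[pos] & 0xC0) == 0x80)
			pos++;
		n--;
	}
	return pos;
}
```

An absolute glyph index, in contrast, needs either a scan from a known anchor (such as a line start from the line character index) or a full count from the beginning of the buffer.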
2024-09-09	implemented Unicode support for rubin/rubout and a number of commands (WIP) ↵	Robin Haberkorn	5	-44/+148
	(refs #5) certain test cases are still way too slow: 10000<@I/X^J/> 20000<R> or 10000<@I/X^J/> 20000<%a-1J> SCI_ALLOCATELINECHARACTERINDEX does not help much here. It probably speeds up only SCI_LINEFROMINDEXPOSITION and SCI_INDEXPOSITIONFROMLINE.
2024-09-09	input and displaying of Unicode characters is now possible (refs #5)	Robin Haberkorn	6	-27/+73
	* All non-ASCII characters are inserted as Unicode. On Curses, this also requires a properly set up locale. * We still do not need any widechar Curses, as waddch() handles multibyte characters on ncurses. We will see whether there is any Curses variant that strictly requires wadd_wch(). If this will be an exception, we might keep both widechar and non-widechar support. * By convention gsize is used exclusively for byte sizes. Character offsets or lengths use int or long.
2024-08-28	fixed retrieval of characters with codes larger than 127 - always return ↵	Robin Haberkorn	2	-5/+7
	unsigned integer * The result of SCI_GETCHARAT is internally cast to `char`, which may be signed. Characters > 127 therefore become negative and stay so when cast to sptr_t. We therefore cast it back to guchar (unsigned char). * The same is true whenever returning a string's character to SciTECO (teco_int_t) as our string type is `gchar *`. <^^x> now also works for those characters. Eventually, the parser will probably become UTF8-aware and this will have to be done differently.
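The sign extension problem can be reproduced in isolation. The two functions below are hypothetical stand-ins, not SciTECO or Scintilla code; intptr_t plays the role of sptr_t.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the byte at `pos` widened directly from `char`.
 * Where `char` is signed, bytes > 127 come out negative.
 */
intptr_t get_char_raw(const char *buf, size_t pos)
{
	return buf[pos];
}

/*
 * Same lookup, but converted to unsigned char before widening,
 * so the result is always in the range 0 to 255.
 */
intptr_t get_char_fixed(const char *buf, size_t pos)
{
	return (unsigned char)buf[pos];
}
```

For the byte 0xE4, get_char_fixed() yields 228 everywhere, whereas get_char_raw() yields -28 on platforms where `char` is signed.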
2024-08-23	fully support out of tree builds	Robin Haberkorn	1	-3/+2
	* You no longer have to copy contrib/scintilla, contrib/scinterm and contrib/lexilla manually to the build directory. * It turns out that Scintilla/Lexilla has supported this since 2016. Scintilla allows pointing to a source directory (srcdir) and Lexilla to a binary directory (DIR_O). * For Scinterm I opened a pull request in order to add srcdir/basedir variables: https://github.com/orbitalquark/scinterm/pull/21 * `make distcheck` is therefore now also fixed. * The FreeBSD package is now allowed to build out of source. I haven't tested it yet. * See also https://github.com/ScintillaOrg/lexilla/issues/266
2024-02-08	fixed expressions like `1,(2)` or `(1),(2)`: they are reported as two ↵	Robin Haberkorn	1	-0/+3
	numbers now * Instead of TECO_OP_NEW, there should perhaps simply be a flag of whether `,` was used.
2024-02-06	fixed the power (^*) operator: did not handle corner cases and was inefficient	Robin Haberkorn	1	-1/+22
	* in fact, with a negative exponent the previous naive implementation would even hang indefinitely! * Now uses the squaring algorithm. This is slightly longer but significantly more efficient. * added test cases
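Exponentiation by squaring for integers can be sketched as below. ipow() is a hypothetical name, and the treatment of negative exponents (truncating toward zero, with 0 returned for a zero base) is an assumption of this sketch, not necessarily what the commit implements.

```c
#include <stdint.h>

/*
 * Hypothetical helper: integer power by repeated squaring.
 * Needs O(log exp) multiplications instead of O(exp).
 * Overflow is not checked.
 */
int64_t ipow(int64_t base, int64_t exp)
{
	int64_t result = 1;

	if (exp < 0) {
		/*
		 * 1/base^|exp| truncates to 0 unless |base| == 1.
		 * A zero base would be a division by zero; returning 0
		 * here is an assumption of this sketch.
		 */
		if (base == 1)
			return 1;
		if (base == -1)
			return (exp % 2) ? -1 : 1;
		return 0;
	}

	while (exp > 0) {
		if (exp & 1)
			result *= base;
		exp >>= 1;
		/* square only if another round follows, avoiding a needless overflow */
		if (exp > 0)
			base *= base;
	}
	return result;
}
```

A naive loop that decrements the exponent until it reaches zero never terminates for a negative exponent, which matches the hang mentioned in the commit message.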
2024-02-06	avoid Groff warnings due to `\` escapes	Robin Haberkorn	1	-2/+2
	* It's generally a bad idea to pass backslashes as a glyph in macro arguments, even as `\\` since this could easily be interpreted as an escape. * Instead we now always use `\[rs]`.
2024-02-06	use bool instead of guint for 1-bit fields	Robin Haberkorn	2	-7/+10
	* gboolean cannot be used since it is a signed type * bool is still more readable, even though we mostly use glib typedefs. * AFAIK the glib types are deprecated, so sooner or later we will switch to stdint/stdbool types anyway.
2024-02-03	GTK: allow disabling client-side decorations by setting $GTK_CSD=0	Robin Haberkorn	1	-7/+2
	* This is the same variable used by gtk3-nocsd, but we will now work even without preloading any libraries. Also, it turns out that gtk3-nocsd does not ship as a FreeBSD port and hasn't been updated in a long time. * Setting this in .teco_ini wouldn't have been easy since the teco_interface_init() is called before any TECO code. Also, you might not even want to disable this globally but depending on the window manager. * Therefore, you are advised to `export GTK_CSD=0` in ~/.xsession. * The --no-csd command line option is kept for the time being, but probably serves no more purpose.
2024-02-03	Gtk: set icons a bit later after calling gtk_widget_show()	Robin Haberkorn	1	-39/+44
	* Also turns out, I will have to use gtk_window_set_icon_list(). * This fixes icons in tabbed and st (when embedding SciTECO).
2024-01-28	cursor movement via fnkeys.tes now preserves the column as in most text editors	Robin Haberkorn	1	-1/+18
	* Horizontal movements (left/right cursor keys) establish the current column and vertical movements (up/down) will try to keep on that column. * This has long been problematic in SciTECO as it requires state that gets reversed when the command line replacement takes place. * I experimented with encoding the current horizontal position into the braced movement operations as in (123C5U$), but I decided that this was clumsy and I generally did not want these expressions to become even larger. * Instead I decided to add some minimal support to the C core in the form of 4EJ which is like a number register only that it does NOT get reversed on rubout. This is exploited by the fnkeys.tes macros by storing the current position beyond replacements. * In theory, this should be a property of the document, but we cannot easily store custom parameters per document. So instead, there is just one global variable. When editing another buffer, it gets reset to .ESGETCOLUMN$$. sample.teco_ini has been updated. * The current X position only makes sense in the context of fnkeys.tes, as TECO commands like <C> are not necessarily "horizontal" movements. For the same reason, the core does not try to initialize 4EJ automatically when editing new buffers. It's entirely left to the TECO macros. * The commandline replacement is more robust now as it checks braced expressions at the end of the command line more thoroughly. It will no longer swallow all preceding braced expressions. Only if they are at least 4 characters in length and end in `C)` or `R)`.
2024-01-21	updated copyright to 2024	Robin Haberkorn	61	-61/+61

2024-01-20	fixed Clang warnings about one-bit-wide boolean integers ↵	Robin Haberkorn	2	-7/+7
	(-Wsingle-bit-bitfield-constant-conversion) * gboolean is defined as gint which is a signed type. A gboolean 1-bit-wide bitfield cannot have the values 0 and 1 but only 0 and -1. * This wasn't practically a bug unless you would try to compare one of those bitfields with TRUE. * All of those bitfields are now guint, even though this is less self-documenting.
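The effect is easy to reproduce outside of glib. In the sketch below, `signed int` stands in for gboolean (gint) and `unsigned int` for guint; the struct and function names are hypothetical.

```c
/* A 1-bit signed field can only represent 0 and -1. */
struct flags_signed {
	signed int flag : 1;
};

/* A 1-bit unsigned field represents 0 and 1. */
struct flags_unsigned {
	unsigned int flag : 1;
};

/* Store `v` into a signed 1-bit field and read it back. */
int read_signed_flag(int v)
{
	struct flags_signed f = {0};

	f.flag = v; /* storing 1 typically reads back as -1 */
	return f.flag;
}

/* Store `v` into an unsigned 1-bit field and read it back. */
int read_unsigned_flag(int v)
{
	struct flags_unsigned f = {0};

	f.flag = (unsigned int)v & 1u;
	return f.flag;
}
```

A test like `if (f.flag)` works with both variants, but `f.flag == TRUE` can never be true for the signed field since TRUE is 1.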
2024-01-13	fixed <EC$> assertions: specifying empty command strings was undefined	Robin Haberkorn	1	-19/+18
	* passing an empty command string down to the shell would always do nothing, so it doesn't make sense to support that. * for the time being, we generate a proper error * in the future, it might make sense to define some special behavior like repeating the last command - but EC does not currently save the command line anywhere. * The generated documentation is currently ugly (FIXME). Mandatory parameters are not properly detected by tedoc and we cannot keep apart Q-Registers from mandatory parameters either. Also, we should allow <param> markup in command summaries.
2023-07-06	fixed ]$ and ]~ (pop from Q-Reg stack to special Q-Registers)	Robin Haberkorn	1	-164/+84
	* This was setting only the teco_doc but wasn't calling the necessary set_string() methods. * The idiom [$ FG...$ ]$ to change the working directory temporarily now works. * Similarly you can now write [~ ^U~...$ ]~ to change the clipboard temporarily. * Added test suite cases. The clipboard is not tested since it's not supported everywhere and would interfere with the host system. * Resolved lots of redundancies in qreg.c. The clipboard and workingdir Q-Regs have lots in common. This is now abstracted in the "external" Q-Reg base "class" (ie. via initializer TECO_INIT_QREG_EXTERNAL()). It uses vtable calls which is slightly more inefficient than per register implementations, but avoiding redundancies is probably more important.
2023-07-03	introduced TECO_DEBUG_CLEANUP to mark destructors that should only be used ↵	Robin Haberkorn	9	-25/+19
	for debug builds * There is cleanup that is not strictly necessary, because it only frees memory which is freed on program termination anyway. * However, it helps to explicitly free everything for debugging memory leaks via Valgrind. * The new macro reduces the number of #ifdef statements. * On NDEBUG, the code of these functions will still be eliminated. * If functions are referenced only from the destructor, there will be no unused function warnings, even in NDEBUG.
2023-06-19	distribute sciteco.desktop	Robin Haberkorn	2	-2/+10
	* Useful for packaging on platforms where we can only build from tarballs (FreeBSD) * I don't know whether it's always safe and correct to install this file into $DATADIR/applications, so the file is only distributed but not installed yet. * It contains a hardcoded binary name "gsciteco". This could actually differ depending on the concrete --program-prefix and it would be good to include the exact installation path. This however is not possible as long as we do not install this file.
2023-06-19	the SciTECO data installation path is now configurable via --with-scitecodatadir	Robin Haberkorn	1	-1/+1
	* This is also the base of $SCITECOPATH. * Changing it is useful for packaging where it is not possible to factor out the common files between Curses and Gtk builds into a "sciteco-common" package. As an alternative, you can now create disjunct sciteco-curses and sciteco-gtk packages. * You will most likely want to use this for Gtk builds as in: --with-interface=gtk --program-prefix=g --with-scitecodatadir=/usr/local/share/gsciteco.
2023-06-18	fixed caret scrolling on startup	Robin Haberkorn	2	-91/+93
	* Since Scintilla no longer automatically scrolls the caret (see 941f48da6dde691a7800290cc729aaaacd051392), the caret wouldn't always end up in the view on startup. * Added teco_interface_refresh() which includes SCI_SCROLLCARET and is invoked on startup. This helps with the Curses backend. It also reduces code redundancies. * On Gtk, the caret cannot be easily scrolled on startup as long as no size is allocated to the window, so we also added a size-allocate callback to the window's event box. Sizes are less often allocated to the event box than to the window itself for some strange reason.
2023-05-14	resolved warning in gtk-label.c due to wrong enum type	Robin Haberkorn	1	-1/+1
	* This probably did not cause any bugs.
2023-05-14	FreeBSD: fixed the poll-thread memory limiting implementation - it's the ↵	Robin Haberkorn	1	-4/+17
	default now * On FreeBSD both the dlmalloc replacement and poll-thread via sysctl() work but the poll-thread has been benchmarked to be significantly faster, at least on my machine. You can still ./configure --enable-malloc-replacement of course. * Interestingly, the RSS of the process visible via htop does not decrease after OOMs or command-line terminations - with neither of the implementations.
2023-05-12	fixup EC on Win32 and interruptions via CTRL+C	Robin Haberkorn	1	-33/+82
	* This especially fixes spawning on 0,128ED-mode broken since f557af9a9112955d3b65f6ad0d54c0791189f961. * The process is added to a job object now, which allows us to kill the entire process tree. Previously we were leaving around orphaned processes.
2023-05-09	fixup: building on UNIX	Robin Haberkorn	1	-1/+1

2023-05-09	fixed CTRL+C interruptions on Windows; optimized CTRL+C polling on Gtk+	Robin Haberkorn	5	-88/+125
	* teco_interrupt() turned out to be unsuitable to kill child processes (eg. when <EB> hangs). Instead, we have Win32-specific code now. * Since SIGINT can be ignored on UNIX, pressing CTRL+C was not guaranteed to kill the child process (eg. when <EB> hangs). At the same time, it makes sense to send SIGINT first, so programs can terminate gracefully. The behaviour has therefore been adapted: Interrupting with CTRL+C the first time will kill gracefully. The second time, a more aggressive signal is sent to kill the child process. Unfortunately, this would be relatively tricky and complicated to do on Windows, so CTRL+C will always "hard-kill" the child process. * Moreover, teco_interrupt() killed the entire process on Windows when called the second time. This resulted in any interruption to terminate SciTECO unexpectedly when tried the second time on Gtk/Win32. * teco_sigint_occurred renamed to teco_interrupted: There may be several different sources for setting this flag. * Checking for CTRL+C on Gtk involves driving the main event loop repeatedly. This is a very expensive operation. We now do that only every 100ms. This is still sufficient since keyboard input comes from humans. This optimization saves 75% runtime on Windows and 90% on Linux. * The same optimization turned out to be counterproductive on PDCurses/WinGUI.
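The 100ms throttling of the expensive check can be pictured as a simple time gate. The sketch below is an assumption about the general shape only: should_poll() and POLL_INTERVAL_US are hypothetical, and the caller is expected to pass a monotonic timestamp in microseconds (with GLib, for instance, the value of g_get_monotonic_time()).

```c
#include <stdbool.h>
#include <stdint.h>

/* hypothetical polling interval: 100ms in microseconds */
#define POLL_INTERVAL_US INT64_C(100000)

/*
 * Returns true if at least POLL_INTERVAL_US have passed since the
 * timestamp stored in `*last_us` and, if so, updates that timestamp.
 * Only when this returns true would the caller drive the (expensive)
 * main event loop to look for a pending CTRL+C.
 */
bool should_poll(int64_t now_us, int64_t *last_us)
{
	if (now_us - *last_us < POLL_INTERVAL_US)
		return false;
	*last_us = now_us;
	return true;
}
```

The interpreter can call such a gate on every executed command, since a timestamp comparison is cheap compared to iterating the event loop.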
2023-04-29	fixed <EC> interruptions on Gtk+ (and probably on PDCurses/Win32)	Robin Haberkorn	2	-6/+30
	* test case: ECwhile true; do true; done$ * Some platforms require polling via teco_interface_is_interrupted() for detecting interruptions, so we added an idle watcher to the Glib event loop in spawn.c. * On platforms that do not require polling key presses (like Unix/ncurses), the idle watcher won't do any harm.
2023-04-27	Gtk: fixed scrolling in the command line widget	Robin Haberkorn	1	-7/+2
	* The caret wasn't always kept out of the UZ and at some point would totally leave the view. This was apparently caused by executing two SCI_SCROLLCARETs per teco_interface_cmdline_update(). * Instead, we now use a CARET_EVEN scroll policy which also works sufficiently well.
2023-04-27	Gtk: fixed entering dead keys	Robin Haberkorn	1	-25/+63
	* This is using an Input Method now. * Entering dead keys has probably always been broken in Gtk which I only did not notice because I use a keyboard layout without dead keys. This affects the ^ and ` keys on a German layout. * Once we support Unicode input, it would make sense to abuse Scintilla's already existing input method support. Unfortunately, forwarding keyboard events to the Scintilla view breaks event freezing and results in flickering.
2023-04-20	Curses: do not allow typing any non-ASCII characters - fixes crashes on ↵	Robin Haberkorn	1	-1/+1
	PDCurses/WinGUI * we can neither display, nor parse Unicode characters properly, so this does not worsen anything * makes it harder to confuse the parser as long as we do not support Unicode. * behaves like on Gtk: pressing a non-ASCII char will simply be ignored * Most importantly, this fixes crashes on PDCurses/WinGUI. It apparently couldn't handle the negative integers that resulted from passing a value >= 0x80 <= 0xFF into gchar (which is a signed integer). Changing everything into guchar is not worth the effort - we need full Unicode support anyway.
2023-04-19	fixup: reverted the last Scintilla patch and unref Scintilla objects via ↵	Robin Haberkorn	1	-9/+1
	g_object_unref() * Turns out that using gtk_widget_destroy(), the finalize handler never gets called!? This means we were leaking memory. * Using g_object_unref() fixes that and the initial Scintilla patch is no longer necessary. * There have previously been use-after-free bugs when not using gtk_widget_destroy(). This has apparently been fixed in the meantime in Scintilla.