sciteco - Scintilla-based Text Editor and COrrector

Age	Commit message (Collapse)	Author	Files	Lines
2025-07-19	special Q-registers `$` (working directory) and the clipboard registers now ↵	Robin Haberkorn	1	-1/+0
	support the append operation (:Xq, :^Uq...) Works via a default implementation in the "external" Q-register "class" by first querying the string, appending and re-setting it.
2025-07-16	the primary clipboard is now chosen by the 10th bit in the ED flags	Robin Haberkorn	1	-8/+6
	* `[q]~` was broken and resulted in crashes since it reset the clipboard character to 0. In fact, if we don't want to break the `[a]b` idiom we cannot use the numeric cell of register `~`. * Therefore we no longer use the numeric part of register `~`. Once the clipboard registers are initialized they completely replace any existing register with the same name that may have been set in the profile. So we still don't leak any memory. (But perhaps it would now be better to fail with an error if one of the clipboard registers already exist?) * Instead, bit 10 (1024) of ED is now used to change the default clipboard to the primary selection. The alternative might have been an EJ flag or even a special register containing the name of the default clipboard register. * partially reverses 8c6de6cc718debf44f6056a4c34c4fbb13bc5020
2025-07-13	allow changing the default clipboard by setting the `~` integer	Robin Haberkorn	1	-2/+9
	* It continues to default to 67 (C), which is the system clipboard. But you can now overwrite it e.g. by adding `^^PU~` to the profile. * This fixes a minor memory leak: If you set one of the clipboard registers in the profile (initializing them as plain registers), the clipboard register had been leaked. The clipboard registers now replace any existing register, while at the same time preserving the numeric part. * All remaining Q-Reg table insertions use a new function teco_qreg_table_insert_unique() which adds an assertion, so that we notice any future possible memory leaks.
2025-07-03	implemented ^E<code> string building constructs for embedding bytes and ↵	Robin Haberkorn	1	-0/+13
	codepoints in a strtoul()-like manner
2025-05-24	new string building construct ^P disables all further string building magic	Robin Haberkorn	1	-0/+6
	* Now, `I^P` can replace `EI`. EI is therefore now free to be repurposed as the new "mung file" command for improved TECO-11 compatibility. * On the downside when inserting large blocks of TECO code, you will have to write something like `@I{^P !...! }` * The construct is also useful when searching for carets as in `S^P^Q^`.
2025-05-18	sciteco(7): added a help topic for booleans	Robin Haberkorn	1	-0/+1
	So you can lookup `?bool$` for instance.
2025-04-10	testsuite: check whether comparisons for equality really work with the ↵	Robin Haberkorn	1	-12/+16
	`a-b"=` idiom * There might theoretically be problems with the uncommon one's complement or magnitude representation of negative integers, but it's practically impossible to meet those in the wild. * Still, we do some checks now, so we will at least notice any exotic architectures. * Also, documented the `a^#b"=` idiom for checking for equality. It's longer to type, but faster and will also work for floats. For floats it will be the only permissible idiom for checking for bitwise equality as `a-b` can be 0 even if a!=b (if the difference is very small). Changing the `-` semantics is out of the question.
2025-04-09	tightened rules for specifying modifiers	Robin Haberkorn	1	-3/+5
	* Instead of separate stand-alone commands, they are now allowed only immediately in front of the commands that accept them. * The order is still insignificant if both `@` and `:` are accepted. * The number of colon modifiers is now also checked. We basically get this for free. * `@` has syntactic significance, so it could not be set conditionally anyway. Still, it was possible to provoke bugs were `@` was interpreted conditionally as in `@ 2<I/foo/$>`. * Even when not causing bugs, a mistyped `@` would often influence the __next__ command, causing unexpected behavior, for instance when typing `@(233C)W`. * While it was theoretically possible to set `:` conditionally, it could also be "passed through" accidentally to some command where it wasn't expected as in `:Ifoo$ C`. I do not know of any real useful application or idiom of a conditionally set `:`. If there would happen to be some kind of useful application, `:'` and `:\|` could be re-allowed easily, though. * I was condidering introducing a common parser state for modified commands, but that would have been tricky and introduce a lot of redundant command lists. So instead, we now simply everywhere check for excess modifiers. To simplify this task, teco_machine_main_transition_t now contains flags signaling whether the transition is allowed with `@` or `:` modifiers set. It currently only has to be checked in the start state, after `E` and `F`.
2025-04-08	improved rubbing out commands with modifiers	Robin Haberkorn	1	-0/+2
	* This was actually broken if the command is preceded by `@` and `:` characters, which are __not__ modifiers. E.g. `Q:@I/foo^W` would have rubbed out the `:` register as well. * Also, since it was all done in teco_state_process_edit_cmd(), it would also rub out modifier characters from within string arguments, E.g. `@I/::^EQ^W` * Real commands now have their own ^W rubout implementation, while the generic fallback just rubs out until the start state is re-established. This fails to rub out modifiers as in `@I/^W`, though. * Real command characters now use the common TECO_DEFINE_STATE_COMMAND(). * Added test cases for CTRL+W rub out. A few control characters are now portably available to tests via environment variables `$ESCAPE`, `$RUBOUT` and `$RUBOUT_WORD`.
2025-04-04	scroll caret __almost__ always automatically after key presses	Robin Haberkorn	1	-0/+4
	* The old heuristics - scroll if dot changes after key press - turned out to be too simplistic. They broke the clang-format macro (M#cf), which left the view at the top of the document since the entire document is temporarily erased. Other simplified examples of this bug would be: @^Um{[: HECcat$ ]:} Mm Or even: @^Um{[: H@X.aG.a ]:} Mm * Actually, the heuristics could be tricked even without deleting any significant amount of text from the buffer. The following test case replaces the previous character with a linefeed in a single key press: @^Um{-DI^J$} Mm If executed on the last visible line, dot wouldn't be scrolled into the view since it did not change. * At the same time, we'd like to keep the existing mouse scroll behavior from fnkeys.tes, which is allowed to scroll dot outside of the visible area. Therefore, dot is scrolled into view always, except after mouse events. You may have to call SCI_SCROLLCARET manually in the ^KMOUSE macro, which is arguably not always straight forward. * Some macros like M#cf may still leave the vertical scrolling position in unexpected positions. This could either be fixed by eradicating all remaining automatic scrolling from Scintilla or by explicitly restoring the vertical position from the macro (FIXME). * This was broken since the introduction of mouse support, so it wasn't in v2.3.0.
2025-03-29	^W also rubs out/in `@` and `:` modifiers	Robin Haberkorn	1	-0/+3
	* It makes little sense to e.g. rub out until `I` in `@I/foo/`, but leave the `@` modifier. Modifiers have to be considered part of the command, even though the state machine is not currently modelled like that.
2025-03-23	sciteco(7): fixed formatting of some tables	Robin Haberkorn	1	-4/+4
	This was changed ages ago for some old version of Groff. These workarounds should no longer be necessary.
2025-03-23	the ^W immediate editing command now mimics `Y` more closely and also rubs ↵	Robin Haberkorn	1	-5/+3
	out no-op commands (whitespace) * In string arguments, ^W first rubs out non-word chars (usually whitespace), so it makes sense if ^W would work analogously at the command level. A non-command would be one of the no-ops.
2025-03-22	harmonized all word-movement and deletion commands: they move/delete until ↵	Robin Haberkorn	1	-2/+3
	the beginning of words now * All commands and their documentations were inconsistent. * ^W rubbed out to the beginning of words. * Shift+Right (fnkeys.tes) moved to the beginning of the next word if invoked at the beginning of a word and to the end of the next word otherwise. * <W> (and <V> and <Y> by extension) moved to the end of the next word. * The cheat sheet would claim that <W> moves to the beginning of the next word. * Video TECO's <W> command would differ again from everything else. With positive arguments, it moved to the beginning of words, while with negative it moved to end of words. I decided not to copy this behavior. * It has been decided to adopt a consistent beginning-of-words policy. -W therefore differs from Video TECO in moving to the beginning of the current or previous word. * teco_find_words() is now based on parsing the document pointer, instead of relying on SCI_WORDENDPOSITION, since the latter cannot actually be used to skip strictly non-word characters. This requires a constant amount of Scintilla messages but will require fewer messages only when moving for more than 3 words. * The semantics of <W> are therefore now consistent with Vim and Emacs as well. * Shift+Right/Left is still based on SCI_WORDENDPOSITION, so it's behavior differs slightly from <W> for instance at the end of lines, as it will stop at linebreaks. * Unfortunately, these changes will break lots of macros, among others the M#rf, M#sp and git.blame macros ("Useful macros" from the wiki).
2025-02-27	implemented ncurses clipboard support via external processes	Robin Haberkorn	1	-7/+14
	* As an alternative to OSC-52, which is rarely supported by terminal emulators. * Makes the new mouse support much more useful since you rely on good builtin clipboard support. You can no longer e.g. just double-click a word to copy it into the "primary" selection as terminal emulators do by default. * Set $SCITECO_CLIPBOARD_SET/GET e.g. to xclip, way-copy, pbcopy or some wrapper script. * This is currently using POSIX-specific popen() API, so it behaves a bit different to command execution via EC/EG. I am not sure if it's worth rewriting with the GSpawn-API, since it will be used only on POSIX anyway and a GSpawn-based implementation is likely to be a bit larger. * Should there be some small command-line utility for interacting (esp. pasting) via OSC-52, built-in OSC-52 support could well be removed from SciTECO. Currently, I know only of https://github.com/theimpostor/osc/ and it requires very recent Go compilers. (I still haven't tested it. Quite possibly, pasting when run as a piped command is impossible.)
2025-02-23	support mouse interaction with popup windows	Robin Haberkorn	1	-2/+20
	* Curses allows scrolling with the scroll wheel at least if mouse support is enabled via ED flags. Gtk always supported that. * Allow clicking on popup entries to fully autocomplete them. Since this behavior - just like auto completions - is parser state-dependant, I introduced a new state method (insert_completion_cb). All the implementations are currently in cmdline.c since there is some overlap with the process_edit_cmd_cb implementations. * Fixed pressing undefined function keys while showing the popup. The popup area is no longer redrawn/replaced with the Scintilla view. Instead, continue to show the popup.
2025-02-16	implemented mouse support via special ^KMOUSE and <EJ> with negative keys	Robin Haberkorn	1	-0/+9
	* You need to set 0,64ED to enable mouse processing in Curses. It is always enabled in Gtk as it should never make the experience worse. sample.teco_ini enables mouse support, since this should be the new default. `sciteco --no-profile` won't have it enabled, though. * On curses, it requires the ncurses mouse protocol version 2, which will also be supported by PDCurses. * Similar to the Curses API, a special key macro ^KMOUSE is inserted if any of the supported mouse events has been detected. * You can then use -EJ to get the type of mouse event, which can be used with a computed goto in the command-line editing macro. Alternatively, this could have been solved with separate ^KMOUSE:PRESSED, ^KMOUSE:RELEASED etc. pseudo-key macros. * The default ^KMOUSE implementation in fnkeys.tes supports the following: * Left click: Edit command line to jump to position. * Ctrl+left click: Jump to beginning of line. * Right click: Insert position or position range (when dragging). * Double right click: insert range for word under cursor * Ctrl+right click: Insert beginning of line * Scroll wheel: scrolls (faster with shift) * Ctrl+scroll wheel: zoom (GTK-only) * Currently, there is no visual feedback when "selecting" ranges via right-click+drag. This would be tricky to do and most terminal emulators do not appear to support continuous mouse updates.
2025-02-15	redefining labels is a warning now	Robin Haberkorn	1	-1/+6
	* Allowing label redefinitions might have been useful when used as comments, since you will want to be able to define arbitrary comments. However as flow control constructs, this introduced a certain ambiguity since gotos might jump to different locations, depending on the progression of the parser. * On the other hand, making label redefinition an error might disqualify labels as comments when writing or porting classic TECO code. Therefore, it has been made a warning as a compromise. * Added test case
2024-12-24	introduced true block and EOL comments	Robin Haberkorn	1	-3/+31
	* The previous convention of !* ... ! are now true block comments, i.e. they are parsed faster, don't spam the goto table and allow embedding of exclamation marks - only "!" terminates the comment. * It is therefore now forbidden to have goto labels beginning with "". Also support "!!" to introduce EOL comments (like C++'s //). This disallows empty labels, but they weren't useful anyway. This is the shortest way to begin a comment. * All comment labels have been converted to true comments, to ensure that syntax highlighting works correctly. EOL comments are used for single line commented-out code, since it's easiest to uncomment - you don't have to jump to the line end. This is a pure convention / coding style. Other people might do it differently. * It's of course still possible to abuse goto labels as comments as TECO did for ages. * In lexing / syntax highlighting, labels and comments are highlighted differently. * When syntax highlighting, a single "!" will first be highlighted as a label since it's not yet unambiguous. Once you type the second character (* or !), the first character is retroactively styled as a comment as well.
2024-12-22	support external Scintilla lexer libraries and Scintillua in particular	Robin Haberkorn	1	-3/+8
	* @ES/SCI_SETILEXER/lib^@name/ now opens the lexer <name> in library <lib>. * You need to define the environment variable $SCITECO_SCINTILLUA_LEXERS to point to the lexers/ subdirectory (containing the .lua files). Perhaps this should default to the dirname of <lib>? The semantics of SCI_NAMEOFSTYLE have been changed: It now returns style ids when given style names, so you can actually write Scintillua lexer .tes files. This will be superfluous if we had a way to return strings from Scintilla messages into Q-Registers, e.g. 23@EPq/SCI_NAMEOFSTYLE/. We now depend on gmodule as well, but it should always be part of glib. It does not change the library dependencies of any package. It might result in gmodule shared libraries to be bundled in the Win32 and Mac OS packages if they weren't already.
2024-12-06	support the ::S anchored search (string comparison) command (and ::FD, ::FR, ↵	Robin Haberkorn	1	-0/+5
	::FS as well) * The colon modifier can now occur 2 times. Specifying `@` more than once or `:` more than twice is an error now. * Commands do not check for excess colon modifiers - almost every command would have to check it. Instead, a double colon will simply behave like a single colon on most commands. * All search commands inherit the anchored semantics, but it's not very useful in some combinations like -::S, ::N or ::FK. That's why the `::` variants are not documented everywhere. * The lexer.checkheader macro could be simplified and should also be faster now, speeding up startup. Eventually this macro can be made superfluous, e.g. by using 1:FB or 0,1^Q::S.
2024-12-04	the <Xq> command now supports the @ modifier for cutting into the register	Robin Haberkorn	1	-3/+13
	* Can be freely combined with the colon-modifier as well. :@Xq cut-appends to register q. * This simply deletes the given buffer range after the copy or append operation as if followed by another <K> command. * This has indeed been a very annoying missing feature, as you often have to retype the range for a K or D command. At the same time, this cannot be reasonably solved with a macro since macros do not accept Q-Register arguments -- so we would have to restrict ourselves to one or a few selected registers. I was also considering to solve this with a special stack operation that duplicates the top values, so that Xq leaves arguments for K, but this couldn't work for cutting lines and would also be longer to type. * It's the first non-string command that accepts @. Others may follow in the future. We're approaching ITS TECO madness levels.
2024-11-30	sciteco(7): fixed outdated information about the STYLE_CALLTIP default colors	Robin Haberkorn	1	-3/+3

2024-11-24	sciteco(7): minor documentation fix	Robin Haberkorn	1	-1/+1

2024-11-24	added special Q-Register ":" for accessing dot	Robin Haberkorn	1	-0/+13
	* We cannot call it "." since that introduces a local register and we don't want to add an unnecessary syntactic exception. * Allows the idiom [: ... ]: to temporarily move around. Also, you can now write ^E\: without having to store dot in a register first. * In the future we might add an ^E register as well for byte offsets. However, there are much fewer useful applications. * Of course, you can now also write nU: instead of nJ, Q: instead of "." and n%: instead of "nC.". However it's all not really useful.
2024-11-23	the search mode and current radix are mapped to __local__ Q-Registers ^X and ↵	Robin Haberkorn	1	-1/+28
	^R now (refs #17) * This way the search mode and radix are local to the current macro frame, unless the macro was invoked with :Mq. If colon-modified, you can reproduce the same effect by calling [.^X 0^X ... ].^X * The radix register is cached in the Q-Reg table as an optimization. This could be done with the other "special" registers as well, but at the cost of larger stack frames. * In order to allow constructs like [.^X typed with upcarets, the Q-Register specification syntax has been extended: ^c is the corresponding control code instead of the register "^".
2024-11-19	minor documentation fixes	Robin Haberkorn	1	-1/+1
	* also explicitly mention -%q
2024-11-18	fixed some common typos: "ie." and "eg.", "ocur" instead of "occur"	Robin Haberkorn	1	-4/+4

2024-10-04	pattern match characters support ^Q/^R now as well	Robin Haberkorn	1	-0/+8
	* makes it possible, albeit cumbersome, to escape pattern match characters * For instance, to search for ^Q, you now have to type S^Q^Q^Q^Q$. To search for ^E you have to type S^Q^Q^Q^E$. But the last character cannot be typed with carets currently (FIXME?). For pattern-only characters, two ^Q should be sufficient as in S^Q^Q^X$. * Perhaps it would be more elegant to abolish the difference between string building and pattern matching characters to avoid double quoting. But then all string building constructs like ^EQq should operate at the pattern level as well (ie. match the contents of register q verbatim instead of being interpreted as a pattern). TECOC and TECO-64 don't do that either. If we leave everything as it is, at least a new string building construct should be added for auto-quoting patterns (analoguous to ^EN and ^E@).
2024-09-25	inhibit some immediate editing commands after ^Q/^R string building constructs	Robin Haberkorn	1	-0/+3
	* This allows you to type ^Q^U (which would otherwise rub out the entire argument) and ^Q^W (which would otherwise rub out the ^Q). * ^Q^U coincidentally worked previously since the teco_state_stringbuilding_escaped state would default to teco_state_process_edit_cmd(). But it's better to make this feauture explicit. * This finally makes it possible to insert the ^W (23) char into a buffer. In interactive mode, you can still only type Caret+W as a string building construct. * ^G could also be inhibited after ^Q, but the control char is not used anywhere yet, so there is no point in doing that.
2024-09-23	allow OSC-52 clipboards on all terminal emulators	Robin Haberkorn	1	-7/+18
	* The XTerm version is still checked if we detect running under XTerm. * Actually, the XTerm implementation is broken for Unicode clipboard contents. * Kitty supports OSC-52, but you __must__ enable read-clipboard. With read-clipboard-ask, there will be a timeout. But we cannot read without a timeout since otherwise we would hang indefinitely if the escape sequence turns out to not work. * For urxvt, I have hacked an existing extension: https://gist.github.com/rhaberkorn/d7406420b69841ebbcab97548e38b37d * st currently supports only setting the clipboard, but not querying it.
2024-09-20	^W^W and ^V^V can be typed completely with upcarets now and they case fold ↵	Robin Haberkorn	1	-1/+5
	all expansions of ^EQq, ^EUq and so on * Previously, there was no way to enter upper-case mode in interactive commands since the Ctrl+W immediate editing command is interpreted everywhere. * Without the case folding of ^EQq/^EUq results, the upper and lower case modes are actually pretty useless considering that modern keyboards have caps lock. So it was clear we need this, regardless of what the classic TECOs did. The TECO-11 manual is not very clear on this. tecoc apparently does not case-fold ^EQq results. * This opens up new idioms, for instance `EUq^W^W^EQq$` in order to upper case register q. It's also the only way you can currently upper-case Unicode codepoints.
2024-09-19	Ctrl+^ is no longer translated to a single caret in string building (refs #20)	Robin Haberkorn	1	-1/+3
	* Ctrl+^ (30) and Caret+caret (^^) were both translated to a single caret. While there might be some reason to keep this behavior for double-caret, it is certainly pointless for Ctrl+^. * That gives you an easy way to insert Ctrl+^ (code 30) into documents with <I>. Perviously, you either had to insert a double-caret, typing 4 carets in a row, or you had to use <EI> or 30I$. * The special handling of double-caret could perhaps be abolished altogether, as we also have ^Q^ to escape plain carets. The double-caret syntax is very archaic from the time that there was no proper ^Q as far as I recall correctly.
2024-09-17	sciteco(7): mentioned "[a]b" idiom	Robin Haberkorn	1	-1/+2

2024-09-16	updated lists of external links in sciteco(1) and sciteco(7)	Robin Haberkorn	1	-8/+6
	* Unfortunately, the list in sciteco(7) does not format with FreeBSD's man or within SciTECO. * Removed references to the old sciteco.sf.net. We don't have a proper "homepage" for the time being.
2024-09-16	Curses: added support for cool Unicode icons (refs #5)	Robin Haberkorn	1	-0/+6
	* Practically requires one of the "Nerd Font" fonts, so it's disabled by default. Add 0,512ED to the profile to enable them. * The new ED flag could be used to control Gtk icons as well, but they are left always-enabled for the time being. Is there any reason anybody would like to disable icons in Gtk? * The list of icons has been adapted and extended from exa: https://github.com/ogham/exa/blob/master/src/output/icons.rs * The icons are hardcoded as presorted lists, so we can binary search them. This could change in the future. If there is any demand, they could be made configurable via Q-Registers as well.
2024-09-12	function key macros have been reworked into a more generic key macro feature	Robin Haberkorn	1	-99/+133
	* ALL keypresses (the UTF-8 sequences resulting from key presses) can now be remapped. * This is especially useful with Unicode support, as you might want to alias international characters to their corresponding latin form in the start state, so you don't have to change keyboard layouts so often. This is done automatically in Gtk, where we have hardware key press information, but has to be done with key macros in Curses. There is a new key mask 4 (bit 3) for that purpose now. * Also, you might want to define non-ANSI letters to perform special functions in the start state where it won't be accepted by the parser anyway. Suppose you have a macro M→, you could define @^U[^K→]{m→} 1^_U[^K→] This effectively "extends" the parser and allow you to call macro "→" by a single key press. See also #5. * The register prefix has been changed from ^F (for function) to ^K (for key). This is the only thing you have to change in order to migrate existing function key macros. * Key macros are enabled by default. There is no longer any way to disable function key handling in curses, as I never found any reason or need to disable it. Theoretically, the default ESCDELAY could turn out to be too small and function keys don't get through. I doubt that's possible unless on extremely slow serial lines. Even then, you'd have to increase ESCDELAY and instead of disabling function keys simply define an escape surrogate. * The ED flag has been removed and its place is reserved for a future mouse support flag (which does make sense to disable in curses sometimes). fnkeys.tes is consequently also enabled by default in sample.teco_ini. * Key macros are handled as an unit. If one character results in an error, the entire string is rubbed out. This fixes the "CLOSE" key on Gtk. It also makes sure that the original error message is preserved and not overwritten by some subsequent syntax error. It was never useful that we kept inserting characters after the first error.
2024-09-11	the SciTECO parser is Unicode-based now (refs #5)	Robin Haberkorn	1	-6/+14
	The following rules apply: * All SciTECO macros __must__ be in valid UTF-8, regardless of the the register's configured encoding. This is checked against before execution, so we can use glib's non-validating UTF-8 API afterwards. * Things will inevitably get slower as we have to validate all macros first and convert to gunichar for each and every character passed into the parser. As an optimization, it may make sense to have our own inlineable version of g_utf8_get_char() (TODO). Also, Unicode glyphs in syntactically significant positions may be case-folded - just like ASCII chars were. This is is of course slower than case folding ASCII. The impact of this should be measured and perhaps we should restrict case folding to a-z via teco_ascii_toupper(). * The language itself does not use any non-ANSI characters, so you don't have to use UTF-8 characters. * Wherever the parser expects a single character, it will now accept an arbitrary Unicode/UTF-8 glyph as well. In other words, you can call macros like M§ instead of having to write M[§]. You can also get the codepoint of any Unicode character with ^^x. Pressing an Unicode character in the start state or in Ex and Fx will now give a sane error message. * When pressing a key which produces a multi-byte UTF-8 sequence, the character gets translated back and forth multiple times: 1. It's converted to an UTF-8 string, either buffered or by IME methods (Gtk). On Curses we could directly get a wide char using wget_wch(), but it's not currently used, so we don't depend on widechar curses. 2. Parsed into gunichar for passing into the edit command callbacks. This also validates the codepoint - everything later on can assume valid codepoints and valid UTF-8 strings. 3. Once the edit command handling decides to insert the key into the command line, it is serialized back into an UTF-8 string as the command line macro has to be in UTF-8 (like all other macros). 4. The parser reads back gunichars without validation for passing into the parser callbacks. * Flickering in the Curses UI and Pango warnings in Gtk, due to incompletely inserted and displayed UTF-8 sequences, are now fixed.
2024-09-09	added raw ANSI mode to facilitate 8-bit clean editing (refs #5)	Robin Haberkorn	1	-1/+4
	* When enabled with bit 2 in the ED flags (0,4ED), all registers and buffers will get the raw ANSI encoding (as if 0EE had been called on them). You can still manually change the encoding, eg. by calling 65001EE afterwards. * Also the ANSI mode sets up character representations for all bytes >= 0x80. This is currently done only depending on the ED flag, not when setting 0EE. * Since setting 16,4ED for 8-bit clean editing in a macro can be tricky - the default unnamed buffer will still be at UTF-8 and at least a bunch of environment registers as well - we added the command line option `--8bit` (short `-8`) which configures the ED flags very early on. As another advantage you can mung the profile in 8-bit mode as well when using SciTECO as a sort of interactive hex editor. * Disable UTF-8 checks in 8-bit clean mode (sample.teco_ini).
2024-09-09	updated README and sciteco(7) with information about Unicode support (refs #5)	Robin Haberkorn	1	-8/+28

2024-09-09	the ^EUq string building escape now respects the encoding (can insert bytes ↵	Robin Haberkorn	1	-0/+6
	or codepoints) (refs #5) * This is trickier than it sounds because there isn't one single place to consult. It depends on the context. If the string argument relates to buffer contents - as in <I>, <S>, <FR> etc. - the buffer's encoding is consulted. If it goes into a register (EU), the register's encoding is consulted. Everything else (O, EN, EC, ES...) expects only Unicode codepoints. * This is communicated through a new field teco_machine_stringbuilding_t::codepage which must be set in the states' initial callback. * Seems overkill just for ^EUq, but it can be used for context-sensitive processing of all the other string building constructs as well. * ^V and ^W cannot be supported for Unicode characters for the time being without an Unicode-aware parser
2024-09-09	conditionals now check for Unicode codepoints (refs #5)	Robin Haberkorn	1	-6/+6
	* This will naturally work with both ASCII characters and various non-English scripts. * Unfortunately, it cannot work with the other non-ANSI single-byte codepages. * If we'd like to support scripts working with all sorts of codepoints, we'd have to introduce a new command for translating individual codepoints from the current codepage (as reported by EE) to Unicode.
2024-02-06	avoid Groff warnings due to `\` escapes	Robin Haberkorn	1	-1/+1
	* It's generally a bad idea to pass backslashes as a glyph in macro arguments, even as `\\` since this could easily be interpreted as an escape. * Instead we now always use `\[rs]`.
2024-01-20	removed nonsensical line from sciteco(7) man page	Robin Haberkorn	1	-1/+0
	* was introduced in e7867fb0
2023-06-19	the SciTECO data installation path is now configurable via --with-scitecodatadir	Robin Haberkorn	1	-1/+1
	* This is also the base of $SCITECOPATH. * Changing it is useful for packaging where it is not possible to factor out the common files between Curses and Gtk builds into a "sciteco-common" package. As an alternative, you can now create disjunct sciteco-curses and sciteco-gtk packages. * You will most likely want to use this for Gtk builds as in: --with-interface=gtk --program-prefix=g --with-scitecodatadir=/usr/local/share/gsciteco.
2023-04-05	Troff documents: fixed monospaced example blocks	Robin Haberkorn	1	-13/+13
	* .SCITECO_TT should be before .EX, so that the indent is already monospaced. .SCITECO_TT_END still needs to be before .EE however, so that the next non-monospaced line is not "typeset" with a monospaced indent. * naturally only affects the Gtk UI
2022-12-10	fixed pass-through loops: especially :> and :F<	Robin Haberkorn	1	-5/+5
	* fixes test cases like 3<%a:> * you can now use :F< in pass-through loops as well * F> outside of loops will now exit the current macro level. This is analogous to what TECO-11 did. In interactive mode, F> is currently also equivalent to $$ (terminates command line).
2021-05-30	THE GREAT CEEIFICATION EVENT	Robin Haberkorn	1	-6/+15
	This is a total conversion of SciTECO to plain C (GNU C11). The chance was taken to improve a lot of internal datastructures, fix fundamental bugs and lay the foundations of future features. The GTK user interface is now in an useable state! All changes have been squashed together. The language itself has almost not changed at all, except for: * Detection of string terminators (usually Escape) now takes the string building characters into account. A string is only terminated outside of string building characters. In other words, you can now for instance write I^EQ[Hello$world]$ This removes one of the last bits of shellisms which is out of place in SciTECO where no tokenization/lexing is performed. Consequently, the current termination character can also be escaped using ^Q/^R. This is used by auto completions to make sure that strings are inserted verbatim and without unwanted sideeffects. * All strings can now safely contain null-characters (see also: 8-bit cleanliness). The null-character itself (^@) is not (yet) a valid SciTECO command, though. An incomplete list of changes: * We got rid of the BSD headers for RB trees and lists/queues. The problem with them was that they used a form of metaprogramming only to gain a bit of type safety. It also resulted in less readble code. This was a C++ desease. The new code avoids metaprogramming only to gain type safety. The BSD tree.h has been replaced by rb3ptr by Jens Stimpfle (https://github.com/jstimpfle/rb3ptr). This implementation is also more memory efficient than BSD's. The BSD list.h and queue.h has been replaced with a custom src/list.h. * Fixed crashes, performance issues and compatibility issues with the Gtk 3 User Interface. It is now more or less ready for general use. The GDK lock is no longer used to avoid using deprecated functions. On the downside, the new implementation (driving the Gtk event loop stepwise) is even slower than the old one. A few glitches remain (see TODO), but it is hoped that they will be resolved by the Scintilla update which will be performed soon. * A lot of program units have been split up, so they are shorter and easier to maintain: core-commands.c, qreg-commands.c, goto-commands.c, file-utils.h. * Parser states are simply structs of callbacks now. They still use a kind of polymorphy using a preprocessor trick. TECO_DEFINE_STATE() takes an initializer list that will be merged with the default list of field initializers. To "subclass" states, you can simply define new macros that add initializers to existing macros. * Parsers no longer have a "transitions" table but the input_cb() may use switch-case statements. There are also teco_machine_main_transition_t now which can be used to implement simple transitions. Additionally, you can specify functions to execute during transitions. This largely avoids long switch-case-statements. * Parsers are embeddable/reusable now, at least in parse-only mode. This does not currently bring any advantages but may later be used to write a Scintilla lexer for TECO syntax highlighting. Once parsers are fully embeddable, it will also be possible to run TECO macros in a kind of coroutine which would allow them to process string arguments in real time. * undo.[ch] still uses metaprogramming extensively but via the C preprocessor of course. On the downside, most undo token generators must be initiated explicitly (theoretically we could have used embedded functions / trampolines to instantiate automatically but this has turned out to be dangereous). There is a TECO_DEFINE_UNDO_CALL() to generate closures for arbitrary functions now (ie. to call an arbitrary function at undo-time). This simplified a lot of code and is much shorter than manually pushing undo tokens in many cases. * Instead of the ridiculous C++ Curiously Recurring Template Pattern to achieve static polymorphy for user interface implementations, we now simply declare all functions to implement in interface.h and link in the implementations. This is possible since we no longer hace to define interface subclasses (all state is static variables in the interface's .c files). Headers are now significantly shorter than in C++ since we can often hide more of our "class" implementations. * Memory counting is based on dlmalloc for most platforms now. Unfortunately, there is no malloc implementation that provides an efficient constant-time memory counter that is guaranteed to decrease when freeing memory. But since we use a defined malloc implementation now, malloc_usable_size() can be used safely for tracking memory use. malloc() replacement is very tricky on Windows, so we use a poll thread on Windows. This can also be enabled on other supported platforms using --disable-malloc-replacement. All in all, I'm still not pleased with the state of memory limiting. It is a mess. * Error handling uses GError now. This has the advantage that the GError codes can be reused once we support error catching in the SciTECO language. * Added a few more test suite cases. * Haiku is no longer supported as builds are instable and I did not manage to debug them - quite possibly Haiku bugs were responsible. * Glib v2.44 or later are now required. The GTK UI requires Gtk+ v3.12 or later now. The GtkFlowBox fallback and sciteco-wrapper workaround are no longer required. * We now extensively use the GCC/Clang-specific g_auto feature (automatic deallocations when leaving the current code block). * Updated copyright to 2021. SciTECO has been in continuous development, even though there have been no commits since 2018. * Since these changes are so significant, the target release has been set to v2.0. It is planned that beginning with v3.0, the language will be kept stable.
2017-03-25	0,8ED: Automatic case-folding of commands	Robin Haberkorn	1	-0/+17
	* when enabled, it will automatically upper-case all one or two letter commands (which are case insensitive). * also affects the up-carret control commands, so they when inserted they look more like real control commands. * specifically does not affect case-insensitive Q-Register specifications * the result are command lines that are better readable and conform to the coding style used in SciTECO's standard library. This eases reusing command lines as well. * Consequently, string-building and pattern match characters should be case-folded as well, but they aren't currently since State::process_edit_cmd() does not have sufficient insight into the MicroStateMachines. Also, it could not be delegated to the MicroStateMachines. Perhaps they should be abandoned in favour of embeddedable regular state machines; or regular state machines with a stack of return states?
2017-02-26	more fixes for groff v1.19	Robin Haberkorn	1	-2/+2
	* fixes manpages, Groff warnings and building womanpages for older Groff versions. Groff v1.19 is in use eg. on FreeBSD 11. * tbl v1.19 has different column specifiers than on later versions. `X` cannot be used for expanded columns in these Groff versions.