Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
The following rules apply:
* All SciTECO macros __must__ be in valid UTF-8, regardless of the
the register's configured encoding.
This is checked against before execution, so we can use glib's non-validating
UTF-8 API afterwards.
* Things will inevitably get slower as we have to validate all macros first
and convert to gunichar for each and every character passed into the parser.
As an optimization, it may make sense to have our own inlineable version of
g_utf8_get_char() (TODO).
Also, Unicode glyphs in syntactically significant positions may be case-folded -
just like ASCII chars were. This is is of course slower than case folding
ASCII. The impact of this should be measured and perhaps we should restrict
case folding to a-z via teco_ascii_toupper().
* The language itself does not use any non-ANSI characters, so you don't have to
use UTF-8 characters.
* Wherever the parser expects a single character, it will now accept an arbitrary
Unicode/UTF-8 glyph as well.
In other words, you can call macros like M§ instead of having to write M[§].
You can also get the codepoint of any Unicode character with ^^x.
Pressing an Unicode character in the start state or in Ex and Fx will now
give a sane error message.
* When pressing a key which produces a multi-byte UTF-8 sequence, the character
gets translated back and forth multiple times:
1. It's converted to an UTF-8 string, either buffered or by IME methods (Gtk).
On Curses we could directly get a wide char using wget_wch(), but it's
not currently used, so we don't depend on widechar curses.
2. Parsed into gunichar for passing into the edit command callbacks.
This also validates the codepoint - everything later on can assume valid
codepoints and valid UTF-8 strings.
3. Once the edit command handling decides to insert the key into the command line,
it is serialized back into an UTF-8 string as the command line macro has
to be in UTF-8 (like all other macros).
4. The parser reads back gunichars without validation for passing into
the parser callbacks.
* Flickering in the Curses UI and Pango warnings in Gtk, due to incompletely
inserted and displayed UTF-8 sequences, are now fixed.
|
|
|
|
* This does not make sense for most SciTECO builds, but only when you
want to optimize for size as the lexers take up 50% of the compressed binary
size.
Without Lexilla, it should be possible get it compiled in about 500kb.
* It can be useful for instance when building for embedded distributions.
* When Lexilla is disabled, symbols-scilexer.c is also not generated
(we assume that the Lexilla sources are not available and it also doesn't serve any purpose).
* Consequently, most of the lexer configuration scripts are also not installed
under --without-lexilla.
|
|
|
|
registers
* An empty but valid teco_string_t can contain NULL pointers.
More precisely, a state's done_cb() can be invoked with such empty strings
in case of empty string arguments.
Also a registers get_string() can return the NULL pointer
for existing registers with uninitialized string parts.
* In all of these cases, the language should treat "uninitialized" strings
exactly like empty strings.
* Not doing so, resulted in a number of vulnerabilities.
* EN$$ crashed if "_" was uninitialized
* The ^E@q and ^ENq string building constructs would crash for existing but
uninitialized registers q.
* ?$ would crash
* ESSETILEXER$$ would crash
* This is now fixed.
Test cases have been added.
* I cannot guarantee that I have found all such cases.
Generally, it might be wise to change our definitions and make sure that
every teco_string_t must have an associated heap object to be valid.
All functions returning pointer+length pairs should consequently also never
return NULL pointers.
|
|
|
|
* Previous Scintilla version was 3.6.4 and Scinterm was 1.7 (with lots of custom patches).
All of the patches are now either irrelevant or have been merged upstream.
* Since Scintilla 5 requires C++17, this increases the minimum GCC version at least
to 5.0. We may actually require even newer versions.
* I could not upgrade the scintilla-mirror (which was imported from Mercurial),
so the old sciteco-dev branch was renamed to sciteco-dev-pre-v2.0.0,
master was deleted and I reimported the entire Scintilla repo using
git-remote-hg.
This means that scintilla-mirror now contains two entirely separate trees.
But it is still possible to clone old SciTECO repos.
* The strategy/workflow of maintaining hotfix branches on scintilla-mirror has been changed.
Instead of having one sciteco-dev branch that is rebased onto new Scintilla upstream
releases and tagging SciTECO releases in scintilla-mirror (to keep the commits referenced),
we now create a branch for every Scintilla version we are based on (eg. sciteco-rel-5-1-3).
This branch is never rebased or deleted. Therefore, we are guaranteed to be able to
clone arbitrary SciTECO repo commits - not only releases.
Releases no longer have to be tagged in scintilla-mirror.
On the downside, fixup commits may accumulate in these new branches.
They can only be squashed once a new branch for a new Scintilla release is created
(e.g. by cherry-picking followed by rebase).
* Scinterm does no longer have to reside in the Scintilla subdirectory,
so we added it as a regular submodule.
There are no more recursive submodules.
The Scinterm build system has not been improved at all, but we use
a trick based on VPATH to build Scinterm in scintilla/bin/.
* Scinterm is now in Git and we reference the upstream repo for the
time being.
We might mirror it and apply the same branching workflow as with Scintilla
if necessary.
The scinterm-mirror repository still exists but has not been touched.
We will also have to rewrite its master branch as it was a non-reproducible
Mercurial import.
* Scinterm now also comes with patches for Scintilla which we simply applied
on our sciteco-rel-5-1-3 branch.
* Scintilla 5 outsourced its lexers into the Lexilla project.
We added it as yet another submodule.
* All submodules have been moved into contrib/.
* The Scintilla API for setting lexers has consequently changed.
We now have to call SCI_SETILEXER(0, CreateLexer(name)).
As I did not want to introduce a separate command for setting lexers,
<ES> has been extended to allow setting lexers by name with the SCI_SETILEXER
message which effectively replaces SCI_SETLEXERLANGUAGE.
* The lexer macros (SCLEX_...) no longer serve any purpose - they weren't used
in the SciTECO standard library anyway - and have consequently been removed
from symbols-scilexer.c.
The style macros from SciLexer.h (SCE_...) are theoretically still useful - even
though they are not used by our current color schemes - and have therefore been
retained. They can be specified as wParam in <ES>.
* <ES> no longer allows symbolic constants for lParam.
This never made any sense since all supported symbols were always wParam.
* Scinterm supports new native cursor modes.
They are not used for the time being and the previous CARETSTYLE_BLOCK_AFTER
caret style is configured by default.
It makes no sense to enable native cursor modes now since the
command line should have a native cursor but is not yet a Scintilla view.
* The Scintilla upgrade performed much worse than before,
so some optimizations will be necessary.
|
|
file systems
* There is a "Scintilla.h" as well.
* should fix macOS and builds on native Windows hosts
* It wasn't practical to refer to the Scintilla includes using paths since
the Scintilla location is configurable (--with-scintilla).
So we'd have to write something like #include <include/Scintilla.h>.
For Scinterm we cannot avoid collisions neither as its path is also
configurable (--with-scinterm).
Effectively, we must prevent name clashes across SciTECO and all
of Scintilla and Scinterm.
|