|
lengths (refs #27)
* Allows storing pattern matches into Q-Registers (^YXq).
* You can also refer to subpatterns marked by ^E[...] by passing a number > 0.
This is equivalent to \0-9 references in many programming languages.
* It's especially useful for supporting TECO's equivalent of structural regular expressions.
This will be done with additional macros.
* You can also simply back up to the beginning of an insertion or search.
So I...$^SC leaves dot at the beginning of the insertion.
S...$^SC leaves dot before the found pattern.
This has been previously requested by users.
* Perhaps there should be ^Y string building characters as well to backreference
in search-replacement commands (TODO).
This means that the search commands would have to store the matched text itself
in teco_range_t structures since FR deletes the matched text before
processing the replacement string.
It could also be made into a FR/FS-specific construct,
so we don't fetch the substrings unnecessarily.
* This differs from DEC TECO in always returning the same range even after dot movements,
since we are storing start/end byte positions instead of only the length.
Also DEC TECO does not support fetching subpattern ranges.
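The start/end representation can be sketched with a minimal struct; this is only an illustrative stand-in for the idea, not the actual teco_range_t definition:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for a teco_range_t-like type:
 * absolute start/end byte positions instead of a single length. */
typedef struct {
    size_t from; /* start byte position of the match */
    size_t to;   /* end byte position (exclusive) */
} range_t;

/* Because both positions are absolute, the stored range is
 * independent of the current position ("dot"); moving dot
 * afterwards does not invalidate it. */
static size_t range_length(range_t r)
{
    return r.to - r.from;
}
```

A length-only representation would have to be anchored at dot and would therefore change meaning whenever dot moves.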
|
|
5597bc72671d0128e6f0dba446c4dc8d47bf37d0)
* Using teco_expressions_eval() is wrong since it does not pay attention to precedences.
If you have multiple higher precedence operators in a row, as in 2+3*4*5,
the lower precedence operators would be resolved prematurely.
* Instead we now call teco_expressions_calc() repeatedly but only for lower precedence
operators on the stack top.
This makes sure that as much of the expression as possible is evaluated at any given moment.
|
|
* It was possible to provoke operator right-associativity when placing a high-precedence
operator between two low-precedence operators.
1-6*5-1 evaluated to -28 instead of the expected -30.
* The reason is that SciTECO relies on operators to be resolved from left-to-right as soon as possible.
The higher precedence operator prevents that and pushing the 2nd "-" only evaluated 6*5.
At the end 1-30-1 would be left on the stack.
teco_expressions_eval() however evaluates from right-to-left which is wrong in this case.
* Instead, we now do a full eval on every operator with a lower precedence, making sure that 1-30 is
evaluated first.
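The fix can be illustrated with a minimal shift-reduce evaluator: before pushing a new operator, all stacked operators of equal or higher precedence are reduced, which is what forces left-to-right (left-associative) evaluation. This is a sketch of the principle with made-up names and single-digit operands, not SciTECO's actual teco_expressions_*() code:

```c
#include <assert.h>
#include <ctype.h>

/* Precedence: * and / bind tighter than + and - */
static int prec(char op)
{
    return (op == '*' || op == '/') ? 2 : 1;
}

static long apply(long a, char op, long b)
{
    switch (op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    default:  return a / b;
    }
}

/* Evaluate an expression of single-digit numbers and + - * /. */
static long eval(const char *s)
{
    long nums[32]; int nn = 0;  /* operand stack */
    char ops[32];  int no = 0;  /* operator stack */

    for (; *s; s++) {
        if (isdigit((unsigned char)*s)) {
            nums[nn++] = *s - '0';
            continue;
        }
        /* Reduce stacked operators of equal or higher precedence
         * first: this evaluates as much as possible as early as
         * possible and enforces left-associativity. */
        while (no > 0 && prec(ops[no-1]) >= prec(*s)) {
            long b = nums[--nn], a = nums[--nn];
            nums[nn++] = apply(a, ops[--no], b);
        }
        ops[no++] = *s;
    }
    while (no > 0) {
        long b = nums[--nn], a = nums[--nn];
        nums[nn++] = apply(a, ops[--no], b);
    }
    return nums[0];
}
```

With this rule, pushing the second "-" of 1-6*5-1 first reduces 6*5 and then 1-30, so the final result is -30 rather than -28.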
|
|
* We cannot call it "." since that introduces a local register
and we don't want to add an unnecessary syntactic exception.
* Allows the idiom [: ... ]: to temporarily move around.
Also, you can now write ^E\: without having to store dot in a register first.
* In the future we might add an ^E register as well for byte offsets.
However, there are much fewer useful applications.
* Of course, you can now also write nU: instead of nJ, Q: instead of "." and
n%: instead of "nC.". However, none of it is really useful.
|
|
* This could actually cause crashes when trying to format numbers.
* The ^R local register has a custom set_integer() method now,
so that the check is performed also when using nU.^X.
|
|
^R now (refs #17)
* This way the search mode and radix are local to the current macro frame,
unless the macro was invoked with :Mq.
If colon-modified, you can reproduce the same effect by calling
[.^X 0^X ... ].^X
* The radix register is cached in the Q-Reg table as an optimization.
This could be done with the other "special" registers as well, but at the
cost of larger stack frames.
* In order to allow constructs like [.^X typed with upcarets,
the Q-Register specification syntax has been extended:
^c is the corresponding control code instead of the register "^".
|
|
* Usually you will only want -^X for enabling case sensitive searches
and 0^X for case-insensitive searches (which is also the default).
* An open question is what happens if the user sets -^X and then calls
a macro. The search mode flag should probably be stacked away along
with the search-string. This means we'd need a ^X special Q-Reg as well,
so you can write [^X[_ 0^X S...$ ]_]^X.
Alternatively, the search mode flag should be a property of the
macro frame, along with the radix.
|
|
* The program exit code will usually not signal failures since they are caught earlier.
* Therefore, we always have to capture and check stderr.
|
|
characters in Q-Register)
* It was initialized only once, so it could inherit the wrong local Q-Register table.
A test case has been added for this particular bug.
* Also, if starting from the profile (batch mode), the state machine could be initialized
without undo, which would later cause problems on rubout in interactive mode.
For instance, if S^EG[a] fails and you would repeatedly type `]`, the Q-Reg name could
grow indefinitely. There were probably other issues as well.
Even crashes should have been possible, although I couldn't reproduce them.
* Since the state machine is required only for the pattern to regexp translation
and is performed anew for every character in interactive mode,
we now create a fresh state machine for every call and don't attempt
any undo.
There might be more efficient ways, like reusing the string building's
Q-Reg parser state machine.
|
|
* A test case has been added, although it might have been accidental
that it caused crashes at all.
|
|
the pclose() return value
|
|
string cells
Found thanks to the "infinite monkey" test.
|
|
Supposing that any monkey hitting keys on a typewriter, serving as a hardcopy
SciTECO terminal, will sooner or later trigger bugs and crash the application,
the new monkey-test.apl script emulates such a monkey.
In fact it's a bit more elaborate as the generated macro follows the frequency
distribution extracted from the corpus of SciTECO macro files (via monkey-parse.apl).
This, it is hoped, increases the chance of getting into "interesting" parser states.
This also adds a new hidden --sandbox argument, but it works only on FreeBSD (via Capsicum)
so far. In sandbox mode, we cannot open any file or execute external commands.
It is ensured that SciTECO cannot assert in sandbox mode for scripts that would
run without --sandbox, since assertions are exactly the kind of thing we would like to detect.
SciTECO must be sandboxed during "infinite monkey" tests, so it cannot accidentally
do any harm on the system running the tests.
All macros in sandbox mode must currently be passed via --eval.
Alternatively, we could add a test compilation unit and generate the test data
directly in memory via C code.
The new scripts are written in GNU APL 1.9 and will probably work only under FreeBSD.
These scripts are not meant to be run by everyone.
|
|
* Any memory error will let the test case fail with code 66.
* You can also call
make check TESTSUITEFLAGS="--valgrind"
* There is no program test for Valgrind in configure.ac for the time being.
`valgrind` must be in $PATH.
* All CI testsuite runs under Ubuntu now use Valgrind.
|
|
* There was some boilerplate code missing in teco_state_search_all_initial()
that is present in teco_state_search_initial().
* Perhaps there should be a common function to avoid redundancies?
* This will also fix the initialization of the string argument codepage for <N>.
|
|
* Supports all immediate editing commands.
Naturally it cannot emulate arbitrary key presses since there is no
canonical ASCII encoding of function keys.
Consequently, key macros are also not testable.
The --fake-cmdline parameter is instead treated very similarly to
a key macro expansion.
* Most importantly this allows adding test cases for rubout behavior
and bugs that are quite common.
* Added regression test cases for the last two rubout bugs.
* It's not easy to pass control codes in command line arguments in
a portable manner, so the test cases will often use { and }.
Control codes could be used e.g. by defining variables like
RUBOUT=`printf '\b'`
and referencing them with ${RUBOUT}.
|
|
* The old bug of saving gchar in gints, so teco_eol_reader_t::last_char could become negative.
* When converting from a UTF-8 text with CRLF linebreaks, we could have data loss and corruption.
* On strings ending in UTF-8 characters, teco_eol_reader_t::offset would overflow, resulting
in invalid reads and potentially insertion of data garbage.
I observed this with G~ on Gtk.
* Test cases updated. Couldn't reproduce the bug with the test suite, though.
|
|
* The previous check could result in false positives if you are editing
a local Q-Register that will be destroyed at the end of the current macro frame
and call another non-colon-modified macro.
* It must instead be invalid to keep the register edited only if it belongs to the
local Q-Registers that are about to be freed.
In other words, the table that the currently edited Q-Register belongs to, must be
the one we're about to destroy.
* This fixes the solarized.toggle (F5) macro when using the Solarized color scheme.
|
|
allow breaking from within braces
For instance, you can now write <23(1;)> without leaving anything on the stack.
|
|
* makes it possible, albeit cumbersome, to escape pattern match characters
* For instance, to search for ^Q, you now have to type
S^Q^Q^Q^Q$.
To search for ^E you have to type
S^Q^Q^Q^E$.
But the last character cannot be typed with carets currently (FIXME?).
For pattern-only characters, two ^Q should be sufficient as in
S^Q^Q^X$.
* Perhaps it would be more elegant to abolish the difference between string building
and pattern matching characters to avoid double quoting.
But then all string building constructs like ^EQq should operate at the pattern level
as well (ie. match the contents of register q verbatim instead of being interpreted as a pattern).
TECOC and TECO-64 don't do that either.
If we leave everything as it is, at least a new string building construct should be added for
auto-quoting patterns (analogous to ^EN and ^E@).
|
|
* Previously you could open files of arbitrary size and the limit would be checked only afterwards.
* Many, but not all, cases should now be detected earlier.
Since Scintilla allocates lots of memory as part of rendering,
you can still run into memory limits even after successfully loading the file.
* Loading extremely large files can also be potentially slow.
Therefore, it is now possible to interrupt via CTRL+C.
Again, if the UI is blocking because of stuff done as part of rendering,
you still may not be able to interrupt the "blocking" operation.
|
|
|
|
* @EQ$/.../ sets the current directory from the contents of the given file.
@E%$/.../ stores the current directory in the given file.
* @EQ*/.../ will fail, just like ^U*...$.
@E%*/.../ stores the current buffer's name in the given file.
* It's especially useful with the clipboard registers.
There could still be a minor bug in @E%~/.../ with regard to EOL normalization
as teco_view_save() will use the EOL style of the current document, which
may not be the style of the Q-Reg contents.
Conversions can generally be avoided for these particular commands.
But without teco_view_save() we'd have to care about save point creation.
|
|
* This was unsafe and could easily result in crashes, since teco_qreg_current
would afterwards point to an already freed Q-Register.
* Since automatically editing another register or buffer is not easy to do right,
we throw an error instead.
|
|
This was throwing glib assertions.
|
|
* It wasn't failing on FreeBSD because there are different default
stacksize limits.
We now set it to 8MB everywhere.
|
|
* This fixes F< to the beginning of the macro, which was broken in 73d574b71a10d4661ada20275cafde75aff6c1ba.
teco_machine_main_t::macro_pc actually has to be signed as it is sometimes set to -1.
|
|
* while code is guaranteed to be in valid UTF-8, this cannot be
said about the result of string building.
* The search pattern can end up with invalid Unicode bytes even when
searching on UTF-8 buffers, e.g. if ^EQq inserts garbage.
There are currently no checks.
* When searching on a raw buffer, it must be possible to
search for arbitrary bytes (^EUq).
Since teco_pattern2regexp() was always expecting clean UTF-8 input,
this would sometimes skip over too many bytes and could even crash.
* Instead, teco_pattern2regexp() now takes the <S> target codepage
into account.
|
|
The following rules apply:
* All SciTECO macros __must__ be in valid UTF-8, regardless of the
register's configured encoding.
This is checked against before execution, so we can use glib's non-validating
UTF-8 API afterwards.
* Things will inevitably get slower as we have to validate all macros first
and convert to gunichar for each and every character passed into the parser.
As an optimization, it may make sense to have our own inlineable version of
g_utf8_get_char() (TODO).
Also, Unicode glyphs in syntactically significant positions may be case-folded -
just like ASCII chars were. This is of course slower than case folding
ASCII. The impact of this should be measured and perhaps we should restrict
case folding to a-z via teco_ascii_toupper().
* The language itself does not use any non-ASCII characters, so you don't have to
use UTF-8 characters.
* Wherever the parser expects a single character, it will now accept an arbitrary
Unicode/UTF-8 glyph as well.
In other words, you can call macros like M§ instead of having to write M[§].
You can also get the codepoint of any Unicode character with ^^x.
Pressing a Unicode character in the start state or in Ex and Fx will now
give a sane error message.
* When pressing a key which produces a multi-byte UTF-8 sequence, the character
gets translated back and forth multiple times:
1. It's converted to a UTF-8 string, either buffered or by IME methods (Gtk).
On Curses we could directly get a wide char using wget_wch(), but it's
not currently used, so we don't depend on widechar curses.
2. Parsed into gunichar for passing into the edit command callbacks.
This also validates the codepoint - everything later on can assume valid
codepoints and valid UTF-8 strings.
3. Once the edit command handling decides to insert the key into the command line,
it is serialized back into a UTF-8 string as the command line macro has
to be in UTF-8 (like all other macros).
4. The parser reads back gunichars without validation for passing into
the parser callbacks.
* Flickering in the Curses UI and Pango warnings in Gtk, due to incompletely
inserted and displayed UTF-8 sequences, are now fixed.
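As a sketch of the "own inlineable version of g_utf8_get_char()" idea mentioned above, here is a minimal non-validating UTF-8 decoder. It assumes already-validated input, as the rules above guarantee; it is not glib's actual implementation:

```c
#include <stdint.h>

/* Minimal NON-validating UTF-8 decoder, in the spirit of an
 * inlineable g_utf8_get_char().  Because all macros are validated
 * up front, no error handling is needed here.  Illustrative only. */
static inline uint32_t utf8_get_char(const char *p)
{
    const unsigned char *s = (const unsigned char *)p;

    if (s[0] < 0x80)               /* 1-byte (ASCII) sequence */
        return s[0];
    if (s[0] < 0xE0)               /* 2-byte sequence */
        return ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    if (s[0] < 0xF0)               /* 3-byte sequence */
        return ((uint32_t)(s[0] & 0x0F) << 12) |
               ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    /* 4-byte sequence */
    return ((uint32_t)(s[0] & 0x07) << 18) |
           ((uint32_t)(s[1] & 0x3F) << 12) |
           ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
}
```

Skipping validation is exactly what makes such a decoder cheap enough to call for every character passed into the parser.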
|
|
* The libtool wrapper binaries do not pass down UTF-8 strings correctly,
so the Unicode tests failed under some circumstances.
* As we aren't actually linking against any locally-built shared libraries,
we are passing --disable-shared to libtool which inhibits wrapper generation
on win32 and fixes the test suite.
* Also use up to date autotools. This didn't fix anything, though.
* test suite: try writing a Unicode filename as well
* There have been problems doing that on Win32 where UTF-8 was not
correctly passed down from the command line and some Windows API
calls were only working with ANSI filenames etc.
|
|
(refs #5)
|
|
hopefully fixes the Unicode test cases on Mac OS
|
|
|
|
unsigned integer
* SCI_GETCHARAT is internally cast to `char`, which may be signed.
Characters > 127 therefore become negative and stay so when cast to sptr_t.
We therefore cast it back to guchar (unsigned char).
* The same is true whenever returning a string's character to SciTECO (teco_int_t)
as our string type is `gchar *`.
* <^^x> now also works for those characters.
Eventually, the parser will probably become UTF8-aware and this will
have to be done differently.
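The pitfall can be demonstrated in a few lines of plain C (glib-free; the function names are made up for illustration):

```c
/* Demonstrates the sign-extension pitfall fixed above: a byte > 127
 * stored in a (possibly signed) char becomes negative when widened
 * to a larger integer type, unless it is first cast back through
 * unsigned char. */
static long char_to_int_buggy(char c)
{
    return (long)c;            /* sign-extends if char is signed */
}

static long char_to_int_fixed(char c)
{
    return (unsigned char)c;   /* always yields 0..255 */
}
```

Whether the buggy variant misbehaves depends on the platform's char signedness, which is exactly why the explicit cast is needed.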
|
|
* You no longer have to copy contrib/scintilla, contrib/scinterm and contrib/lexilla
manually to the build directory.
* It turns out that Scintilla/Lexilla has supported this since 2016.
Scintilla allows pointing to a source directory (srcdir) and Lexilla to a binary directory (DIR_O).
* For Scinterm I opened a pull request in order to add srcdir/basedir variables:
https://github.com/orbitalquark/scinterm/pull/21
* `make distcheck` is therefore now also fixed.
* The FreeBSD package is now allowed to build out of source.
I haven't tested it yet.
* See also https://github.com/ScintillaOrg/lexilla/issues/266
|
|
numbers now
* Instead of TECO_OP_NEW, there should perhaps simply be a flag indicating whether `,` was used.
|
|
* in fact, with a negative exponent the previous naive implementation would even hang indefinitely!
* Now uses the squaring algorithm.
This is slightly longer but significantly more efficient.
* added test cases
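A sketch of the squaring algorithm; the name and the convention of returning 0 for negative exponents (rather than hanging, as the naive loop did) are illustrative, not necessarily SciTECO's exact semantics:

```c
/* Integer exponentiation by squaring: O(log n) multiplications
 * instead of the naive O(n) loop. */
static long ipow(long base, long exp)
{
    long result = 1;

    /* A negative exponent yields a fractional result, which
     * truncates to 0 for any |base| > 1.  The naive loop would
     * never terminate here. */
    if (exp < 0)
        return base == 1 ? 1 : base == -1 ? (exp & 1 ? -1 : 1) : 0;

    while (exp > 0) {
        if (exp & 1)            /* odd exponent: multiply result in */
            result *= base;
        base *= base;           /* square the base each round */
        exp >>= 1;              /* halve the exponent */
    }
    return result;
}
```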
|
|
* passing an empty command string down to the shell would always do nothing,
so it doesn't make sense to support that.
* for the time being, we generate a proper error
* in the future, it might make sense to define some special behavior like repeating
the last command - but EC does not currently save the command line anywhere.
* The generated documentation is currently ugly (FIXME).
Mandatory parameters are not properly detected by tedoc and we cannot tell
Q-Registers apart from mandatory parameters either.
Also, we should allow <param> markup in command summaries.
|
|
* This was setting only the teco_doc but wasn't calling the necessary
set_string() methods.
* The idiom [$ FG...$ ]$ to change the working directory temporarily
now works.
* Similarly you can now write [~ ^U~...$ ]~ to change the clipboard
temporarily.
* Added test suite cases. The clipboard is not tested since
it's not supported everywhere and would interfere with the host system.
* Resolved lots of redundancies in qreg.c.
The clipboard and workingdir Q-Regs have lots in common.
This is now abstracted in the "external" Q-Reg base "class"
(ie. via initializer TECO_INIT_QREG_EXTERNAL()).
It uses vtable calls, which is slightly less efficient than per-register
implementations, but avoiding redundancies is probably more important.
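The vtable approach can be sketched like this; all types and names are illustrative toys, not the actual qreg.c definitions:

```c
#include <string.h>

typedef struct ext_qreg ext_qreg_t;

/* Each "external" register kind only supplies its getter/setter;
 * everything else is shared code calling through the vtable. */
typedef struct {
    int (*set_string)(ext_qreg_t *q, const char *str);
    const char *(*get_string)(ext_qreg_t *q);
} ext_qreg_vtable_t;

struct ext_qreg {
    const ext_qreg_vtable_t *vtable;
    char buf[64]; /* backing store for this toy example */
};

/* A toy register kind backed by an in-memory buffer: */
static int toy_set_string(ext_qreg_t *q, const char *str)
{
    strncpy(q->buf, str, sizeof(q->buf) - 1);
    q->buf[sizeof(q->buf) - 1] = '\0';
    return 0;
}

static const char *toy_get_string(ext_qreg_t *q)
{
    return q->buf;
}

static const ext_qreg_vtable_t toy_vtable = {
    toy_set_string, toy_get_string
};

/* Generic logic shared by all register kinds -- this is the code
 * that would otherwise be duplicated per register: */
static int ext_qreg_copy(ext_qreg_t *dst, ext_qreg_t *src)
{
    return dst->vtable->set_string(dst, src->vtable->get_string(src));
}
```

The indirect call costs one pointer dereference per operation, which is the "slightly less efficient" trade-off mentioned above.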
|
|
* It appears to behave similarly to Mac OS with regard to recursions.
|
|
* fixes test cases like 3<%a:>
* you can now use :F< in pass-through loops as well
* F> outside of loops will now exit the current macro level.
This is analogous to what TECO-11 did.
In interactive mode, F> is currently also equivalent to $$
(terminates command line).
|
|
* This is not easy to fix (show errors when encountering these constructs without
preceding if <"> statements) and would require complicating the parser only to
detect this.
On the other hand, keeping things as they are does not really harm anybody.
|
|
registers
* An empty but valid teco_string_t can contain NULL pointers.
More precisely, a state's done_cb() can be invoked with such empty strings
in case of empty string arguments.
Also, a register's get_string() can return a NULL pointer
for existing registers with uninitialized string parts.
* In all of these cases, the language should treat "uninitialized" strings
exactly like empty strings.
* Not doing so resulted in a number of vulnerabilities.
* EN$$ crashed if "_" was uninitialized
* The ^E@q and ^ENq string building constructs would crash for existing but
uninitialized registers q.
* ?$ would crash
* ESSETILEXER$$ would crash
* This is now fixed.
Test cases have been added.
* I cannot guarantee that I have found all such cases.
Generally, it might be wise to change our definitions and make sure that
every teco_string_t must have an associated heap object to be valid.
All functions returning pointer+length pairs should consequently also never
return NULL pointers.
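The "uninitialized behaves like empty" rule can be sketched as follows; str_t is a hypothetical stand-in for teco_string_t:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer+length string, modelled on teco_string_t:
 * the data pointer may be NULL for "uninitialized" strings, which
 * must behave exactly like empty strings. */
typedef struct {
    char *data;  /* may be NULL */
    size_t len;
} str_t;

static int str_is_empty(const str_t *s)
{
    return s->data == NULL || s->len == 0;
}

/* Safe comparison that never dereferences a NULL data pointer: */
static int str_equal(const str_t *a, const str_t *b)
{
    if (a->len != b->len)
        return 0;
    if (a->len == 0)
        return 1; /* also avoids memcmp(NULL, ..., 0) */
    return memcmp(a->data, b->data, a->len) == 0;
}
```

Every consumer of such strings has to short-circuit the empty case before touching the data pointer; forgetting that is precisely what caused the crashes listed above.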
|
|
* This test case no longer fails on MacOS and MinGW builds, probably
because the settings of the underlying libpcre library changed.
* Since these settings are not predictable, cannot be queried and may even change on some
flavors of Linux, it has been completely disabled for the time being.
* Should fix CI and nightly builds on MacOS and Win32
|
|
* The C standard actually forbids this (undefined behaviour) even though
it seems intuitive that something like `memcpy(foo, NULL, 0)` does no harm.
* It turned out there were actual bugs related to this.
If memchr() was called with a variable that can be NULL,
the compiler could assume that the variable is actually always non-NULL
(since glibc declares memchr() with nonnull), consequently eliminating
checks for NULL afterwards.
The same could theoretically happen with memcpy().
This manifested itself in the empty search crashing when building with -O3.
Test case:
sciteco -e '@S//'
* Consequently, the nightly builds (at least for Ubuntu) also had this bug.
* In some cases, the passed in pointers are passed down from the caller but
should not be NULL, so I added runtime assertions to guard against it.
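A sketch of the guarding described above (the wrapper names are made up; the point is only that the zero-size case must be short-circuited before the library call):

```c
#include <stddef.h>
#include <string.h>

/* Calling memcpy()/memchr() with a NULL pointer is undefined
 * behaviour even for size 0; since glibc declares them nonnull,
 * the optimizer may also delete later NULL checks on the same
 * pointer.  These wrappers avoid the library call entirely when
 * there is nothing to do. */
static void *safe_memcpy(void *dst, const void *src, size_t n)
{
    if (n > 0)
        memcpy(dst, src, n);
    return dst;
}

static void *safe_memchr(const void *s, int c, size_t n)
{
    return n > 0 ? memchr(s, c, n) : NULL;
}
```

With such wrappers, an empty search string (`@S//`) never reaches memchr() with a NULL haystack in the first place.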
|
|
* Turns out we cannot assume that the test case never crashes on Mac OS,
so we instead now skip the entire test case on Mac OS.
It apparently crashes even on Mac OS when building with --enable-debug (-O0).
* Should fix Continuous Integration for Mac OS.
|
|
* Test case: sciteco -e '[a'
[aEX$$ in interactive mode would also crash.
* No longer use a destructor - it was executed after the Q-Reg view was
destroyed.
* Instead, we now explicitly call teco_qreg_stack_clear() in main().
* Added a regression test case.
|
|
test escaping of braces in Q-Register specifications
|
|
$SHELL they execute the testsuite in
* instead, we now use `dd`.
|
|
* Turned out to be useful in debugging the "Memory limiting during spawning" test case
on Windows.
* Use UNIX shell emulation (0,128ED) in all test cases.
Should be necessary in order to run the testsuite on Windows, but
it is currently broken anyway.
* avoid <EG> when preprocessing files - use GNU Make's $(shell) instead
* Fixes builds on MinGW where there are still problems with <EC> and <EG>
at least in the virtual build environment.
* Results in another automake warning about non-POSIX Make constructs.
This is not critical since we depend on GNU Make anyway.
|