path: root/src/UniConversion.h
Age | Commit message | Author | Files | Lines
2025-03-08 | Define constants for UTF-8 and UTF-16 implementation for clarity. | Neil | 1 | -1/+5
    Add tests to check that inverted conversions yield the original value.
2025-02-04 | Fix segmentation of long lexemes to avoid breaking before modifiers like accents | Neil | 1 | -0/+4
    that must be drawn with their base letters. This is only a subset of implementing grapheme cluster boundaries but it improves behaviour with some Asian scripts like Thai and Javanese. Javanese is mostly written with (ASCII) Roman characters so issues will be rare but Thai uses Thai script. Also slightly improves placement of combining accents in European texts.
    https://github.com/notepad-plus-plus/notepad-plus-plus/issues/14822
    https://github.com/notepad-plus-plus/notepad-plus-plus/issues/16115
2025-02-04 | Add overload of UnicodeFromUTF8 that takes a string_view. | Neil | 1 | -0/+1
2024-02-28 | Add variant of UTF8Classify that takes a char* so that client code does not have | Neil | 1 | -3/+4
    to reinterpret_cast. Make functions in header constexpr. Prefer .data() to &[] since safer. Avoid else when not needed.
2024-02-14 | Remove redundant inline from constexpr functions. | Neil | 1 | -5/+5
2022-08-13 | Reduce warnings with noexcept, fewer casts, and other minor changes. | Neil | 1 | -1/+1
2021-08-24 | Remove line end white space. | Neil | 1 | -1/+1
2021-07-13 | Restrict UTF8IsAscii to char and unsigned char to avoid failures when (signed) | Neil | 1 | -1/+6
    char passed.
2021-05-24 | Define C++ version of the Scintilla API in ScintillaTypes.h, ScintillaMessages.h | Neil | 1 | -1/+1
    and ScintillaStructures.h using scoped enumerations. Use these headers instead of Scintilla.h internally. External definitions go in the Scintilla namespace and internal definitions in Scintilla::Internal.
2020-06-11 | Use noexcept where safe and maintainable. | Neil | 1 | -1/+1
2020-05-19 | Encapsulate common check for PS, LS, and NEL as UTF8IsMultibyteLineEnd. | Neil | 1 | -0/+7
    Avoids construction of temporary array.
2020-03-13 | Using constexpr in UniConversion and CaseConvert. | Neil | 1 | -4/+4
2019-03-20 | Use noexcept where sensible. Rename UTF8 string_view parameters for clarity. | Neil | 1 | -6/+6
2019-03-20 | Implement WStringFromUTF8 to simplify code that creates wstring objects for | Neil | 1 | -0/+4
    regular expressions and calling the Win32 API.
2018-07-10 | Optional indexing of line starts in UTF-8 documents by UTF-32 code points and | Neil | 1 | -0/+4
    UTF-16 code units added.
2018-06-01 | Mark constant inline Unicode functions as constexpr. | Neil | 1 | -4/+4
2018-06-01 | Add function to find a UTF-16 position in a UTF-8 string. | Neil | 1 | -0/+1
2018-05-14 | Use string_view for UniConversion functions. | Neil | 1 | -5/+5
2018-04-27 | Avoid reinterpret_cast. Use size_t argument to UTF8Classify to avoid casts. | Neil | 1 | -1/+1
2018-04-21 | Tighten definition of regular expression iterators so they are noexcept and | Neil | 1 | -10/+10
    define all the standard member functions. This cascades to all methods called by the iterators, affecting Document, CellBuffer, Partitioning, SplitVector and UTF-8 and DBCS functions. Other trivial functions declared noexcept.
2018-03-24 | Feature [feature-requests:#1212]. Move Unicode conversions into UniConversion. | Zufu Liu | 1 | -1/+15
    Move Unicode conversion functions UnicodeFromUTF8 and UTF8FromUTF32Character into UniConversion.
2018-03-22 | Feature [feature-requests:#1211]. Use pre-computed table for UTF8BytesOfLead. | Zufu Liu | 1 | -4/+6
    Friendlier treatment of invalid UTF-8. Add tests for UniConversion handling invalid UTF-8. Simplify UTF8Classify tests.
2017-09-11 | The Scintilla namespace is always active for internal symbols and for the lexer | Neil | 1 | -4/+0
    interfaces ILexer4 and IDocument.
2017-05-02 | More consistent use of size_t when converting Unicode formats. | Neil | 1 | -3/+3
2017-03-02 | Fix potential problems with IME on Cocoa when document contains invalid UTF-8. | Neil | 1 | -0/+1
2015-11-20 | Bug [#1779]. Better Unicode input support on Windows systems. | Sam Hocevar | 1 | -0/+4
    - support surrogate pairs in WM_CHAR messages
    - support characters from supplementary planes in WM_UNICHAR messages
    - support WM_UNICHAR messages in non-Unicode mode
    - fix some code duplication
    Also, do not return FALSE upon receiving a WM_UNICHAR message with a UNICODE_NOCHAR parameter, since WM_UNICHAR can actually be handled just fine (at least with the exact same level of support as WM_CHAR).
2015-02-23 | Fix non-BMP character entry through the inline IME. | Neil | 1 | -0/+6
2015-01-13 | Using size_t instead of unsigned int for conversions to UTF16 for 64-bit | Neil | 1 | -2/+2
    compatibility and to lessen the number of casts.
2014-10-02 | Allow using C++11 <regex> for searches as a provisional feature. | Neil | 1 | -0/+4
2013-07-22 | Added the character representation feature. | Neil | 1 | -0/+4
2013-07-21 | Standardising header guards and namespaces. | Neil | 1 | -0/+13
2013-01-19 | Support the three Unicode line ends NEL, LS, and PS in CellBuffer, Document, | nyamatongwe | 1 | -0/+13
    Editor and the message interface. Will only be turned on for lexers that support Unicode line ends.
2012-05-26 | For case-insensitive UTF-8 searching, use UTF8Classify for finding valid | nyamatongwe | 1 | -0/+6
    character width so compatible with other similar code. Optimize treatment of single byte ASCII characters and also optimize loop conditions. These mostly make up for the performance decrease from calling UTF8Classify. Add support definitions UTF8MaxBytes and UTF8IsAscii in UniConversion. Remove ExtractChar as no longer needed.
2012-05-26 | Optimize UTF-8 character length calculations by using an array. | nyamatongwe | 1 | -0/+3
2012-05-26 | Move classification of UTF-8 byte sequences into UniConversion module. | nyamatongwe | 1 | -0/+6
2010-05-02 | Bug #2995278 minor fixes to typos and types. | nyamatongwe | 1 | -1/+1
2010-03-23 | Added function for finding how many bytes are in a UTF-8 character. | nyamatongwe | 1 | -0/+1
2007-04-19 | All Unicode planes supported, not just the Basic Multilingual Plane. | nyamatongwe | 1 | -3/+3
2001-02-24 | Updated documentation comments from Philippe. | nyamatongwe | 1 | -1/+4
2001-01-28 | Updating copyright notices for 2001. | nyamatongwe | 1 | -1/+1
2000-04-10 | Death of Accessor. | nyamatongwe | 1 | -0/+9
    Birth of UniConversion.