Fix segmentation of long lexemes to avoid breaking before modifiers like accents

that must be drawn with their base letters.

This is only a subset of implementing grapheme cluster boundaries, but it improves behaviour with some Asian scripts such as Thai and Javanese. Javanese is mostly written with (ASCII) Roman characters, so issues will be rare, but Thai uses Thai script. The change also slightly improves placement of combining accents in European texts.

https://github.com/notepad-plus-plus/notepad-plus-plus/issues/14822
https://github.com/notepad-plus-plus/notepad-plus-plus/issues/16115
author: Neil <nyamatongwe@gmail.com> 2025-02-04 11:47:48 +1100
committer: Neil <nyamatongwe@gmail.com> 2025-02-04 11:47:48 +1100
commit: 4c9ddc3121d0488914858ee511028520b96fd0e9 (patch)
tree: 03989eae1ce94f479749ef74e1e6c76c10f3e332 /src/UniConversion.h
parent: ef961772c3ced424f034c2055263d7231eccee01 (diff)
download: scintilla-mirror-4c9ddc3121d0488914858ee511028520b96fd0e9.tar.gz
1 files changed, 4 insertions, 0 deletions
diff --git a/src/UniConversion.h b/src/UniConversion.h
index 7a51b2d08..5990cca8c 100644
--- a/src/UniConversion.h
+++ b/src/UniConversion.h
@@ -49,6 +49,10 @@ constexpr bool UTF8IsTrailByte(unsigned char ch) noexcept {
 	return (ch >= 0x80) && (ch < 0xc0);
 }
 
+constexpr bool UTF8IsFirstByte(unsigned char ch) noexcept {
+	return (ch >= 0xc2) && (ch <= 0xf4);
+}
+
 constexpr bool UTF8IsAscii(unsigned char ch) noexcept {
 	return ch < 0x80;
 }
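
The commit message describes avoiding a break immediately before a modifier that must be drawn with its base letter. The following is a minimal sketch of that idea, not the code used in Scintilla. The helpers `StartsWithCombiningMark` and `SafeBreak` are hypothetical names introduced here for illustration, and the sketch assumes that only the Combining Diacritical Marks block (U+0300 to U+036F) needs handling. Thai and Javanese marks would need further ranges.

```cpp
#include <cstddef>
#include <string>

// Copied from the diff above.
constexpr bool UTF8IsTrailByte(unsigned char ch) noexcept {
	return (ch >= 0x80) && (ch < 0xc0);
}

constexpr bool UTF8IsFirstByte(unsigned char ch) noexcept {
	return (ch >= 0xc2) && (ch <= 0xf4);
}

// Hypothetical helper: true when the bytes at pos encode U+0300..U+036F.
// In UTF-8 that range is 0xCC 0x80..0xBF followed by 0xCD 0x80..0xAF.
bool StartsWithCombiningMark(const std::string &text, size_t pos) noexcept {
	if (pos + 1 >= text.size())
		return false;
	const unsigned char lead = static_cast<unsigned char>(text[pos]);
	const unsigned char trail = static_cast<unsigned char>(text[pos + 1]);
	if (!UTF8IsFirstByte(lead))
		return false;
	if (lead == 0xCC)
		return UTF8IsTrailByte(trail);
	if (lead == 0xCD)
		return (trail >= 0x80) && (trail <= 0xAF);
	return false;
}

// Hypothetical helper: move a proposed break position backwards so that it
// neither splits a multi-byte character nor separates a mark from its base.
size_t SafeBreak(const std::string &text, size_t pos) noexcept {
	if (pos >= text.size())
		return text.size();
	// Step back off trail bytes to the start of the character.
	while (pos > 0 && UTF8IsTrailByte(static_cast<unsigned char>(text[pos])))
		pos--;
	// Step back over combining marks so they stay with their base letter.
	while (pos > 0 && StartsWithCombiningMark(text, pos)) {
		pos--;
		while (pos > 0 && UTF8IsTrailByte(static_cast<unsigned char>(text[pos])))
			pos--;
	}
	return pos;
}
```

For the text "e" followed by U+0301 (bytes 0x65 0xCC 0x81), a break proposed at byte 1 or 2 is moved back to byte 0, so the accent stays in the same segment as its base letter. A full solution would follow the grapheme cluster rules rather than a fixed range check.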