Describe the bug
The GUIX multi-line text view widget has a bug in its word wrapping logic. Word boundary detection fails inconsistently, so words are split mid-word instead of at proper word boundaries (spaces, commas, semicolons).
I'm using GUIX 6.1.11.
To Reproduce
- Create a multi-line text view widget
- Set text containing spaces, such as:
"Hola. Esta es una demostractión de inglés"
- Observe inconsistent word wrapping behavior
Breaks appear to work correctly at punctuation such as commas, but not at spaces. I suspect this is a UTF-8 issue.
Inside the loop:
ch = string;
#ifdef GX_UTF8_SUPPORT
_gx_utility_utf8_string_character_get(&string, GX_NULL, &glyph_len);
current_index += glyph_len;
#else
string.gx_string_ptr++;
string.gx_string_length--;
#endif /* GX_UTF8_SUPPORT */
ch.gx_string_length = glyph_len;
So ch is meant to represent the current character (possibly multi-byte). But immediately afterwards, the code does single-byte checks like:
if (ch.gx_string_ptr[0] == GX_KEY_CARRIAGE_RETURN)
...
else if (ch.gx_string_ptr[0] == GX_KEY_LINE_FEED)
...
else if (((text_info -> gx_text_display_width + char_width) > available_width - 1) &&
(text_info -> gx_text_display_number > 0) &&
(ch.gx_string_ptr[0] != ' '))
...
if ((ch.gx_string_ptr[0] == ' ') || (ch.gx_string_ptr[0] == ',') || (ch.gx_string_ptr[0] == ';'))
For ASCII, ch.gx_string_ptr[0] works fine, since one character is one byte. For UTF-8, however, ch.gx_string_length may be greater than 1, yet the code only checks the first byte of the sequence. Non-ASCII spaces (e.g. U+00A0 non-breaking space, U+3000 ideographic space) will therefore never be recognized as valid breakpoints, because their first UTF-8 byte isn’t ' ' (0x20).
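To illustrate the difference, here is a minimal sketch of checking the decoded code point rather than the first byte. The helpers decode_utf8, is_break_codepoint and first_byte_is_break are hypothetical names of my own, not GUIX APIs, and the set of break characters (space, comma, semicolon, U+3000) is an assumption. Whether U+00A0 should be breakable is a policy choice, so it is left out here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of GUIX: decode one UTF-8 sequence.
   Returns the number of bytes consumed, or 0 if the input is malformed
   or truncated. Overlong encodings are not rejected in this sketch. */
static size_t decode_utf8(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t need;
    uint32_t value;

    if (len == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }          /* plain ASCII */

    if ((s[0] & 0xE0) == 0xC0)      { need = 2; value = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; value = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; value = s[0] & 0x07; }
    else return 0;                                      /* stray continuation byte */

    if (len < need) return 0;
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;            /* not a continuation byte */
        value = (value << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *cp = value;
    return need;
}

/* Assumed break set: ASCII space, comma, semicolon, ideographic space. */
static int is_break_codepoint(uint32_t cp)
{
    switch (cp) {
    case 0x0020: /* ' ' */
    case 0x002C: /* ',' */
    case 0x003B: /* ';' */
    case 0x3000: /* ideographic space */
        return 1;
    default:
        return 0;
    }
}

/* Mirrors the first-byte comparison quoted above, for contrast. */
static int first_byte_is_break(const unsigned char *s)
{
    return s[0] == ' ' || s[0] == ',' || s[0] == ';';
}
```

For the three-byte sequence E3 80 80 (U+3000), first_byte_is_break sees only 0xE3 and reports no break, while decoding first and then calling is_break_codepoint reports one.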
Also, this condition refuses to backtrack when the overflowing glyph is a space:
else if (((text_info->gx_text_display_width + char_width) > available_width - 1) &&
(text_info->gx_text_display_number > 0) &&
(ch.gx_string_ptr[0] != ' '))
{
    if (display_number == 0) {
        break;
    }
    text_info->gx_text_display_width = display_width;
    text_info->gx_text_display_number = display_number;
    break;
}
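To make the effect of that condition easier to see, here is a toy model of the loop. It is a sketch under several assumptions, not the GUIX implementation: every byte is one width unit, the text is ASCII only, and I assume the width/count update and the break-point bookkeeping happen after the overflow check, in that order. first_line_length is a hypothetical name.

```c
#include <stddef.h>

/* Toy model (not GUIX code): returns how many bytes end up on the first line.
   line_len/line_width stand in for gx_text_display_number/gx_text_display_width,
   break_len/break_width for display_number/display_width. */
static size_t first_line_length(const char *text, int available_width)
{
    size_t line_len = 0;
    int line_width = 0;
    size_t break_len = 0;
    int break_width = 0;
    const int char_width = 1;   /* assumption: fixed-width glyphs */

    for (size_t i = 0; text[i] != '\0'; i++) {
        char c = text[i];

        /* Same shape as the quoted condition: a space never triggers it. */
        if ((line_width + char_width > available_width - 1) &&
            (line_len > 0) &&
            (c != ' ')) {
            if (break_len != 0) {
                /* Backtrack to the last recorded break point. */
                line_width = break_width;
                line_len = break_len;
            }
            break;
        }

        line_width += char_width;
        line_len++;

        /* Assumed position of the break-point bookkeeping. */
        if (c == ' ' || c == ',' || c == ';') {
            break_width = line_width;
            break_len = line_len;
        }
    }
    return line_len;
}
```

In this model, first_line_length("abc de", 4) keeps "abc " on the line even though the space already exceeds the limit, and a run of spaces as in "abc    de" is appended in full, because the overflow branch is skipped for every space.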
Expected behavior
- Words should break at natural boundaries (spaces, punctuation)
- Text that exceeds the line width should break at a word boundary when one is available; only a word longer than the line itself should be split
- Consistent behavior between ASCII and UTF-8 text
Impact
Annoyance, bad experience