Add grapheme cluster support #13

@jmacdonald

Scribe's coordinate system is designed as an abstraction over multi-byte "characters", such that a Range spanning one offset corresponds to a single on-screen character, even if that character is represented by more than one byte. Currently, that abstraction is naively centered around Unicode code points. However, a single on-screen character can be composed of multiple code points, so working with data that contains such characters breaks much of Scribe's data handling.

A Unicode grapheme cluster is what we should be using as the smallest atomic unit of text. The unicode-segmentation crate provides iterators over grapheme clusters rather than code points; let's migrate to that so that the coordinate system supports the full range of Unicode text.
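
To make the mismatch concrete, here is a minimal sketch using only the standard library. The helper name `code_point_count` is hypothetical and is not part of Scribe; it only stands in for any code-point-based offset calculation.

```rust
/// Counts Unicode code points (Rust `char`s) in `text`.
/// Hypothetical helper for illustration; not part of Scribe's API.
fn code_point_count(text: &str) -> usize {
    text.chars().count()
}

fn main() {
    // "é" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT:
    // one on-screen character, but two code points and three UTF-8 bytes.
    let decomposed = "e\u{301}";

    println!("bytes:       {}", decomposed.len());
    println!("code points: {}", code_point_count(decomposed));

    // A code-point-based coordinate system treats this as two offsets.
    // With the unicode-segmentation crate, counting extended grapheme
    // clusters via `UnicodeSegmentation::graphemes(decomposed, true)`
    // would instead treat it as a single unit.
}
```

A Range built on the code point count would span two offsets for what the user sees as one character, which is the breakage described above.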
