Description
I would like to suggest a structural review of the later part of the “Details of the String Type” section on this page:
https://www.php.net/manual/en/language.types.string.php
My concern is not that the current content is technically wrong, but that this part of the page mixes several different layers of explanation, which makes it harder to follow.
In particular, the following topics are currently presented very close together:
- how string literals are encoded
- the practical expectation that modern PHP source files are usually UTF-8
- legacy or exceptional cases such as Zend Multibyte
- broader Unicode caveats that go beyond encoding, such as normalization or other assumptions programmers may make about text
Because of this, it is difficult to distinguish between:
- the core explanation of how PHP string literals are represented, and
- the broader point that correct Unicode handling requires more than choosing the right API.
For example, the paragraph about string literals seems to combine the basic “source file encoding” explanation with examples that are closer to Unicode normalization and general text-handling concerns.
I wonder whether this part of the page could be reorganized so that:
- the explanation of string literals stays focused on source-file encoding,
- modern UTF-8 usage is treated as the practical default case,
- legacy/special cases are more clearly separated,
- and broader Unicode caveats remain in the final general discussion.
I am not proposing concrete replacement text here. However, I think separating these layers more clearly would make the page easier to read, especially for readers who approach PHP strings from a modern, UTF-8-oriented context.