Description
Problem
Page:
https://www.php.net/manual/en/language.types.string.php
Section: "Details of the String Type"
The current section classifies string functions into several categories, including byte-oriented, encoding-aware, locale-dependent, and UTF-8-assuming functions.
However, this mixes different concerns:
- how strings are interpreted (bytes, encodings, Unicode)
- how behavior may vary (locale)
This makes it harder to understand how to correctly handle UTF-8 strings.
In addition, the documentation does not clearly state that the mbstring extension supports UTF-8, and the distinction between mbstring and intl is unclear.
Proposal
Simplify the classification to focus on how strings are interpreted, and move the locale discussion into a separate paragraph.
For example:
String functions in PHP can be broadly categorized based on how they interpret string data:
- Byte-oriented functions operate on strings as raw sequences of bytes.
- Encoding-aware functions interpret strings according to a specified encoding, such as UTF-8. The mbstring extension provides such functions and supports UTF-8 and other multibyte encodings.
- Unicode-aware functions assume UTF-8 and provide higher-level operations such as grapheme cluster handling, normalization, and collation. These are primarily provided by the intl extension.
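To make the three categories concrete, here is a rough sketch (not proposed manual text) that measures the same UTF-8 string with one function from each category. The sample string and the `$counts` array are only illustrative, and the mbstring and intl calls are guarded because either extension may be missing from a given build.

```php
<?php
// "e" followed by U+0301 (combining acute accent), encoded as UTF-8.
// It renders as a single user-perceived character.
$s = "e\u{0301}";

// Byte-oriented: strlen() counts raw bytes (1 for "e" + 2 for U+0301).
$counts = ['bytes' => strlen($s)];

// Encoding-aware (mbstring): counts code points in the given encoding.
if (function_exists('mb_strlen')) {
    $counts['code_points'] = mb_strlen($s, 'UTF-8');
}

// Unicode-aware (intl): counts grapheme clusters, assuming UTF-8 input.
if (function_exists('grapheme_strlen')) {
    $counts['graphemes'] = grapheme_strlen($s);
}

print_r($counts);
```

With both extensions loaded this should report 3 bytes, 2 code points, and 1 grapheme, which is the distinction the classification is meant to convey.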
Then describe locale separately:
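Separately from how a string is interpreted, the result of some operations (for example case conversion or sorting) may depend on a locale.
Historically, certain functions relied on the current locale set with setlocale(), but this behavior is being reduced: as of PHP 8.2, strtolower(), strtoupper(), and related case-conversion functions convert only ASCII characters, regardless of the locale.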
For locale-aware operations based on Unicode, the intl extension provides explicit support using ICU.
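The locale paragraph could be backed by a small example along these lines. This is only a sketch: the input strings, the two locale identifiers, and the `$results` array are my own choices, and the mbstring and intl parts are guarded because those extensions are optional.

```php
<?php
// "À" (U+00C0) followed by an ASCII "B", encoded as UTF-8.
$input = "\u{00C0}B";

// Byte-oriented: as of PHP 8.2 only ASCII letters are converted,
// so the bytes of "À" are left untouched regardless of setlocale().
$results = ['strtolower' => strtolower($input)];

// Encoding-aware (mbstring): case-maps non-ASCII characters as well,
// but is not tailored to any particular locale.
if (function_exists('mb_strtolower')) {
    $results['mb_strtolower'] = mb_strtolower($input, 'UTF-8');
}

// Locale-aware (intl, backed by ICU): the locale is passed explicitly
// instead of being taken from the process-wide setlocale() state.
if (class_exists('Collator')) {
    foreach (['de_DE', 'sv_SE'] as $locale) {
        $words = ['z', "\u{00E4}"]; // "z" and "ä"
        (new Collator($locale))->sort($words);
        $results["sorted_$locale"] = $words;
    }
}

print_r($results);
```

With ICU's standard collation data, the German collator should place "ä" before "z" while the Swedish one places it after, which shows why locale is a concern separate from encoding.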
Reference: https://wiki.php.net/rfc/strtolower-ascii
Rationale
- separates string interpretation from locale concerns
- makes UTF-8 support in mbstring explicit
- clarifies the role of mbstring vs intl
- improves readability and reduces confusion for users working with UTF-8