
Simplify string function classification and move locale discussion to a separate paragraph #5433

@masakielastic

Problem

Page:
https://www.php.net/manual/en/language.types.string.php

Section: "Details of the String Type"

The current section classifies string functions into several categories, including byte-oriented, encoding-aware, locale-dependent, and UTF-8-assuming functions.

However, this mixes different concerns:

  • how strings are interpreted (bytes, encodings, Unicode)
  • how behavior may vary (locale)

This makes it harder for readers to understand how to handle UTF-8 strings correctly.

In addition, the documentation does not clearly state that the mbstring extension supports UTF-8, and the distinction between mbstring and intl is unclear.

Proposal

Simplify the classification to focus on how strings are interpreted, and move the locale discussion into a separate paragraph.

For example:

String functions in PHP can be broadly categorized based on how they interpret string data:

  1. Byte-oriented functions operate on strings as raw sequences of bytes.
  2. Encoding-aware functions interpret strings according to a specified encoding, such as UTF-8. The mbstring extension provides such functions and supports UTF-8 and other multibyte encodings.
  3. Unicode-aware functions assume UTF-8 and provide higher-level operations, such as grapheme-cluster handling, normalization, and collation. These are primarily provided by the intl extension.
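To illustrate how the three categories differ, here is a minimal sketch (not part of the proposed manual text) that measures the same string three ways. It assumes a UTF-8 source file and the mbstring extension. The intl call is guarded because intl is not always installed, and the example string is an arbitrary choice.

```php
<?php
// "Café" written with a decomposed "é": "e" followed by U+0301 COMBINING ACUTE ACCENT.
$s = "Cafe\u{0301}";

// 1. Byte-oriented: counts raw bytes (U+0301 takes two bytes in UTF-8).
$bytes = strlen($s);                   // 6

// 2. Encoding-aware (mbstring): decodes the bytes as UTF-8 and counts code points.
$codePoints = mb_strlen($s, 'UTF-8');  // 5

// 3. Unicode-aware (intl): counts user-perceived characters (grapheme clusters).
//    Guarded because the intl extension may not be available.
$graphemes = function_exists('grapheme_strlen') ? grapheme_strlen($s) : null; // 4

printf("bytes=%d code_points=%d graphemes=%s\n", $bytes, $codePoints, var_export($graphemes, true));
```

A decomposed character was chosen because it is one of the simplest inputs on which all three counts differ.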

Then describe locale separately:

Some operations may be locale-dependent.
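Historically, certain functions, such as strtolower() and strtoupper(), relied on the system locale set with setlocale(). As of PHP 8.2 these functions are locale-insensitive, and this kind of locale dependence is being reduced.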
For locale-aware operations based on Unicode, the intl extension provides explicit support using ICU.

Reference: https://wiki.php.net/rfc/strtolower-ascii
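The locale change can be seen in a small sketch, assuming PHP 8.2 or later and the mbstring extension. strtolower() maps only ASCII letters regardless of the locale, whereas mb_strtolower() applies Unicode case mapping. The locale name de_DE.UTF-8 is an arbitrary example. If it is not installed, setlocale() returns false and the results are the same.

```php
<?php
// Try to switch to a non-C locale; setlocale() returns false if it is not installed.
// As of PHP 8.2 this has no effect on strtolower() either way.
setlocale(LC_CTYPE, 'de_DE.UTF-8');

$s = "ÄPFEL";

// ASCII-only case mapping: the two UTF-8 bytes of "Ä" are left unchanged.
$ascii = strtolower($s);               // "Äpfel"

// Unicode case mapping via mbstring, independent of the locale.
$unicode = mb_strtolower($s, 'UTF-8'); // "äpfel"

var_dump($ascii, $unicode);
```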

Rationale

  • separates string interpretation from locale concerns
  • makes UTF-8 support in mbstring explicit
  • clarifies the role of mbstring vs intl
  • improves readability and reduces confusion for users working with UTF-8

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests