
Bump chardet from 7.3.0 to 7.4.0.post1#2189

Merged
nikolas merged 1 commit into master from dependabot/pip/chardet-7.4.0.post1 on Mar 27, 2026

dependabot bot commented on behalf of github on Mar 27, 2026

Bumps chardet from 7.3.0 to 7.4.0.post1.

Release notes

Sourced from chardet's releases.

chardet 7.4.0 brings accuracy up to 99.3% (from 98.6% in 7.3.0) and significantly faster cold start thanks to a new dense model format.

What's New

Performance:

  • New dense zlib-compressed model format (v2) drops cold start (import + first detect) from ~75ms to ~13ms with mypyc

Accuracy (98.6% → 99.3%):

  • Eliminated train/test data overlap via content fingerprinting
  • Added MADLAD-400 and Wikipedia as supplemental training sources
  • Improved non-ASCII bigram scoring: high-byte bigrams are now preserved during training and weighted by per-bigram IDF
  • Encoding-aware substitution filtering (substitutions only apply for characters the target encoding can't represent)
  • Increased training samples from 15K to 25K per language/encoding pair
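
The per-bigram IDF weighting mentioned above can be illustrated with a small, self-contained sketch. This is not chardet's implementation: the function names, the smoothed IDF formula, and the choice to treat each model's training data as one "document" are all assumptions made for illustration.

```python
import math
from collections import Counter


def high_byte_bigrams(data: bytes):
    """Yield byte pairs in which at least one byte is non-ASCII."""
    for a, b in zip(data, data[1:]):
        if a >= 0x80 or b >= 0x80:
            yield (a, b)


def bigram_idf(samples_by_model):
    """Smoothed IDF per bigram; each model's samples count as one 'document'.

    Assumed formula: log((1 + N) / (1 + df)) + 1, where N is the number of
    models and df is the number of models whose samples contain the bigram.
    """
    n_models = len(samples_by_model)
    doc_freq = Counter()
    for samples in samples_by_model.values():
        seen = set()
        for sample in samples:
            seen.update(high_byte_bigrams(sample))
        doc_freq.update(seen)  # each model contributes at most 1 per bigram
    return {
        bg: math.log((1 + n_models) / (1 + df)) + 1.0
        for bg, df in doc_freq.items()
    }


def score(data, model_weights, idf):
    """Sum model weights for high-byte bigrams, scaled by their IDF."""
    return sum(
        model_weights.get(bg, 0.0) * idf.get(bg, 1.0)
        for bg in high_byte_bigrams(data)
    )


# Hypothetical training samples for two models.
samples = {
    "model_a": [b"\xe9\xe8 abc"],
    "model_b": [b"\xe9\xe8 \xf1\xf2"],
}
idf = bigram_idf(samples)
```

In this toy data, the bigram `(0xE9, 0xE8)` occurs in both models and gets the minimum weight, while `(0xF1, 0xF2)` occurs in only one and is weighted higher, which is the sense in which rarer byte patterns are more discriminative.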

Bug fixes:

  • Added dedicated structural analyzers for CP932, CP949, and Big5-HKSCS (these were previously sharing their base encoding's byte-range analyzer, missing extended ranges)

Metrics

| | chardet 7.4.0 (mypyc) | chardet 6.0.0 | charset-normalizer 3.4.6 |
|---|---|---|---|
| Accuracy (2,517 files) | 99.3% | 88.2% | 85.4% |
| Speed | 551 files/s | 12 files/s | 376 files/s |
| Language detection | 95.7% | 40.0% | 59.2% |

Full changelog: https://chardet.readthedocs.io/en/latest/changelog.html

Changelog

Sourced from chardet's changelog.

Changelog

.. note::

   Entries marked "via Claude" were developed with `Claude Code <https://claude.ai/code>`_. Dan directed the design, reviewed all output, and takes responsibility for the result. Unmarked entries by Dan were written without AI assistance.

7.4.0 (2026-03-26)

Performance:

  • Switched to dense zlib-compressed model format (v2): models are now stored as contiguous memoryview slices of a single decompressed blob, eliminating per-model struct.unpack overhead. Cold start (import + first detect) dropped from ~75ms to ~13ms with mypyc. (`Dan Blanchard <https://github.com/dan-blanchard>`_ via Claude, `#354 <https://github.com/chardet/chardet/pull/354>`_)
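
The idea of serving every model as a memoryview slice of one decompressed blob can be sketched as follows. The layout here (a fixed 256×256 table of one-byte weights per model, plus a separate name-to-offset index) and all function names are assumptions for illustration; the actual v2 format is not described in this changelog.

```python
import zlib

# Assumed layout: one weight byte per (first byte, second byte) pair.
TABLE_SIZE = 256 * 256


def pack_models(tables):
    """Concatenate model tables into one blob and compress it once.

    Returns (index, compressed), where index maps name -> (offset, size).
    """
    index = {}
    offset = 0
    for name, table in tables.items():
        index[name] = (offset, len(table))
        offset += len(table)
    blob = b"".join(tables.values())
    return index, zlib.compress(blob)


def load_models(index, compressed):
    """Decompress once, then hand out zero-copy memoryview slices."""
    blob = memoryview(zlib.decompress(compressed))
    return {
        name: blob[start:start + size]
        for name, (start, size) in index.items()
    }


def bigram_weight(view, first, second):
    """Look up a weight directly in the dense table, with no unpacking step."""
    return view[first * 256 + second]


# Two hypothetical models with a single non-zero weight each.
table_a = bytearray(TABLE_SIZE)
table_a[0xE9 * 256 + 0xE8] = 200
table_b = bytearray(TABLE_SIZE)
table_b[0xF1 * 256 + 0xF2] = 77

index, compressed = pack_models(
    {"model_a": bytes(table_a), "model_b": bytes(table_b)}
)
models = load_models(index, compressed)
```

Loading costs one `zlib.decompress` call plus one slice per model, regardless of how many entries each table holds, which is the kind of saving the entry above attributes to dropping per-model `struct.unpack` work.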

Accuracy:

  • Accuracy improved from 98.6% to 99.3% (2499/2517 files) through a combination of training and scoring improvements:

    • Eliminated train/test data overlap by content-fingerprinting test suite articles and excluding them from training data (`#351 <https://github.com/chardet/chardet/pull/351>`_)
    • Added MADLAD-400 and Wikipedia as supplemental training sources to fill gaps left by exclusion filtering (`#351 <https://github.com/chardet/chardet/pull/351>`_)
    • Improved non-ASCII bigram scoring: high-byte bigrams are now preserved during training (instead of being crushed by global normalization), and weighted by per-bigram IDF so encoding-specific byte patterns contribute proportionally to how discriminative they are (`#352 <https://github.com/chardet/chardet/pull/352>`_)
    • Added encoding-aware substitution filtering: character substitutions during training now only apply for characters the target encoding cannot represent
    • Increased training samples from 15K to 25K per language/encoding pair (`Dan Blanchard <https://github.com/dan-blanchard>`_ via Claude)
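
Excluding test articles from training data by content fingerprint might look roughly like the sketch below. The changelog does not say how chardet computes its fingerprints, so the normalization steps (NFKC, whitespace collapsing, case folding) and the SHA-256 hash are assumptions, as are the function names.

```python
import hashlib
import re
import unicodedata


def fingerprint(text: str) -> str:
    """Hash a normalized form of the text so trivial variations still match."""
    normalized = unicodedata.normalize("NFKC", text)
    normalized = re.sub(r"\s+", " ", normalized).strip().casefold()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def exclude_test_overlap(training_texts, test_texts):
    """Drop training texts whose fingerprint matches any test text."""
    test_prints = {fingerprint(t) for t in test_texts}
    return [t for t in training_texts if fingerprint(t) not in test_prints]


# Hypothetical data: the first training text duplicates a test article
# apart from case and whitespace.
training = ["Hello   World", "a unique training article"]
test_suite = ["hello world\n"]
filtered = exclude_test_overlap(training, test_suite)
```

An exact-match fingerprint like this only catches duplicates that normalize to the same string; near-duplicates would need a fuzzier scheme such as shingling.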

Bug Fixes:

  • Added dedicated structural analyzers for CP932, CP949, and Big5-HKSCS: these superset encodings previously shared their base encoding's byte-range analyzer, missing extended ranges unique to each superset
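
To show why a shared byte-range analyzer can miss a superset's extended ranges, here is a minimal structural check for Shift_JIS versus CP932. It is an illustrative sketch, not chardet's analyzer: the function names are hypothetical and the lead-byte ranges are the commonly documented ones (Shift_JIS lead bytes 0x81-0x9F and 0xE0-0xEF, with CP932 extending the upper range to 0xFC).

```python
# Commonly documented lead-byte ranges (assumed here, not taken from chardet).
SHIFT_JIS_LEADS = (range(0x81, 0xA0), range(0xE0, 0xF0))
CP932_LEADS = (range(0x81, 0xA0), range(0xE0, 0xFD))


def is_lead(byte, lead_ranges):
    return any(byte in r for r in lead_ranges)


def is_trail(byte):
    """Trail bytes are 0x40-0x7E or 0x80-0xFC in both encodings."""
    return 0x40 <= byte <= 0x7E or 0x80 <= byte <= 0xFC


def count_invalid(data: bytes, lead_ranges) -> int:
    """Count bytes that do not fit the encoding's byte-range structure."""
    invalid = 0
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80 or 0xA1 <= b <= 0xDF:
            # ASCII or single-byte half-width katakana.
            i += 1
        elif (
            is_lead(b, lead_ranges)
            and i + 1 < len(data)
            and is_trail(data[i + 1])
        ):
            i += 2
        else:
            invalid += 1
            i += 1
    return invalid


# 0x82 0xA0 is valid in both; 0xFA 0x40 uses a lead byte only CP932 allows.
sample = b"\x82\xa0\xfa\x40"
sjis_errors = count_invalid(sample, SHIFT_JIS_LEADS)
cp932_errors = count_invalid(sample, CP932_LEADS)
```

With the base encoding's ranges, the extended lead byte counts as a structural error; with the superset's own ranges, the same input is clean.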

... (truncated)

Commits
  • 15b2d8a fix: use separate=true on Windows only via pre-cibuildwheel patching
  • 26a1b25 fix: also drop orchestrator.py from mypyc (still over MSVC limit)
  • 26d425c fix: drop magic.py and ascii.py from mypyc to fix Windows C2026
  • 2e9542a fix: use multi_file=true for mypyc to fix Windows C2026
  • 0e33686 fix: use separate=true (not multi_file) for mypyc compilation
  • 22f29d2 fix: enable mypyc multi_file unconditionally to fix Windows C2026
  • 326adde fix: enable mypyc multi_file on Windows to avoid MSVC C2026
  • 582c664 docs: set 7.4.0 release date to 2026-03-26
  • a515745 docs: update version references from 7.3.1 to 7.4.0
  • 870988d docs: add 7.4.0 changelog and "via Claude" attribution markers
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [chardet](https://github.com/chardet/chardet) from 7.3.0 to 7.4.0.post1.
- [Release notes](https://github.com/chardet/chardet/releases)
- [Changelog](https://github.com/chardet/chardet/blob/main/docs/changelog.rst)
- [Commits](chardet/chardet@7.3.0...7.4.0.post1)

---
updated-dependencies:
- dependency-name: chardet
  dependency-version: 7.4.0.post1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot bot added the dependencies and python labels on Mar 27, 2026
nikolas merged commit 441650e into master on Mar 27, 2026
4 checks passed
dependabot bot deleted the dependabot/pip/chardet-7.4.0.post1 branch on March 27, 2026 13:41