
Skip preamble before first multipart boundary #262

Merged: Kludex merged 2 commits into master from skip-leading-crlf-preamble on Apr 10, 2026

@Kludex (Owner) commented Apr 10, 2026

Summary

  • skip CR/LF-prefixed preamble bytes by jumping to the next possible boundary start
  • keep the existing boundary validation once a candidate boundary is reached
  • avoid scanning large leading preambles byte by byte before the first boundary is reached
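A minimal sketch of the skip-ahead idea, assuming a parser that receives the body in chunks. The PreambleSkipper class and its feed() method are hypothetical and are not python-multipart's API or this PR's code. The sketch uses bytes.find() to jump to the next candidate "--boundary" delimiter instead of inspecting each preamble byte, carries a short tail between chunks in case the delimiter is split, and leaves validating the candidate to the caller, in the spirit of the second bullet.

```python
from typing import Optional


class PreambleSkipper:
    """Hypothetical helper: discard bytes before the first "--boundary" candidate.

    A sketch only, not python-multipart's implementation. It jumps straight to
    the next candidate delimiter instead of examining preamble bytes one by one.
    """

    def __init__(self, boundary: bytes) -> None:
        self.delimiter = b"--" + boundary
        # Trailing bytes kept between chunks, in case a delimiter is split
        # across two writes.
        self.tail = b""

    def feed(self, chunk: bytes) -> Optional[bytes]:
        """Return the data starting at the first candidate delimiter, or None.

        Once a non-None value is returned, the caller should stop using this
        object and validate the candidate boundary itself.
        """
        data = self.tail + chunk
        # In CPython, bytes.find() is a single C-level scan, so a long CR/LF
        # preamble is skipped without a Python-level loop per byte.
        idx = data.find(self.delimiter)
        if idx >= 0:
            self.tail = b""
            return data[idx:]
        # A byte further than len(delimiter) - 1 from the end cannot start a
        # delimiter, so everything before that point is dropped as preamble.
        self.tail = data[-(len(self.delimiter) - 1):]
        return None
```

For example, feeding b"\r\n" * 1000, then b"\r\n--boun", then b"dary\r\n" to PreambleSkipper(b"boundary") returns None, None and b"--boundary\r\n": the split delimiter is recovered because its first half was kept in tail.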

Parser behavior comparison

| Parser | Bytes before first boundary |
| --- | --- |
| Django | Searches for the boundary and discards bytes before it as preamble. See BoundaryIter._find_boundary(): https://github.com/django/django/blob/main/django/http/multipartparser.py#L582-L675 |
| Werkzeug | Uses an explicit PREAMBLE state and regex-searches for the first boundary. See MultipartDecoder: https://github.com/pallets/werkzeug/blob/main/src/werkzeug/sansio/multipart.py#L98-L151 |
| multipart | The core parser rejects bytes before the first boundary unless the boundary is at offset 0 or preceded by CRLF, while the high-level non-strict helper can suppress parse errors. See PushMultipartParser: https://github.com/defnull/multipart/blob/master/multipart.py#L454-L487 |
| python-multipart (before) | Accepted leading CR/LF before the first boundary, but processed it byte by byte. |
| python-multipart (after) | Skips ahead to the next possible boundary start, preserving boundary validation from that point. |
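The "offset 0 or preceded by CRLF" condition in the multipart row matches RFC 2046, which requires a boundary delimiter line to start at the beginning of the body or right after a CRLF. A small sketch of a preamble-discarding search that enforces that rule; find_line_start_delimiter is a hypothetical helper, not code taken from any of the parsers above.

```python
def find_line_start_delimiter(data: bytes, boundary: bytes) -> int:
    """Return the offset of the first "--boundary" that starts a line, or -1.

    A match counts only at offset 0 or immediately after CRLF; a match in the
    middle of a preamble line is skipped and the search resumes after it.
    """
    delimiter = b"--" + boundary
    start = 0
    while True:
        idx = data.find(delimiter, start)
        if idx < 0:
            return -1  # no candidate left in this buffer
        if idx == 0 or (idx >= 2 and data[idx - 2 : idx] == b"\r\n"):
            return idx  # everything before idx is preamble
        start = idx + 1  # not at a line start: keep searching
```

A parser following the Django or Werkzeug approach would discard data[:idx] as preamble, while a strict parser could instead raise an error when idx is not 0.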

Benchmark

Parser-only benchmark on macOS arm64, Python 3.14.3, comparing this PR against base commit 4addb60:

| Size | Before (s) | After (s) |
| --- | --- | --- |
| 1 MB | 0.087520 | 0.000052 |
| 5 MB | 0.433339 | 0.000109 |
| 10 MB | 0.861698 | 0.000186 |
| 25 MB | 2.167683 | 0.000501 |
| 50 MB | 4.356995 | 0.000903 |

Payload shape:

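```python
# `size` is the target payload size in bytes (1 MB to 50 MB in the table above).
valid = (
    b"--boundary\r\n"
    b"Content-Disposition: form-data; name=\"f\"\r\n"
    b"\r\nval\r\n"
    b"--boundary--"
)
# Each b"\r\n" pair is 2 bytes, so the preamble is about `size` bytes long.
payload = b"\r\n" * (size // 2) + valid
```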
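A small timing harness in the same spirit, assuming `parse` is any callable that feeds a complete payload to the parser under test (for example, a wrapper that writes it to python-multipart's MultipartParser and finalizes it). make_payload and time_parse are hypothetical names. The PR does not say how its numbers were measured, so this sketch reports the best of several runs using time.perf_counter() and is not expected to reproduce the table exactly.

```python
import time
from typing import Callable

# Same shape as the payload above: one small valid form field after a CR/LF preamble.
VALID = (
    b"--boundary\r\n"
    b'Content-Disposition: form-data; name="f"\r\n'
    b"\r\nval\r\n"
    b"--boundary--"
)


def make_payload(size: int) -> bytes:
    """Build about `size` bytes of CR/LF pairs followed by VALID."""
    return b"\r\n" * (size // 2) + VALID


def time_parse(parse: Callable[[bytes], object], size: int, repeat: int = 5) -> float:
    """Return the best wall-clock time, in seconds, of parse(payload) over `repeat` runs."""
    payload = make_payload(size)  # built once, outside the timed region
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        parse(payload)
        best = min(best, time.perf_counter() - start)
    return best
```

Taking the minimum rather than the mean reduces noise from other processes, which matters when the "after" timings are well under a millisecond.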

Tests

  • uv run pytest tests/test_multipart.py -q -k "newlines_before_first_boundary or bad_start_boundary"
  • uv run pytest tests/test_multipart.py -q
  • uv run ruff format --check --diff python_multipart multipart tests
  • uv run ruff check .


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b379eb8f68


@Kludex changed the title from "Skip leading CRLF before first boundary" to "Skip preamble before first multipart boundary" on Apr 10, 2026
@Kludex merged commit 6a7b76d into master on Apr 10, 2026 (12 checks passed)
@Kludex deleted the skip-leading-crlf-preamble branch on Apr 10, 2026, 13:52