This is a follow-up to #192 / #193.
The fix in #193 correctly handles CRLF after the closing boundary when it arrives in a single chunk. However, when \r and \n are split across separate TCP chunks (which happens in production with network proxies/load balancers), the warning is still emitted.
If it would help, I am happy to open a PR with a fix.
Cause
In MultipartState.END, the current check requires both \r and \n to be in the same chunk:
https://github.com/Kludex/python-multipart/blob/master/python_multipart/multipart.py#L1415-L1423
elif state == MultipartState.END:
    # Don't do anything if chunk ends with CRLF.
    if c == CR and i + 1 < length and data[i + 1] == LF:
        i += 2
        continue
    # Skip data after the last boundary.
    self.logger.warning("Skipping data after last boundary")
    i = length
    break
When \r is the last byte of a chunk, i + 1 < length is False, so it falls through to the warning.
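One possible direction, sketched as a standalone toy model rather than a patch to the library: remember a trailing \r across write() calls and decide only when the next byte arrives. The class name EndStateScanner, the pending_cr flag, and the warnings counter are hypothetical and exist only for illustration; the real parser keeps its state differently and would also need to decide what to do with a dangling \r at finalize().

```python
CR = ord("\r")
LF = ord("\n")


class EndStateScanner:
    """Toy model of END-state handling only; not python-multipart's parser."""

    def __init__(self) -> None:
        # True when the previous chunk ended with a lone CR (hypothetical flag).
        self.pending_cr = False
        # Counts how often "data after last boundary" would be reported.
        self.warnings = 0

    def write(self, data: bytes) -> None:
        i = 0
        length = len(data)
        while i < length:
            c = data[i]
            if self.pending_cr:
                self.pending_cr = False
                if c == LF:
                    # LF completes the CRLF started in the previous chunk.
                    i += 1
                    continue
                # The earlier CR was not part of a CRLF: trailing data.
                self.warnings += 1
                return
            if c == CR:
                if i + 1 == length:
                    # CR is the last byte: defer the decision to the next chunk.
                    self.pending_cr = True
                    return
                if data[i + 1] == LF:
                    i += 2
                    continue
            # Anything else is data after the last boundary; skip the rest.
            self.warnings += 1
            return
```

With this model, feeding b"\r" and then b"\n" as separate chunks records no warning, while b"\r" followed by any other byte still does.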
Who appends the CRLF
aiohttp's MultipartWriter.write() appends \r\n after the closing boundary:
https://github.com/aio-libs/aiohttp/blob/master/aiohttp/multipart.py (around line 1000)
if close_boundary:
    await writer.write(b"--" + self._boundary + b"--\r\n")
This is valid per RFC 2046 Section 5.1.1:
close-delimiter := "--" boundary "--" transport-padding
[CRLF epilogue]
NOTE TO IMPLEMENTORS: Boundary string comparisons must compare the
boundary value with the beginning of each candidate line. An exact
match of the entire candidate line is not required; it is sufficient
that the boundary appear in its entirety following the CRLF.
...these areas are generally not used because of the lack of proper
typing of these parts and the lack of clear semantics for handling
these areas at gateways, particularly X.400 gateways.
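To make the grammar concrete, here is a rough sketch that checks whether the bytes following the closing boundary fit "transport-padding [CRLF epilogue]". The function name classify_tail is hypothetical, and treating transport-padding as only spaces and tabs is a simplifying assumption on my part.

```python
import re
from typing import Optional

# Assumption: transport-padding is a run of spaces/tabs; the optional
# remainder is CRLF followed by an arbitrary epilogue.
_TAIL = re.compile(rb"[ \t]*(?:\r\n(?P<epilogue>.*))?", re.DOTALL)


def classify_tail(tail: bytes) -> Optional[bytes]:
    """Return the epilogue bytes if `tail` fits the grammar, else None."""
    m = _TAIL.fullmatch(tail)
    if m is None:
        return None
    # The group is None when there is no CRLF at all; treat that as empty.
    return m.group("epilogue") or b""
```

Under this reading, the b"\r\n" that aiohttp appends is a CRLF followed by an empty epilogue, so it is well-formed trailing data rather than garbage.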
Reproduction
import asyncio
import io
import logging
import sys
from aiohttp import FormData, MultipartWriter
from python_multipart.multipart import MultipartParser
# -- Build body using aiohttp's FormData --
class BytesWriter:
    def __init__(self):
        self.buffer = bytearray()

    async def write(self, data: bytes) -> None:
        self.buffer.extend(data)

async def build_aiohttp_body():
    form = FormData()
    form.add_field("file", io.BytesIO(b"hello"), filename="test.txt", content_type="text/plain")
    mpwriter = form()
    assert isinstance(mpwriter, MultipartWriter)
    writer = BytesWriter()
    await mpwriter.write(writer)
    return mpwriter._boundary, bytes(writer.buffer)
boundary, body = asyncio.run(build_aiohttp_body())
# Confirm aiohttp appends \r\n after closing boundary
final_boundary = b"--" + boundary + b"--"
idx = body.rfind(final_boundary)
after = body[idx + len(final_boundary):]
print(f"Data after final boundary: {after!r}") # b'\r\n'
# -- Logging setup --
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.WARNING)
handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))
logger = logging.getLogger("python_multipart.multipart")
logger.addHandler(handler)
logger.setLevel(logging.WARNING)
# Case 1: single chunk - no warning
print("\n=== Case 1: single chunk ===")
p1 = MultipartParser(boundary, {})
p1.write(body)
p1.finalize()
print("(no warning)")
# Case 2: CR/LF split across chunks (split at -1) - BUG
print("\n=== Case 2: CR/LF split at -1 ===")
p2 = MultipartParser(boundary, {})
p2.write(body[:-1]) # chunk ends with \r
p2.write(body[-1:]) # next chunk is just \n
p2.finalize()
print("(done)")
# Case 3: split at -2 (both \r\n in second chunk) - OK
print("\n=== Case 3: split at -2 ===")
p3 = MultipartParser(boundary, {})
p3.write(body[:-2])
p3.write(body[-2:]) # \r\n together
p3.finalize()
print("(done)")
Output:
Data after final boundary: b'\r\n'
=== Case 1: single chunk ===
(no warning)
=== Case 2: CR/LF split at -1 ===
python_multipart.multipart - WARNING - Skipping data after last boundary
python_multipart.multipart - WARNING - Skipping data after last boundary
(done)
=== Case 3: split at -2 ===
(done)
Only Case 2 triggers the warning: \r is the last byte of one chunk and \n arrives in the next. The warning is logged twice, once for the dangling \r and once for the lone \n.
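The three cases above cover only two split points. A small helper like the following could drive a regression test over every chunk boundary; two_chunk_splits is a hypothetical name and not part of either library.

```python
from typing import Iterator, Tuple


def two_chunk_splits(body: bytes) -> Iterator[Tuple[int, bytes, bytes]]:
    """Yield (cut, head, tail) for every way to split body into two non-empty chunks."""
    for cut in range(1, len(body)):
        yield cut, body[:cut], body[cut:]
```

Each (head, tail) pair can then be fed to a fresh MultipartParser with two write() calls followed by finalize(), as in Case 2, asserting that no warning is logged for any cut.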
Environment
- python-multipart 0.0.20 (also reproducible on 0.0.22)