_log_response / _log_request: XML pretty-printing crashes on non-ASCII content #121

@TiagoMLucio

Summary

lxml.etree.XMLSyntaxError is raised in httpops.py when logging HTTP responses (or requests) whose XML contains non-ASCII UTF-8 characters (e.g. accented letters in attribute values).

Root Cause

In _log_response (and _log_request), the code converts response.content (bytes) to a printable string via repr(), then feeds it back into ET.fromstring():

rawtext = repr(response.content)[2:-1]   # bytes → escaped str  ("héllo" → "h\\xc3\\xa9llo")
# ...substitutions for \r, \n, \t...
tree = ET.fromstring(rawtext.encode())    # str → bytes with literal backslashes → invalid XML

repr() turns non-ASCII bytes into Python escape sequences (\xc3\xa9). Re-encoding that string does not restore the original UTF-8 bytes; it produces ASCII bytes containing literal backslash-x sequences, so lxml no longer receives the document the server actually sent. Parsing then fails with:

lxml.etree.XMLSyntaxError: error parsing attribute name, line 46, column 14

The same pattern exists in _log_request (repr(request.body)[1:-1]).
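The lossy round-trip can be checked without lxml or the library itself. The snippet below is a minimal sketch using only built-ins; the sample payload and the variable names are illustrative and do not come from httpops.py.

```python
# Sketch: repr() followed by encode() does not round-trip UTF-8 bytes.
original = '<root name="café"/>'.encode("utf-8")

# repr() yields "b'...'"; [2:-1] drops the b' prefix and the closing quote.
escaped = repr(original)[2:-1]
round_tripped = escaped.encode()

print(original)
print(round_tripped)
print(original == round_tripped)  # False

# Decoding the original bytes is the lossless way to get printable text.
print(original.decode("utf-8"))
```

The two-byte UTF-8 sequence for "é" comes back as eight ASCII characters, so anything parsed from `round_tripped` is a different document from the one that was received.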

Affected Code

Suggested Fix

Parse XML from the original bytes (response.content / request.body) instead of the repr()-then-encode() round-trip. Wrap in try/except so malformed content doesn't crash logging:

# _log_response — before
tree = ET.fromstring(rawtext.encode())
ET.indent(tree, space="  ")
rawtext = ET.tostring(tree).decode()

# _log_response — after
try:
    tree = ET.fromstring(response.content)
    ET.indent(tree, space="  ")
    rawtext = ET.tostring(tree, encoding='unicode')
except Exception:
    pass  # keep rawtext as-is

# _log_request — before
tree = ET.fromstring(rawtext)
ET.indent(tree, space="       ")
rawtext = ET.tostring(tree)

# _log_request — after
try:
    body_bytes = request.body if isinstance(request.body, bytes) else request.body.encode('utf-8')
    tree = ET.fromstring(body_bytes)
    ET.indent(tree, space="       ")
    rawtext = ET.tostring(tree, encoding='unicode')
except Exception:
    pass  # keep rawtext as-is
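Since both functions need the same logic, the fix could also live in one shared helper. The following is only a sketch under stated assumptions: `pretty_xml_for_log` is a hypothetical name, not something that exists in httpops.py, and it uses the standard library's `xml.etree.ElementTree` (Python 3.9+ for `ET.indent`) so that it runs without dependencies. With lxml, the exception to catch would be `lxml.etree.XMLSyntaxError` instead of `ET.ParseError`.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in; the issue's code uses lxml


def pretty_xml_for_log(payload, indent="  "):
    """Return payload as indented XML text, or as decoded text if it is not XML.

    Hypothetical helper for illustration; not part of httpops.py.
    """
    if payload is None:
        return ""
    # Normalise to bytes so the parser sees the real UTF-8 sequence.
    raw = payload if isinstance(payload, bytes) else str(payload).encode("utf-8")
    try:
        tree = ET.fromstring(raw)
        ET.indent(tree, space=indent)  # available from Python 3.9
        return ET.tostring(tree, encoding="unicode")
    except ET.ParseError:
        # Not well-formed XML: fall back to readable text instead of raising.
        return raw.decode("utf-8", errors="replace")


print(pretty_xml_for_log('<root><item name="café"/></root>'.encode("utf-8")))
print(pretty_xml_for_log(b"not xml at all"))
```

Catching only the parser's own exception, rather than a bare `Exception`, keeps unrelated bugs visible while still making sure a malformed body cannot break logging.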

Reproduction

Any OSLC/DNG response containing non-ASCII UTF-8 characters in XML attribute values (e.g. requirement titles with accented characters) will trigger the crash during TRACE-level logging.

Example

content = '<?xml version="1.0"?><root name="café"/>'.encode('utf-8')
rawtext = repr(content)[2:-1]
# rawtext = '<?xml version="1.0"?><root name="caf\\xc3\\xa9"/>'
ET.fromstring(rawtext.encode())  # receives the escaped text, not the original bytes; XMLSyntaxError on the reported response
