Skip to content

⚡️ Speed up function fix_json_escape_char by 343%#40

Open
codeflash-ai[bot] wants to merge 1 commit intomainfrom
codeflash/optimize-fix_json_escape_char-mgzelfq8
Open

⚡️ Speed up function fix_json_escape_char by 343%#40
codeflash-ai[bot] wants to merge 1 commit intomainfrom
codeflash/optimize-fix_json_escape_char-mgzelfq8

Conversation

@codeflash-ai
Copy link
Copy Markdown

@codeflash-ai codeflash-ai bot commented Oct 20, 2025

📄 343% (3.43x) speedup for fix_json_escape_char in pr_agent/algo/utils.py

⏱️ Runtime : 9.87 milliseconds 2.23 milliseconds (best of 30 runs)

📝 Explanation and details

Optimizations applied:

  • Convert to bytearray instead of list for string mutation:
    Using bytearray is much more efficient for single-character replacements and for joining back to a string, especially for typical ASCII JSON strings.
    The common pattern (list(str) with per-recursion ''.join) is replaced with in-place byte mutation followed by a single .decode('utf-8').

Behavior is preserved:

  • Exception extraction and character replacement logic are unchanged.
  • Recursion pattern is preserved.
  • No changes to comments except to clarify the bytearray usage.

If your JSON contains non-ASCII characters, you may need to adjust the encoding handling, but for most JSON logs/messages, this is significantly faster and avoids repeatedly copying large lists and strings.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 3 Passed
🌀 Generated Regression Tests 39 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
⚙️ Existing Unit Tests and Runtime
Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup
unittest/test_fix_json_escape_char.py::TestFixJsonEscapeChar.test_multiple_control_chars 17.1μs 17.0μs 0.601%✅
unittest/test_fix_json_escape_char.py::TestFixJsonEscapeChar.test_single_control_char 15.5μs 16.2μs -4.28%⚠️
unittest/test_fix_json_escape_char.py::TestFixJsonEscapeChar.test_valid_json 5.20μs 4.42μs 17.5%✅
🌀 Generated Regression Tests and Runtime
import json

# imports
import pytest  # used for our unit tests
from pr_agent.algo.utils import fix_json_escape_char

# unit tests

# ----------------------------- #
# Basic Test Cases
# ----------------------------- #

def test_valid_simple_json():
    # Test with a valid simple JSON string
    s = '{"a": 1, "b": 2}'
    codeflash_output = fix_json_escape_char(s) # 3.71μs -> 3.66μs (1.28% faster)

def test_valid_json_with_nested_dict():
    # Test with a nested JSON object
    s = '{"a": {"b": 2, "c": 3}, "d": 4}'
    codeflash_output = fix_json_escape_char(s) # 3.89μs -> 3.69μs (5.28% faster)

def test_valid_json_with_array():
    # Test with an array in JSON
    s = '{"arr": [1, 2, 3, 4]}'
    codeflash_output = fix_json_escape_char(s) # 3.78μs -> 3.61μs (4.60% faster)

def test_valid_json_with_string_values():
    # Test with string values and special characters
    s = '{"msg": "hello, world!", "val": "test123"}'
    codeflash_output = fix_json_escape_char(s) # 3.57μs -> 3.54μs (1.02% faster)

def test_valid_json_with_escaped_quotes():
    # Test with escaped quotes in string values
    s = '{"msg": "He said \\"hi\\""}'
    codeflash_output = fix_json_escape_char(s) # 3.64μs -> 3.75μs (2.88% slower)

# ----------------------------- #
# Edge Test Cases
# ----------------------------- #



def test_broken_json_unescaped_backslash():
    # Test with an unescaped backslash in a string
    s = '{"path": "C:\\Users\\John\\Documents\\file.txt"}'
    # This is actually valid JSON, so it should parse
    codeflash_output = fix_json_escape_char(s) # 25.4μs -> 25.2μs (0.938% faster)

def test_broken_json_invalid_escape():
    # Test with an invalid escape sequence
    s = '{"msg": "hello\\xworld"}'
    codeflash_output = fix_json_escape_char(s); result = codeflash_output # 11.2μs -> 11.2μs (0.394% faster)

def test_json_with_unicode_escape():
    # Test with unicode escape sequence
    s = '{"smile": "\\u263A"}'
    codeflash_output = fix_json_escape_char(s) # 3.55μs -> 3.52μs (0.995% faster)

def test_json_with_incomplete_unicode_escape():
    # Test with incomplete unicode escape
    s = '{"smile": "\\u263"}'
    codeflash_output = fix_json_escape_char(s); result = codeflash_output # 15.0μs -> 15.6μs (3.63% slower)

def test_json_with_control_character():
    # Test with a control character in a string (illegal in JSON)
    s = '{"msg": "hello\x01world"}'
    codeflash_output = fix_json_escape_char(s); result = codeflash_output # 10.4μs -> 10.6μs (2.31% slower)

def test_empty_json_object():
    # Test with an empty JSON object
    s = '{}'
    codeflash_output = fix_json_escape_char(s) # 2.83μs -> 2.54μs (11.4% faster)



def test_json_with_null_value():
    # Test with null value
    s = '{"a": null}'
    codeflash_output = fix_json_escape_char(s) # 5.33μs -> 5.24μs (1.64% faster)

def test_json_with_boolean_values():
    # Test with true/false
    s = '{"flag": true, "other": false}'
    codeflash_output = fix_json_escape_char(s) # 4.21μs -> 4.26μs (1.13% slower)

def test_json_with_number_edge_cases():
    # Test with edge-case numbers
    s = '{"big": 1e100, "small": -1e-100, "zero": 0}'
    codeflash_output = fix_json_escape_char(s) # 6.28μs -> 6.30μs (0.381% slower)

def test_json_with_array_of_objects():
    # Test with an array of objects
    s = '{"items": [{"a": 1}, {"b": 2}]}'
    codeflash_output = fix_json_escape_char(s) # 4.05μs -> 4.18μs (2.92% slower)

def test_json_with_multiline_string():
    # Test with a multiline string value
    s = '{"text": "line1\\nline2\\nline3"}'
    codeflash_output = fix_json_escape_char(s) # 3.83μs -> 3.75μs (2.02% faster)


def test_json_with_extra_open_brace():
    # Test with an extra open brace
    s = '{{"a": 1}}'
    codeflash_output = fix_json_escape_char(s); result = codeflash_output # 21.1μs -> 21.8μs (3.35% slower)

def test_json_with_extra_close_brace():
    # Test with an extra close brace
    s = '{"a": 1}}'
    codeflash_output = fix_json_escape_char(s); result = codeflash_output # 9.67μs -> 10.0μs (3.36% slower)

# ----------------------------- #
# Large Scale Test Cases
# ----------------------------- #

def test_large_json_object():
    # Test with a large JSON object (1000 keys)
    d = {f"key{i}": i for i in range(1000)}
    s = json.dumps(d)
    codeflash_output = fix_json_escape_char(s) # 117μs -> 116μs (1.00% faster)

def test_large_json_array():
    # Test with a large array in JSON (1000 elements)
    d = {"arr": list(range(1000))}
    s = json.dumps(d)
    codeflash_output = fix_json_escape_char(s) # 35.9μs -> 35.8μs (0.204% faster)

def test_large_json_with_broken_character():
    # Test with a large JSON and an intentional broken character
    d = {f"key{i}": i for i in range(1000)}
    s = json.dumps(d)
    # Introduce a broken quote in the middle
    broken_s = s[:500] + '"' + s[501:]
    codeflash_output = fix_json_escape_char(broken_s); result = codeflash_output # 675μs -> 172μs (290% faster)


def test_large_json_with_nested_structures():
    # Test with a large nested structure
    d = {"outer": [{"inner": [i for i in range(10)]} for _ in range(100)]}
    s = json.dumps(d)
    codeflash_output = fix_json_escape_char(s) # 47.1μs -> 46.5μs (1.34% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import json

# imports
import pytest  # used for our unit tests
from pr_agent.algo.utils import fix_json_escape_char

# unit tests

# ------------------ Basic Test Cases ------------------

def test_basic_valid_json():
    # Should parse a simple valid JSON string
    codeflash_output = fix_json_escape_char('{"a": 1, "b": 2}') # 3.79μs -> 3.78μs (0.370% faster)

def test_basic_valid_json_with_string_values():
    # Should parse JSON with string values
    codeflash_output = fix_json_escape_char('{"a": "hello", "b": "world"}') # 3.38μs -> 3.27μs (3.18% faster)

def test_basic_valid_json_with_nested_object():
    # Should parse JSON with nested objects
    codeflash_output = fix_json_escape_char('{"outer": {"inner": 42}}') # 3.65μs -> 3.67μs (0.735% slower)

def test_basic_valid_json_with_list():
    # Should parse JSON with a list
    codeflash_output = fix_json_escape_char('{"list": [1, 2, 3]}') # 3.67μs -> 3.51μs (4.73% faster)

def test_basic_valid_json_with_boolean_and_null():
    # Should parse JSON with boolean and null values
    codeflash_output = fix_json_escape_char('{"flag": true, "none": null}') # 3.48μs -> 3.42μs (1.82% faster)

# ------------------ Edge Test Cases ------------------



def test_json_with_broken_escape_sequence():
    # Should fix and parse JSON with a broken escape character
    # The string contains an invalid escape: \x
    codeflash_output = fix_json_escape_char('{"a": "hello\\xworld", "b": 2}') # 18.0μs -> 18.8μs (4.28% slower)

def test_json_with_unescaped_quote():
    # Should fix and parse JSON with an unescaped quote inside a string
    # This is not a valid JSON, but the function should remove the offending character
    # The result will lose the offending character
    codeflash_output = fix_json_escape_char('{"a": "hello"world", "b": 2}'); result = codeflash_output # 27.7μs -> 27.2μs (2.18% faster)



def test_json_with_unicode_escape_error():
    # Should fix and parse JSON with a broken unicode escape
    # '{"a": "\\u123", "b": 2}'  # incomplete unicode escape
    codeflash_output = fix_json_escape_char('{"a": "\\u123", "b": 2}'); result = codeflash_output # 21.9μs -> 22.3μs (1.80% slower)

def test_json_with_illegal_control_character():
    # Should fix and parse JSON with illegal control character
    # '\x01' is a control character
    s = '{"a": "hello\x01world", "b": 2}'
    codeflash_output = fix_json_escape_char(s); result = codeflash_output # 11.7μs -> 12.1μs (2.78% slower)

def test_json_with_extra_brace():
    # Should fix and parse JSON with an extra closing brace
    s = '{"a": 1, "b": 2}}'
    codeflash_output = fix_json_escape_char(s); result = codeflash_output # 10.2μs -> 10.4μs (2.06% slower)

def test_json_with_missing_colon():
    # Should fix and parse JSON with a missing colon
    s = '{"a" 1, "b": 2}'
    codeflash_output = fix_json_escape_char(s); result = codeflash_output # 22.6μs -> 22.0μs (2.49% faster)

# ------------------ Large Scale Test Cases ------------------

def test_large_json_object():
    # Should handle a large valid JSON object
    d = {f"key{i}": i for i in range(1000)}
    s = json.dumps(d)
    codeflash_output = fix_json_escape_char(s) # 117μs -> 115μs (1.28% faster)

def test_large_json_with_multiple_errors():
    # Should fix and parse a large JSON object with multiple errors
    # Insert some errors at random locations
    d = {f"key{i}": i for i in range(1000)}
    s = json.dumps(d)
    # Introduce errors: remove some colons and add extra commas
    s = s.replace(':', '', 3)  # remove first 3 colons
    s = s.replace(',', ',,', 3)  # double first 3 commas
    codeflash_output = fix_json_escape_char(s); result = codeflash_output # 2.52ms -> 239μs (952% faster)

def test_large_json_with_broken_escapes():
    # Should fix and parse large JSON with broken escape sequences
    d = {f"key{i}": f"value\\x{i}" for i in range(1000)}
    s = json.dumps(d)
    # Intentionally break some escape sequences
    s = s.replace('\\\\x', '\\x', 10)  # break first 10 escape sequences
    codeflash_output = fix_json_escape_char(s); result = codeflash_output # 1.62ms -> 190μs (748% faster)
    # The function will remove the offending characters, so the values will lose the broken escape
    expected = {f"key{i}": (f"value {i}" if i < 10 else f"value\\x{i}") for i in range(1000)}

def test_large_json_with_nested_structures():
    # Should handle large nested JSON structures
    d = {"outer": [{"inner": i} for i in range(1000)]}
    s = json.dumps(d)
    codeflash_output = fix_json_escape_char(s) # 111μs -> 109μs (2.48% faster)

def test_large_json_with_multiple_types_and_errors():
    # Should fix and parse a large, complex JSON with various types and errors
    d = {
        "ints": list(range(500)),
        "floats": [float(i) for i in range(500)],
        "bools": [i % 2 == 0 for i in range(500)],
        "strings": [f"str{i}\\x" for i in range(500)],
        "nested": [{"a": i, "b": [i, i+1]} for i in range(500)]
    }
    s = json.dumps(d)
    # Break some escape sequences and add some extra commas
    s = s.replace('\\\\x', '\\x', 10)
    s = s.replace(',', ',,', 10)
    codeflash_output = fix_json_escape_char(s); result = codeflash_output # 4.31ms -> 882μs (388% faster)
    # The broken escape sequences are fixed, so the first 10 string values will lose the broken escape
    expected = d.copy()
    expected["strings"] = [f"str{i} " if i < 10 else f"str{i}\\x" for i in range(500)]
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-fix_json_escape_char-mgzelfq8 and push.

Codeflash

**Optimizations applied:**
- **Convert to `bytearray` instead of `list` for string mutation:**  
  Using `bytearray` is much more efficient for single-character replacements and for joining back to a string, especially for typical ASCII JSON strings.  
  The common pattern (`list(str)` with per-recursion `''.join`) is replaced with in-place byte mutation followed by a single `.decode('utf-8')`.

**Behavior is preserved:**
- Exception extraction and character replacement logic are unchanged.
- Recursion pattern is preserved.
- No changes to comments except to clarify the bytearray usage.

If your JSON contains *non-ASCII* characters, you may need to adjust the encoding handling, but for most JSON logs/messages, this is significantly faster and avoids repeatedly copying large lists and strings.
@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 20, 2025 17:22
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants