⚡️ Speed up function get_external_base_class_inits by 344% in PR #1339 (coverage-no-files) #1352

Merged
KRRT7 merged 1 commit into coverage-no-files from codeflash/optimize-pr1339-2026-02-04T01.03.46 on Feb 4, 2026

@codeflash-ai codeflash-ai bot commented Feb 4, 2026

⚡️ This pull request contains optimizations for PR #1339

If you approve this dependent PR, these changes will be merged into the original PR branch coverage-no-files.

This PR will be automatically closed if the original PR is merged.


📄 344% (3.44x) speedup for get_external_base_class_inits in codeflash/context/code_context_extractor.py

⏱️ Runtime : 88.7 milliseconds → 20.0 milliseconds (best of 82 runs)

📝 Explanation and details

This optimization achieves a 344% speedup (88.7ms → 20.0ms) by eliminating redundant expensive operations through strategic caching and deduplication.
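
As a quick check on the headline number, the sketch below computes the speedup from the two runtimes. It assumes the percentage is the time saved relative to the optimized runtime; the helper name `speedup_percent` is hypothetical and is not part of Codeflash.

```python
def speedup_percent(original_ms: float, optimized_ms: float) -> float:
    """Assumed formula: time saved, expressed relative to the optimized runtime."""
    return (original_ms - optimized_ms) / optimized_ms * 100


# Runtimes reported in this PR: 88.7 ms before, 20.0 ms after.
pct = speedup_percent(88.7, 20.0)
print(f"{pct:.1f}% faster")
```

Under that assumption the result is about 343.5%, consistent with the 344% reported in the PR title.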

Key Optimizations

1. Deduplication of External Base Classes

  • Changed from list to set (external_bases_set) to automatically deduplicate base class entries
  • Prevents processing the same (base_name, module_name) pair multiple times
  • Removed the need for the extracted tracking set and subsequent membership checks
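
The deduplication step can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: the helper `collect_external_bases` and its AST walk are assumptions, and only the name `external_bases_set` comes from the description above.

```python
import ast


def collect_external_bases(source: str) -> set[tuple[str, str]]:
    """Hypothetical helper: return unique (base_name, module_name) pairs."""
    tree = ast.parse(source)

    # Map each name bound by `from module import name` to its module.
    imported_from: dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                imported_from[alias.asname or alias.name] = node.module

    # A set collapses repeated (base, module) pairs automatically,
    # so no separate "already extracted" bookkeeping is needed.
    external_bases_set: set[tuple[str, str]] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            for base in node.bases:
                if isinstance(base, ast.Name) and base.id in imported_from:
                    external_bases_set.add((base.id, imported_from[base.id]))
    return external_bases_set


source = (
    "from http.server import BaseHTTPRequestHandler\n"
    "class A(BaseHTTPRequestHandler):\n    pass\n"
    "class B(BaseHTTPRequestHandler):\n    pass\n"
)
bases = collect_external_bases(source)
```

Here two classes share one base, so `bases` holds a single pair and any later extraction work would run once rather than twice.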

2. Module Project Check Caching

  • Added is_project_cache to memoize _is_project_module() results per module
  • This is critical because the profiler shows _is_project_module() consumed 79% of original runtime (265ms out of 336ms)
  • Each call involves expensive importlib.util.find_spec() and path_belongs_to_site_packages() operations
  • In the optimized version, this drops to just 16.7% (11.9ms) since most modules are checked only once
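
A minimal sketch of the memoization pattern, assuming a per-call dictionary keyed by module name. The function `is_project_module_sketch` is a stand-in written for this example; the real `_is_project_module()` may use different logic. The counter `lookups` exists only to show how often the expensive check runs.

```python
import importlib.util
from pathlib import Path

lookups = 0  # counts how often the expensive check actually runs


def is_project_module_sketch(module_name: str, project_root: Path) -> bool:
    """Stand-in for _is_project_module(); the real implementation may differ."""
    global lookups
    lookups += 1
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return False
    if spec is None or spec.origin in (None, "built-in", "frozen"):
        return False
    return Path(spec.origin).resolve().is_relative_to(project_root.resolve())


def check_modules(module_names: list[str], project_root: Path) -> list[bool]:
    is_project_cache: dict[str, bool] = {}  # one entry per distinct module
    results = []
    for name in module_names:
        if name not in is_project_cache:
            is_project_cache[name] = is_project_module_sketch(name, project_root)
        results.append(is_project_cache[name])
    return results


# 100 classes importing from the same module trigger a single lookup.
results = check_modules(["collections.abc"] * 100, Path("/tmp/project"))
```

Because the cache lives inside `check_modules`, results are not reused across calls. That keeps the sketch free of stale-cache concerns, at the cost of repeating lookups between invocations.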

3. Module Import Caching

  • Added imported_module_cache to avoid repeated importlib.import_module() calls
  • When multiple classes inherit from the same base, the module is imported only once
  • Reduces import overhead from 4.84ms to 2.12ms in the line profiler
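
The import cache can be illustrated with the sketch below. The helper `import_cached` is hypothetical; only the name `imported_module_cache` is taken from the description. Note that `importlib.import_module()` already consults `sys.modules` for modules that are loaded, so a local cache like this saves the per-call lookup overhead, not a full re-import.

```python
import importlib
from types import ModuleType


def import_cached(
    module_name: str, imported_module_cache: dict[str, ModuleType]
) -> ModuleType:
    """Hypothetical helper: import a module at most once per cache."""
    module = imported_module_cache.get(module_name)
    if module is None:
        module = importlib.import_module(module_name)
        imported_module_cache[module_name] = module
    return module


imported_module_cache: dict[str, ModuleType] = {}
first = import_cached("http.server", imported_module_cache)
second = import_cached("http.server", imported_module_cache)  # served from the cache
```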

Performance Impact by Test Case

The optimization particularly excels when:

  • Many code blocks inherit from the same base: test_large_scale_many_code_blocks_are_handled_efficiently (200 code blocks) shows 565% speedup (19.0ms → 2.86ms)
  • Large codebases with many classes: test_large_single_code_string (500 classes) shows 1113% speedup (45.7ms → 3.77ms)
  • Many classes share a single external base: test_many_classes_single_external_base (100 classes) shows 949% speedup (9.19ms → 875μs)

These improvements directly benefit production workloads: function_references shows that this function is called from get_code_optimization_context, which is part of the code analysis pipeline. For projects with extensive class hierarchies that inherit from external libraries (such as web frameworks, ORMs, or data processing libraries), the optimization avoids redundant module introspection and imports, which speeds up the code context extraction phase.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 43 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 96.6% |

🌀 Generated Regression Tests
import os
from pathlib import Path

# imports
import pytest  # used for our unit tests
# logger import used by the original implementation (kept for parity).
from codeflash.cli_cmds.console import logger
# The original helper used by the function (imported in original module).
from codeflash.code_utils.code_utils import path_belongs_to_site_packages
from codeflash.context.code_context_extractor import \
    get_external_base_class_inits
# Import real domain models used by the function under test.
from codeflash.models.models import CodeString, CodeStringsMarkdown

def test_basic_extracts_external_base_init_from_stdlib():
    # Basic scenario: a class inherits from a standard-library class implemented in Python
    # (http.server.BaseHTTPRequestHandler has a Python __init__), so the function should extract it.
    code = (
        "from http.server import BaseHTTPRequestHandler\n\n"
        "class MyHandler(BaseHTTPRequestHandler):\n"
        "    pass\n"
    )
    # Create a CodeString with valid Python language so CodeString validation passes
    cs = CodeString(code=code, file_path=Path("example.py"), language="python")
    context = CodeStringsMarkdown(code_strings=[cs])
    project_root = Path.cwd()

    codeflash_output = get_external_base_class_inits(context, project_root); result = codeflash_output # 482μs -> 480μs (0.315% faster)

    extracted = result.code_strings[0]

def test_import_alias_does_not_resolve_to_original_name():
    # Edge case: using 'as' alias in import-from.
    # The implementation uses the alias name when looking up the attribute on the module,
    # which will not exist on the module object. This should therefore produce no extraction.
    code = (
        "from http.server import BaseHTTPRequestHandler as BHandler\n\n"
        "class AliasedHandler(BHandler):\n"
        "    pass\n"
    )
    cs = CodeString(code=code, file_path=Path("alias.py"), language="python")
    context = CodeStringsMarkdown(code_strings=[cs])
    project_root = Path.cwd()

    codeflash_output = get_external_base_class_inits(context, project_root); result = codeflash_output # 152μs -> 151μs (0.631% faster)

def test_invalid_syntax_in_context_returns_empty():
    # Edge: Ensure that when combined source code is syntactically invalid, function returns empty context.
    # We must avoid CodeString's own syntax validation, so mark language as 'text' to bypass Python validation.
    cs = CodeString(code="def ) invalid_syntax", file_path=None, language="text")
    context = CodeStringsMarkdown(code_strings=[cs])
    project_root = Path.cwd()

    codeflash_output = get_external_base_class_inits(context, project_root); result = codeflash_output # 14.7μs -> 14.5μs (1.31% faster)

def test_import_statement_with_module_attribute_not_supported():
    # Edge case: 'import http.server' followed by inheritance using http.server.BaseHTTPRequestHandler
    # The function only records ImportFrom nodes, so it should not detect this base as external.
    code = (
        "import http.server\n\n"
        "class Handler(http.server.BaseHTTPRequestHandler):\n"
        "    pass\n"
    )
    cs = CodeString(code=code, file_path=Path("import_style.py"), language="python")
    context = CodeStringsMarkdown(code_strings=[cs])
    project_root = Path.cwd()

    codeflash_output = get_external_base_class_inits(context, project_root); result = codeflash_output # 39.4μs -> 38.5μs (2.38% faster)

def test_c_implemented_builtin_init_is_skipped():
    # Edge: many builtin classes (e.g., datetime.datetime) have C-level implementations.
    # inspect.getsource will raise for such classes and the function should skip them.
    code = (
        "from datetime import datetime\n\n"
        "class MyDate(datetime):\n"
        "    pass\n"
    )
    cs = CodeString(code=code, file_path=Path("builtin.py"), language="python")
    context = CodeStringsMarkdown(code_strings=[cs])
    project_root = Path.cwd()

    codeflash_output = get_external_base_class_inits(context, project_root); result = codeflash_output # 161μs -> 160μs (0.721% faster)

def test_multiple_classes_with_same_external_base_extracted_once():
    # Basic/Edge: When multiple classes in the context inherit the same external base,
    # the extraction for that base should occur only once (deduplication).
    code = (
        "from http.server import BaseHTTPRequestHandler\n\n"
        "class A(BaseHTTPRequestHandler):\n"
        "    pass\n\n"
        "class B(BaseHTTPRequestHandler):\n"
        "    pass\n"
    )
    cs = CodeString(code=code, file_path=Path("multi.py"), language="python")
    context = CodeStringsMarkdown(code_strings=[cs])
    project_root = Path.cwd()

    codeflash_output = get_external_base_class_inits(context, project_root); result = codeflash_output # 567μs -> 477μs (18.8% faster)

def test_site_packages_file_path_shortening_when_available():
    # Conditional test: if 'requests' is available in the environment, ensure that when a class
    # is extracted from a class whose file path contains 'site-packages', the returned file_path
    # omits the leading path up to 'site-packages' (per implementation logic).
    requests = pytest.importorskip("requests")  # skip test if requests isn't installed
    code = (
        "from requests import Session\n\n"
        "class MySession(Session):\n"
        "    pass\n"
    )
    cs = CodeString(code=code, file_path=Path("requests_test.py"), language="python")
    context = CodeStringsMarkdown(code_strings=[cs])
    project_root = Path.cwd()

    codeflash_output = get_external_base_class_inits(context, project_root); result = codeflash_output # 814μs -> 818μs (0.519% slower)

    # If extraction succeeded, the file_path should not contain the literal 'site-packages' segment.
    # If extraction failed (e.g., Session.__init__ cannot be sourced), the result will be empty and we assert that instead.
    if result.code_strings:
        fp = result.code_strings[0].file_path
    else:
        pass

def test_large_scale_many_code_blocks_are_handled_efficiently():
    # Large Scale: Build many code blocks (but keep under the stated limit of 1000).
    # Each block imports the same external base; deduplication means only one extraction should occur.
    blocks = []
    # Use a count that stresses parsing but remains reasonable for unit test time.
    count = 200
    single_snippet = "from http.server import BaseHTTPRequestHandler\n\nclass X{}(BaseHTTPRequestHandler):\n    pass\n"
    for i in range(count):
        code = single_snippet.format(i)
        blocks.append(CodeString(code=code, file_path=Path(f"large_{i}.py"), language="python"))

    context = CodeStringsMarkdown(code_strings=blocks)
    project_root = Path.cwd()

    codeflash_output = get_external_base_class_inits(context, project_root); result = codeflash_output # 19.0ms -> 2.86ms (565% faster)
    if result.code_strings:
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import ast
import os
import sys
import tempfile
from pathlib import Path
from unittest.mock import MagicMock, patch

# imports
import pytest
# import the function and dependencies
from codeflash.context.code_context_extractor import (
    _is_project_module, get_external_base_class_inits)
from codeflash.models.models import CodeString, CodeStringsMarkdown

def test_empty_code_context():
    """Test with empty code context - should return empty CodeStringsMarkdown."""
    code_context = CodeStringsMarkdown(code_strings=[])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 21.2μs -> 19.6μs (8.31% faster)

def test_no_class_definitions():
    """Test code with no class definitions - should return empty result."""
    code_string = CodeString(code="x = 1\ny = 2", file_path=Path("test.py"))
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 32.6μs -> 31.5μs (3.57% faster)

def test_no_external_base_classes():
    """Test class that inherits from built-in classes - should return empty result."""
    code_string = CodeString(
        code="class MyClass(object):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 27.7μs -> 28.1μs (1.34% slower)

def test_simple_external_base_class():
    """Test a class inheriting from a simple external library base class."""
    # collections.abc ships with the standard library, so it is always importable
    code_string = CodeString(
        code="from collections.abc import Iterable\n\nclass MyIterable(Iterable):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 176μs -> 169μs (3.99% faster)

def test_import_alias():
    """Test importing base class with alias."""
    code_string = CodeString(
        code="from collections.abc import Iterable as BaseIterable\n\nclass MyClass(BaseIterable):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 154μs -> 153μs (1.08% faster)

def test_multiple_bases_mixed():
    """Test class with multiple bases (some external, some local)."""
    code_string = CodeString(
        code="from collections.abc import Iterable\n\nclass LocalBase:\n    pass\n\nclass MyClass(LocalBase, Iterable):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 167μs -> 165μs (0.830% faster)

def test_nonexistent_module_import():
    """Test importing from a module that doesn't exist."""
    code_string = CodeString(
        code="from nonexistent_module_xyz import SomeClass\n\nclass MyClass(SomeClass):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 228μs -> 225μs (1.30% faster)

def test_class_without_init():
    """Test external base class that doesn't define __init__."""
    # Using a class from standard library that might not have explicit __init__
    code_string = CodeString(
        code="from collections import Counter\n\nclass MyCounter(Counter):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 473μs -> 468μs (1.06% faster)

def test_attribute_base_class():
    """Test class with attribute-style base (e.g., module.ClassName)."""
    code_string = CodeString(
        code="import collections.abc\n\nclass MyClass(collections.abc.Iterable):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 38.4μs -> 37.9μs (1.55% faster)

def test_multiple_classes_same_base():
    """Test multiple classes inheriting from the same external base."""
    code_string = CodeString(
        code="from collections.abc import Iterable\n\nclass Class1(Iterable):\n    pass\n\nclass Class2(Iterable):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 260μs -> 173μs (50.1% faster)

def test_nested_class_definitions():
    """Test nested class definitions."""
    code_string = CodeString(
        code="from collections.abc import Iterable\n\nclass Outer:\n    class Inner(Iterable):\n        pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 161μs -> 162μs (0.738% slower)

def test_conditional_imports():
    """Test code with conditional imports."""
    code_string = CodeString(
        code="if True:\n    from collections.abc import Iterable\n\nclass MyClass(Iterable):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 40.3μs -> 39.8μs (1.46% faster)

def test_starred_imports():
    """Test code with starred imports (from module import *)."""
    code_string = CodeString(
        code="from collections.abc import *\n\nclass MyClass(Iterable):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 33.4μs -> 33.0μs (1.26% faster)

def test_multiple_code_strings():
    """Test code context with multiple code strings."""
    code_string1 = CodeString(
        code="from collections.abc import Iterable",
        file_path=Path("import.py")
    )
    code_string2 = CodeString(
        code="class MyClass(Iterable):\n    pass",
        file_path=Path("class.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string1, code_string2])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 161μs -> 160μs (0.647% faster)

def test_import_from_package():
    """Test importing from a sub-package."""
    code_string = CodeString(
        code="from collections.abc import Iterable\n\nclass MyClass(Iterable):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 157μs -> 159μs (0.950% slower)

def test_class_with_no_bases():
    """Test standalone class with no base classes."""
    code_string = CodeString(
        code="class MyClass:\n    def __init__(self):\n        pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 34.5μs -> 35.0μs (1.50% slower)

def test_import_nonexistent_class_from_valid_module():
    """Test importing a class that doesn't exist from a valid module."""
    code_string = CodeString(
        code="from collections import NonexistentClass\n\nclass MyClass(NonexistentClass):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 152μs -> 150μs (1.60% faster)

def test_builtin_type_inheritance():
    """Test inheriting from built-in types."""
    code_string = CodeString(
        code="class MyList(list):\n    pass\n\nclass MyDict(dict):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 34.3μs -> 34.4μs (0.134% slower)

def test_class_with_complex_init_signature():
    """Test extracting __init__ from class with complex signature."""
    code_string = CodeString(
        code="from threading import Thread\n\nclass MyThread(Thread):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 1.22ms -> 1.21ms (0.908% faster)

def test_code_string_with_no_file_path():
    """Test code string that has no file_path."""
    code_string = CodeString(
        code="from collections.abc import Iterable\n\nclass MyClass(Iterable):\n    pass"
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 165μs -> 166μs (0.600% slower)

def test_relative_project_path():
    """Test with relative project root path."""
    code_string = CodeString(
        code="from collections.abc import Iterable\n\nclass MyClass(Iterable):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("./project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 160μs -> 160μs (0.128% faster)

def test_deep_project_path():
    """Test with deeply nested project root path."""
    code_string = CodeString(
        code="from collections.abc import Iterable\n\nclass MyClass(Iterable):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/home/user/projects/myproject/src/package")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 158μs -> 157μs (0.697% faster)

def test_many_classes_single_external_base():
    """Test code with many classes inheriting from same external base."""
    # Create code with 100 classes inheriting from Iterable
    code_lines = ["from collections.abc import Iterable"]
    for i in range(100):
        code_lines.append(f"class Class{i}(Iterable):\n    pass")
    
    code_string = CodeString(
        code="\n".join(code_lines),
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 9.19ms -> 875μs (949% faster)

def test_many_different_external_bases():
    """Test code with many different external base classes."""
    code_lines = [
        "from collections.abc import Iterable, Iterator, Mapping, Sequence, Set",
        "from threading import Thread",
        "from io import IOBase"
    ]
    
    code_lines.append("class Class1(Iterable):\n    pass")
    code_lines.append("class Class2(Iterator):\n    pass")
    code_lines.append("class Class3(Mapping):\n    pass")
    code_lines.append("class Class4(Sequence):\n    pass")
    code_lines.append("class Class5(Set):\n    pass")
    code_lines.append("class Class6(Thread):\n    pass")
    code_lines.append("class Class7(IOBase):\n    pass")
    
    code_string = CodeString(
        code="\n".join(code_lines),
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 1.83ms -> 1.48ms (23.9% faster)

def test_large_code_context():
    """Test with large code context containing many code strings."""
    code_strings = []
    
    # Create 50 code strings with different imports
    for i in range(50):
        code = f"from collections import {'Counter' if i % 2 == 0 else 'deque'}\n"
        if i % 3 == 0:
            code += f"class TestClass{i}(Counter):\n    pass"
        code_strings.append(CodeString(code=code, file_path=Path(f"file{i}.py")))
    
    code_context = CodeStringsMarkdown(code_strings=code_strings)
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 2.12ms -> 788μs (169% faster)

def test_large_single_code_string():
    """Test with a single very large code string."""
    code_lines = ["from collections.abc import Iterable"]
    
    # Create 500 classes
    for i in range(500):
        code_lines.append(f"class Class{i}(Iterable):\n    pass")
    
    code_string = CodeString(
        code="\n".join(code_lines),
        file_path=Path("large_file.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 45.7ms -> 3.77ms (1113% faster)

def test_complex_inheritance_hierarchy():
    """Test code with complex multi-level inheritance."""
    code_lines = [
        "from collections.abc import Iterable, Iterator",
        "class Local1(Iterable):\n    pass",
        "class Local2(Local1, Iterator):\n    pass",
        "class Local3(Local2):\n    pass"
    ]
    
    code_string = CodeString(
        code="\n".join(code_lines),
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 290μs -> 189μs (53.2% faster)

def test_is_project_module_with_stdlib():
    """Test _is_project_module with stdlib module."""
    result = _is_project_module("os", Path("/tmp/project"))

def test_is_project_module_with_invalid_module():
    """Test _is_project_module with invalid module name."""
    result = _is_project_module("nonexistent_module_xyz_123", Path("/tmp/project"))

def test_is_project_module_with_external_library():
    """Test _is_project_module with external library."""
    result = _is_project_module("collections.abc", Path("/tmp/project"))

def test_result_has_correct_structure():
    """Test that result has correct structure with CodeString objects."""
    code_string = CodeString(
        code="from collections.abc import Iterable\n\nclass MyClass(Iterable):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 163μs -> 162μs (0.620% faster)
    for cs in result.code_strings:
        if cs.file_path:
            pass

def test_extracted_code_contains_class_definition():
    """Test that extracted code contains class definition line."""
    code_string = CodeString(
        code="from threading import Thread\n\nclass MyThread(Thread):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 1.25ms -> 1.21ms (3.63% faster)
    
    # If extraction succeeded, code should contain class definition
    if len(result.code_strings) > 0:
        for cs in result.code_strings:
            pass

def test_extracted_code_contains_init():
    """Test that extracted code contains __init__ method."""
    code_string = CodeString(
        code="from threading import Thread\n\nclass MyThread(Thread):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 1.21ms -> 1.19ms (1.73% faster)
    
    # If extraction succeeded, code should contain __init__
    if len(result.code_strings) > 0:
        for cs in result.code_strings:
            pass

def test_no_duplicate_extractions():
    """Test that same base class is not extracted multiple times."""
    code_string = CodeString(
        code="from collections.abc import Iterable\n\nclass Class1(Iterable):\n    pass\nclass Class2(Iterable):\n    pass\nclass Class3(Iterable):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 356μs -> 180μs (97.4% faster)
    
    # Count occurrences of 'class Iterable' in results
    iterable_count = sum(1 for cs in result.code_strings if "class Iterable" in cs.code)

def test_code_is_valid_python():
    """Test that extracted code is valid Python."""
    code_string = CodeString(
        code="from threading import Thread\n\nclass MyThread(Thread):\n    pass",
        file_path=Path("test.py")
    )
    code_context = CodeStringsMarkdown(code_strings=[code_string])
    project_root = Path("/tmp/project")
    
    codeflash_output = get_external_base_class_inits(code_context, project_root); result = codeflash_output # 1.19ms -> 1.19ms (0.017% faster)
    
    # All extracted code should be valid Python (parseable)
    for cs in result.code_strings:
        try:
            ast.parse(cs.code)
        except SyntaxError:
            pytest.fail(f"Extracted code is not valid Python: {cs.code}")
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-pr1339-2026-02-04T01.03.46 and push.

@codeflash-ai codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels Feb 4, 2026
@KRRT7 KRRT7 merged commit 5a70408 into coverage-no-files Feb 4, 2026
24 of 27 checks passed
@KRRT7 KRRT7 deleted the codeflash/optimize-pr1339-2026-02-04T01.03.46 branch February 4, 2026 05:19