fix: cap wildcard import expansion to avoid token explosion #1951

Open
mashraf-222 wants to merge 1 commit into main from cf-1085-cap-wildcard-import-expansion
@mashraf-222
Contributor

Problem

Wildcard imports like import org.jooq.* expand to 870+ types, causing 5 minutes of disk I/O per function before discovering the 4000-token skeleton budget is exceeded. In jOOQ, 89% of functions (70/79) were skipped due to token overflow from wildcard imports.

The expand_wildcard_import() function globs all .java files in the package directory unconditionally, and the token budget check in get_java_imported_type_skeletons() only fires after reading each file and parsing its skeleton — by which point hundreds of files have already been read from disk.

Root Cause

context.py:933-940: Wildcard expansion happens without any count limit or early bailout.
import_resolver.py:223-252: expand_wildcard_import() returns all types unconditionally.

Fix

import_resolver.py

  • Added max_types parameter to expand_wildcard_import() for early termination
  • Added filter_names parameter to only include types matching a given set
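A minimal sketch of how these two parameters might behave. The parameter names and their defaults (`max_types=0` for no limit, `filter_names=None` for no filter) come from this PR; the `package_dir` argument, the return type, and the body are assumptions for illustration, not the actual code in import_resolver.py.

```python
from pathlib import Path
from typing import Iterable, Optional


def expand_wildcard_import(
    package_dir: Path,  # hypothetical: the real function may resolve this itself
    max_types: int = 0,  # 0 means "no limit"
    filter_names: Optional[Iterable[str]] = None,  # None means "no filter"
) -> list[str]:
    """Return simple type names for the .java files directly inside package_dir."""
    wanted = set(filter_names) if filter_names is not None else None
    found: list[str] = []
    # Sorted only so this sketch gives deterministic output.
    for java_file in sorted(package_dir.glob("*.java")):
        name = java_file.stem
        if wanted is not None and name not in wanted:
            continue  # skip types that were not asked for
        found.append(name)
        if max_types and len(found) >= max_types:
            break  # early termination once the cap is reached
    return found
```

With both defaults left in place the function returns every type in the package, which is why existing callers would see no change.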

context.py

  • Added MAX_WILDCARD_TYPES_UNFILTERED = 50 constant
  • When a wildcard expands to >50 types:
    • If target code references specific types → re-expand with filter_names=priority_types (only referenced types)
    • If no target types available → cap at 50 (first 50 found)
  • Small wildcards (≤50 types) are expanded fully as before

This turns a 5-minute failure into <1 second resolution with only the relevant types included.
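The decision logic described above can be sketched roughly as follows. `MAX_WILDCARD_TYPES_UNFILTERED`, `priority_types`, `max_types`, and `filter_names` are names from this PR; `resolve_wildcard` and `make_fake_expander` are hypothetical helpers introduced here so the example runs without touching disk, and the real code in context.py may be structured differently.

```python
from typing import Callable, Optional

MAX_WILDCARD_TYPES_UNFILTERED = 50  # constant named in the PR description


def resolve_wildcard(
    expand: Callable[..., list[str]],  # hypothetical stand-in for expand_wildcard_import
    priority_types: Optional[set[str]],
) -> list[str]:
    # Probe with LIMIT + 1 so "more than LIMIT" can be detected without a full expansion.
    expanded = expand(max_types=MAX_WILDCARD_TYPES_UNFILTERED + 1)
    if len(expanded) <= MAX_WILDCARD_TYPES_UNFILTERED:
        return expanded  # small wildcard: keep everything, as before
    if priority_types:
        # Large wildcard with known references: keep only the referenced types.
        return expand(filter_names=priority_types)
    # Large wildcard and nothing to filter by: fall back to the first LIMIT types.
    return expanded[:MAX_WILDCARD_TYPES_UNFILTERED]


def make_fake_expander(names: list[str]) -> Callable[..., list[str]]:
    """Build an in-memory expander over a fixed list of type names (illustration only)."""

    def expand(max_types: int = 0, filter_names: Optional[set[str]] = None) -> list[str]:
        result = [n for n in names if filter_names is None or n in filter_names]
        return result[:max_types] if max_types else result

    return expand
```

For example, `resolve_wildcard(make_fake_expander([f"Type{i:03d}" for i in range(70)]), {"Type000", "Type001"})` keeps only the two referenced names, while passing `None` for `priority_types` keeps the first 50.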

Test Coverage

New test test_large_wildcard_is_filtered_to_referenced_types:

  • Creates 70 types in a package (exceeds cap of 50)
  • Target code references only Type000 and Type001
  • Verifies only referenced types appear in result, not the full 70
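A fixture for this kind of test could look something like the sketch below. The helper name `make_java_package`, the package name used in the usage note, and the generated class bodies are all assumptions for illustration; only the `Type000`-style naming and the count of 70 come from the description above.

```python
from pathlib import Path


def make_java_package(root: Path, package: str, count: int) -> Path:
    """Write `count` trivial public classes named Type000, Type001, ... under root."""
    package_dir = root.joinpath(*package.split("."))
    package_dir.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        name = f"Type{i:03d}"
        # Doubled braces in the f-string produce a literal empty class body.
        source = f"package {package};\n\npublic class {name} {{}}\n"
        (package_dir / f"{name}.java").write_text(source, encoding="utf-8")
    return package_dir
```

In a pytest test this might be called as `make_java_package(tmp_path, "com.example.big", 70)` before building a target snippet that mentions only `Type000` and `Type001`.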

All 4 existing edge case tests pass unchanged.

Closes CF-1085

🤖 Generated with Claude Code

…ute stalls

Wildcard imports like `import org.jooq.*` expand to 870+ types, causing
5 minutes of disk I/O per function before the token budget check kicks in.
89% of jOOQ functions were skipped due to this.

When a wildcard expands to >50 types, filter to only types referenced in
the target method's code. This turns a 5-minute failure into a <1 second
resolution with only the relevant types included.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Contributor

claude bot commented Apr 1, 2026

Claude finished @mashraf-222's task in 4m 36s


PR Review Summary

  • Triage PR size — LARGE (64 lines of production code changed)
  • Run lint and type checks
  • Resolve stale review threads — none
  • Review code changes
  • Duplicate detection
  • Merge optimization PRs

Prek Checks

uv run prek run --from-ref origin/main: all checks pass (ruff format, ruff check). No fixes needed.

Mypy reports 3 errors on lines 901, 936, 1104 of context.py, but all are pre-existing and not introduced by this PR (bare list annotations on the function signature + a pre-existing unreachable continue). Not flagging.


Code Review

The fix is correct and well-targeted. The probe-then-filter approach (call with max_types=LIMIT+1 to detect large packages, then re-call with filter_names=priority_types for the filtered set) is a clean pattern that avoids reading 870+ file entries when only a few are needed.

One design concern worth considering:

Double directory traversal (context.py:941,944): When a large wildcard package is detected, the code traverses the directory twice: once with max_types=51 (the probe), and once with no max_types cap but with filter_names=priority_types. Whenever priority_types is non-empty, this second pass is a full directory traversal: it iterates over all 870 files in the directory, with a continue for non-matching names. This is still much better than the original, which parsed all skeletons, since the directory listing is cheap. For the jOOQ case this is fine, because the bottleneck was skeleton parsing, not globbing.

However, if none of the types in priority_types exist in the wildcard package (e.g. the wildcard is import com.other.* but the referenced types come from a different package), the filtered call returns [] and that package contributes nothing to context. This is the correct behavior, but worth noting.

No issues that would crash at runtime or break existing behavior. The new parameters to expand_wildcard_import both have sensible defaults (max_types=0 = no limit, filter_names=None = no filter), so existing callers are unaffected.


Duplicate Detection

No duplicates detected. expand_wildcard_import is Java-specific and exists only in languages/java/. No equivalent logic found in languages/python/ or languages/javascript/.


Test Coverage

New test test_large_wildcard_is_filtered_to_referenced_types is well-written:

  • Creates 70 synthetic types (exceeds the cap of 50)
  • Target code references only Type000 and Type001
  • Asserts those appear and Type050 does not
  • Imports MAX_WILDCARD_TYPES_UNFILTERED from the module (no magic number hardcoding — good)

One gap: No test for the "no priority types available → cap at 50" fallback path (else: expanded = expanded[:MAX_WILDCARD_TYPES_UNFILTERED]). Not blocking, but worth adding.


Optimization PRs

PR #1943 (⚡️ fmt_delta +11%) — CI failures are pre-existing on its base branch cf-compare-copy-benchmarks, not caused by the optimization. Left a comment; leaving open.


Last updated: 2026-04-01T16:23 UTC
