feat: multi-language orchestration loop with per-language config discovery #1859
mashraf-222 wants to merge 41 commits into main
Claude finished @mashraf-222's task in 6m 18s
PR Review Summary
| File | Coverage |
|---|---|
| codeflash/main.py | 70% |
| codeflash/code_utils/config_parser.py | 45% |
| codeflash/code_utils/git_utils.py | 64% |
| codeflash/cli_cmds/cli.py | 20% |
cli.py at 20% is expected (interactive CLI). The new multi-language orchestration logic in main.py at 70% is reasonable. auto_configure_language (main.py:244) is covered; detect_project_for_language (main.py:202) lacks direct test coverage — consider adding at least a happy-path unit test.
Pushed fix: style: auto-fix ruff formatting in config_parser.py (commit 275527db)
8c9aab5 to d9dfb0b
…iles
Java projects no longer need a standalone config file. Codeflash reads config from pom.xml <properties> or gradle.properties, and auto-detects source/test roots from build tool conventions.
Changes:
- Add parse_java_project_config() to read codeflash.* properties from pom.xml and gradle.properties
- Add multi-module Maven scanning: parses each module's pom.xml for <sourceDirectory> and <testSourceDirectory>, picks module with most Java files as source root, identifies test modules by name
- Route Java projects through build-file detection in config_parser.py before falling back to pyproject.toml
- Detect Java language from pom.xml/build.gradle presence (no config needed)
- Fix project_root for multi-module projects (was resolving to sub-module)
- Fix JFR parser / separators (JVM uses com/example, normalized to com.example)
- Fix graceful timeout (SIGTERM before SIGKILL for JFR dump + shutdown hooks)
- Remove isRecording() check from TracingTransformer (was preventing class instrumentation for classes loaded during serialization)
- Delete all codeflash.toml files from fixtures and code_to_optimize
- Add 33 config detection tests
- Update docs for zero-config Java setup
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replay tests call helper.replay() via reflection, not the target function directly. The behavior instrumentation can't wrap indirect calls and produces malformed output (code emitted outside class body) for large replay test files. For replay tests, just rename the class without adding instrumentation — JUnit pass/fail results verify correctness. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Detect test framework from project build config and generate replay tests with appropriate imports (org.junit.Test for JUnit 4, org.junit.jupiter.api.Test for JUnit 5). Fixes compilation failures on projects using JUnit 4 (like aerospike-client-java). Also passes test_framework through run_java_tracer to generate_replay_tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
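The import selection described in this commit can be pictured with a minimal Python sketch. The framework keys (`"junit4"`, `"junit5"`) and the function name `replay_test_import` are hypothetical; only the two Java import paths come from the commit message.

```python
# Hypothetical mapping from a detected framework key to the @Test import
# that a generated replay test would need. The keys are assumptions.
JUNIT_TEST_IMPORTS = {
    "junit4": "import org.junit.Test;",
    "junit5": "import org.junit.jupiter.api.Test;",
}


def replay_test_import(test_framework: str) -> str:
    """Return the Java import line for the given framework key."""
    try:
        return JUNIT_TEST_IMPORTS[test_framework]
    except KeyError:
        raise ValueError(f"unsupported test framework: {test_framework!r}") from None
```

Failing loudly on an unknown key is a design choice of this sketch; the PR may instead fall back to a default framework.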
…ay tests Use a global counter per method name across all descriptors to generate unique test method names. Previously, overloaded methods (same name, different descriptor) would generate duplicate replay_methodName_N methods, causing compilation errors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
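A rough sketch of the naming scheme this commit describes: one counter per method name, shared across all descriptors, so overloads never collide. The function name `unique_replay_names`, the input shape, and the zero-based numbering are assumptions made for illustration.

```python
from collections import defaultdict


def unique_replay_names(methods):
    """Assign a unique replay test name to each (name, descriptor) pair.

    The counter is keyed by method name only, so two overloads of `add`
    get different suffixes even though their descriptors differ.
    """
    counters = defaultdict(int)
    names = []
    for name, _descriptor in methods:
        names.append(f"replay_{name}_{counters[name]}")
        counters[name] += 1  # zero-based numbering is an assumption
    return names
```

Keying the counter on `(name, descriptor)` instead would restart the numbering for each overload and reproduce the duplicate-name compilation error.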
…on skip
10 new tests covering:
- JUnit 5 replay test generation (imports, class visibility)
- JUnit 4 replay test generation (imports, public methods, @AfterClass)
- Overloaded method handling (no duplicate test method names)
- Instrumentation skip for replay tests (behavior + perf mode)
- Regular tests still get instrumented normally
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…solution 13 new tests covering: - JFR class name normalization (/ to . conversion) - Package-based sample filtering - Addressable time calculation from JFR samples - Method ranking order and format - Graceful timeout (SIGTERM before SIGKILL) - Multi-module project root detection (Path not str) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
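For reference, the class name normalization and package-based filtering covered by these tests might look roughly like the sketch below. Both function names are hypothetical, and the prefix-matching rule is an assumption about how samples are filtered.

```python
def normalize_jfr_class_name(name: str) -> str:
    """Convert a JVM-internal name (com/example/Foo) to dotted form."""
    return name.replace("/", ".")


def in_packages(class_name: str, packages) -> bool:
    """Return True if the (normalized) class belongs to one of the packages.

    Matching on `package + "."` avoids treating com.examples.Foo as part
    of com.example.
    """
    normalized = normalize_jfr_class_name(class_name)
    return any(normalized.startswith(package + ".") for package in packages)
```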
The behavior instrumentation was producing malformed output for compact @Test lines (annotation + method signature on same line, common in replay tests). The method signature collection loop would skip past the opening brace and consume subsequent methods' content.
Fix: detect when the @Test annotation line already contains { and treat it as both annotation and method signature, avoiding the separate signature search that was over-consuming lines.
Reverted the instrumentation skip for replay tests; they now get properly instrumented for both behavior capture and performance timing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ReplayHelper now reads CODEFLASH_MODE env var and produces the same output as the existing test instrumentation:
- Behavior mode: captures return value via Kryo serialization, writes to SQLite (test_results table) for correctness comparison, prints start/end timing markers
- Performance mode: runs inner loop for JIT warmup, prints timing markers for each iteration matching the expected format
- No mode: just invokes the method (trace-only or manual testing)
This achieves feature parity with the existing test instrumentation for replay tests, which call functions via reflection and can't be wrapped by text-level instrumentation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ay tests + speedups - Trigger on any codeflash/** or tests/** changes (not just java subset) - Validate replay test files are discovered per-function - Already validates: replay test generation, global discovery count, optimization success, and minimum speedup percentage Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The refactored Java project_root handling moved args.tests_root resolution after the project_root_from_module_root call, which passed a string instead of a Path. Restore the original order: resolve tests_root to Path first, then set test_project_root, then override both for Java multi-module projects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use Path comparisons instead of forward-slash substring matching - Avoid parse_args() in test (reads stdin on Windows) — use Namespace directly Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use print(flush=True) instead of logging.info for subprocess output so CI logs show progress in real-time instead of buffering until completion. Also set PYTHONUNBUFFERED=1 for the subprocess. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
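A minimal sketch of the streaming pattern this commit describes, assuming the subprocess is launched with `subprocess.Popen`. The helper name `run_streaming` is hypothetical and the PR's actual wrapper may differ.

```python
import os
import subprocess
import sys


def run_streaming(cmd):
    """Run cmd, echoing each output line immediately; return (exit_code, lines)."""
    # PYTHONUNBUFFERED=1 stops a Python child from block-buffering its stdout.
    env = {**os.environ, "PYTHONUNBUFFERED": "1"}
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        text=True,
        env=env,
    )
    lines = []
    assert proc.stdout is not None
    for line in proc.stdout:
        # flush=True pushes the line to the CI log right away.
        print(line, end="", flush=True)
        lines.append(line)
    return proc.wait(), lines
```

Usage: `run_streaming([sys.executable, "-c", "print('hi')"])` echoes `hi` as soon as the child writes it.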
…_write_gradle_properties Co-authored-by: Saurabh Misra <undefined@users.noreply.github.com>
…ions harder - Set jdk.ExecutionSample#period=1ms (default was 10ms) so JFR captures samples from shorter-running programs - Workload.main now runs 1000 rounds with larger inputs so JFR can capture method-level CPU samples (repeatString with O(n²) concat dominates ~75% of samples) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ove priority Replace xml.etree.ElementTree with text-based regex manipulation in _write_maven_properties() and _remove_java_build_config(). ElementTree destroys XML comments, mangles namespace declarations (ns0: prefixes), and reformats whitespace. The new approach reads/writes pom.xml as plain text, only touching codeflash.* property lines. Also extracts duplicated key_map to shared _MAVEN_KEY_MAP constant and aligns remove priority to check pom.xml first (matching write order). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
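To illustrate the text-based approach, here is a simplified sketch that sets one property in pom.xml text with a regex and leaves everything else untouched. The function name `set_pom_property`, the property key used in the usage note, and the indentation rule are assumptions; this is not the PR's `_write_maven_properties()`.

```python
import re
from xml.sax.saxutils import escape


def set_pom_property(pom_text: str, key: str, value: str) -> str:
    """Set <key>value</key> inside <properties>, editing the text in place."""
    tag = re.escape(key)
    safe_value = escape(value)  # escape &, <, > for XML
    existing = re.compile(rf"(<{tag}>)(.*?)(</{tag}>)", re.DOTALL)
    if existing.search(pom_text):
        # Replace only the value; a function replacement avoids backslash issues.
        return existing.sub(
            lambda m: f"{m.group(1)}{safe_value}{m.group(3)}", pom_text, count=1
        )
    closing = re.search(r"^([ \t]*)</properties>", pom_text, re.MULTILINE)
    if closing is None:
        raise ValueError("pom.xml text has no </properties> line")
    indent = closing.group(1) + "    "  # assumed child indentation
    new_line = f"{indent}<{key}>{safe_value}</{key}>\n"
    return pom_text[: closing.start()] + new_line + pom_text[closing.start():]
```

For example, `set_pom_property(text, "codeflash.moduleRoot", "src/main/java")` (a hypothetical key) adds or updates a single line while comments and namespace declarations elsewhere in the file stay byte-for-byte identical.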
…os (TODO-37) Java detection in parse_config_file() short-circuited before the existing depth-comparison logic, so a parent pom.xml would override a closer package.json or pyproject.toml. Now all config sources are detected first and the closest one to CWD wins. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…TODO-34, TODO-38) TODO-34: TracingClassVisitor hardcoded line number to 0 because ASM's visitMethod() doesn't provide line info. Added a pre-scan pass in TracingTransformer.instrumentClass() that collects first line numbers via visitLineNumber() before the instrumentation pass. TODO-38: Serialization timeouts/failures silently dropped captures with no visibility. Added AtomicInteger droppedCaptures counter and included it in flush() metadata output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Changed detect_packages_from_source() from min(2, len) to min(3, len) so com.aerospike.client.util produces prefix com.aerospike.client instead of com.aerospike. This reduces instrumentation to the actual source package instead of the entire organization namespace. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
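The effect of moving from `min(2, len)` to `min(3, len)` can be shown with a small sketch. The helper name `package_prefix` and its `max_parts` parameter are hypothetical; the real logic lives in `detect_packages_from_source()`.

```python
def package_prefix(package: str, max_parts: int = 3) -> str:
    """Keep at most max_parts leading components of a dotted package name."""
    parts = package.split(".")
    return ".".join(parts[: min(max_parts, len(parts))])
```

With `max_parts=2` the example package `com.aerospike.client.util` reduces to `com.aerospike`; with the new value of 3 it reduces to `com.aerospike.client`, as the commit message states.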
…r all languages The language singleton was only set after function discovery, but get_git_diff() needs it during discovery to filter by file extension. Now set it in process_pyproject_config() based on the config file type. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r all languages The language singleton was only set after function discovery, but get_git_diff() needs it during discovery to filter by file extension. - config_parser.py: set config["language"] based on config file type (codeflash.toml → java, pyproject.toml → python) so all project types return a language - cli.py: call set_current_language() in process_pyproject_config() using the config value, before the optimizer runs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ensions - Replace current_language_support().file_extensions with get_supported_extensions() from registry - Update tests: remove singleton dependency, add unsupported extension filtering test - Mixed Python+Java diffs now return both file types regardless of singleton state Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
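A sketch of the coarse, language-agnostic filter this commit describes. The function name `filter_changed_files` is hypothetical, and the extension set is passed in as a parameter here rather than taken from the registry's `get_supported_extensions()`.

```python
from pathlib import Path


def filter_changed_files(paths, supported_extensions):
    """Keep changed files whose extension is supported by any language."""
    supported = {ext.lower() for ext in supported_extensions}
    return [p for p in paths if Path(p).suffix.lower() in supported]
```

Because the filter no longer depends on a current-language singleton, a mixed diff keeps both its Python and Java files, and per-file language detection can happen later.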
- Add LanguageConfig dataclass with config, config_path, language fields
- Add find_all_config_files() that discovers all codeflash configs in project hierarchy
- Supports pyproject.toml (Python), codeflash.toml (Java), package.json (JS/TS)
- Skips configs without [tool.codeflash] section, closest config wins per language
- Add 6 tests covering discovery, filtering, parent directory search, deduplication
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
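The "closest config wins per language" rule can be sketched as an upward directory walk. Everything below is a simplified stand-in: `FoundConfig` and `find_configs` are hypothetical names, the `[tool.codeflash]` check is a plain substring test instead of TOML parsing, and the `"codeflash"` key check for package.json is an assumption.

```python
import json
from dataclasses import dataclass
from pathlib import Path

# Marker file -> language, as listed in the commit message.
MARKERS = {"pyproject.toml": "python", "codeflash.toml": "java", "package.json": "javascript"}


@dataclass
class FoundConfig:  # hypothetical stand-in for LanguageConfig
    language: str
    config_path: Path


def _has_codeflash_section(path: Path) -> bool:
    text = path.read_text(encoding="utf-8")
    if path.name == "pyproject.toml":
        return "[tool.codeflash]" in text  # simplification: no TOML parsing
    if path.name == "package.json":
        try:
            return "codeflash" in json.loads(text)  # assumed key name
        except json.JSONDecodeError:
            return False
    return True


def find_configs(start: Path) -> list:
    """Walk from start up to the filesystem root; first hit per language wins."""
    start = start.resolve()
    found = {}
    for directory in [start, *start.parents]:
        for filename, language in MARKERS.items():
            candidate = directory / filename
            if language in found or not candidate.is_file():
                continue
            if _has_codeflash_section(candidate):
                found[language] = FoundConfig(language, candidate)
    return list(found.values())
```

Since directories are visited from the starting point outward, a language already present in `found` is never overwritten by a config higher up the tree.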
…DISC-04) - Add smoke test confirming get_language_support usage, not singleton - No code changes needed, function already uses per-file registry Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add apply_language_config() to cli.py for multi-language mode config application - Import LanguageConfig and Language enum in cli.py - Create test_multi_language_orchestration.py with 9 tests covering: module_root/tests_root setting, path resolution, project_root, CLI override preservation, formatter_cmds, language singleton, Python/Java config handling, Java default tests_root Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add orchestration loop that iterates over all discovered LanguageConfigs
- Deep-copy args per language pass to prevent mutation leakage
- Run git/GitHub checks once before loop via handle_optimize_all_arg_parsing
- Preserve fallback to single-config path when find_all_config_files returns empty
- Add 4 orchestration tests: sequential passes, singleton per pass, fallback to single config, args isolation between passes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ig_files - Extract shared normalization logic (path resolution, defaults, key conversion) into normalize_toml_config() - Use it in both find_all_config_files and parse_config_file to eliminate duplication - Add 6 tests verifying normalization behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wrap each language pass in try/except so one failure doesn't block others - Track per-language status (success/failed/skipped) in results dict - Add 3 tests verifying error isolation and failure tracking Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
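The loop shape described across these commits (a deep copy of args per pass, one try/except per language, a status per language) might be sketched as follows. `run_language_passes` and the `run_pass` callback are hypothetical; in the PR the body of each pass calls `apply_language_config()` and the optimizer. Mapping a falsy return value to "skipped" is an assumption.

```python
import copy
from argparse import Namespace


def run_language_passes(args: Namespace, languages, run_pass) -> dict:
    """Run one pass per language and record success/failed/skipped."""
    results = {}
    for language in languages:
        # Each pass gets its own copy so mutations cannot leak between passes.
        pass_args = copy.deepcopy(args)
        try:
            completed = run_pass(language, pass_args)
        except Exception:
            # One failing language must not block the remaining ones.
            results[language] = "failed"
            continue
        results[language] = "success" if completed else "skipped"
    return results
```

Catching `Exception` rather than `BaseException` lets `KeyboardInterrupt` and `SystemExit` still stop the whole run, which seems the safer default for a CLI.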
- Test summary format with all success, mixed statuses, and empty results - Test skipped status when formatter check fails - 4 new tests covering _log_orchestration_summary behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ation - Detect file language via get_language_support(Path(args.file)) - Filter language_configs to only the matching language before loop - Gracefully handle unsupported extensions and missing configs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- test_file_flag_filters_to_matching_language: Java file runs only Java pass - test_file_flag_python_file_filters_to_python: Python file runs only Python pass - test_file_flag_unknown_extension_runs_all: .rs file runs all language passes - test_file_flag_no_matching_config_runs_all: Java file with only Python config runs all - test_all_flag_sets_module_root_per_language: --all sets pass_args.all per language - test_no_flags_runs_all_language_passes: no flags runs all language passes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
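The routing behavior these tests pin down can be summarized in a short sketch. The extension table and the name `filter_configs_for_file` are assumptions for illustration; the PR detects the language via `get_language_support(Path(args.file))`.

```python
from pathlib import Path

# Assumed extension -> language table for illustration only.
EXTENSION_LANGUAGE = {
    ".py": "python",
    ".java": "java",
    ".js": "javascript",
    ".jsx": "javascript",
    ".ts": "javascript",
    ".tsx": "javascript",
}


def filter_configs_for_file(file: str, configs: dict) -> dict:
    """Narrow {language: config} to the language of `file`.

    Falls back to all configs when the extension is unknown or when no
    config exists for the detected language, matching the listed tests.
    """
    language = EXTENSION_LANGUAGE.get(Path(file).suffix.lower())
    if language is None or language not in configs:
        return dict(configs)
    return {language: configs[language]}
```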
- Tests for detect_unconfigured_languages() function - Tests for auto_configure_language() success and failure paths - Test for per-language logging output Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion
- Add detect_unconfigured_languages() to identify languages in changed files lacking configs
- Add detect_project_for_language() using per-language detection helpers (avoids wrong-language pitfall)
- Add auto_configure_language() that writes config and re-discovers it in one step
- Add get_changed_file_paths() helper using git diff
- Wire auto-config into orchestration loop (only for subagent/no-flags path)
- Failed auto-config logs warning with manual setup instructions, continues gracefully
- Per-language "Processing {lang} (config: {path})" logging confirmed working
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add JS/TS config discovery tests (package.json, all three config types)
- Add malformed TOML and missing codeflash section tests
- Add JS/TS extension git diff tests (.js, .ts, .jsx, .tsx)
- Add mixed three-language git diff test
- Add TypeScript/JSX file flag routing tests
- Add direct function coverage for get_changed_file_paths, detect_project_for_language
- Add empty config normalize test
- 13 new tests across 3 files (60 -> 73 total)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Mock posthog and sentry initialization in all tests calling main() to prevent SystemExit when prior tests overwrite CODEFLASH_API_KEY - Re-register JavaSupport in clear_registry test to prevent Java language lookup failures in subsequent tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…orepo subdirectory scanning
Adapt find_all_config_files() after rebasing on java-config-redesign (PR #1880):
- Java detected via pom.xml/build.gradle instead of codeflash.toml
- Add subdirectory scan for monorepo language subprojects (java/, js/ etc.)
- Extract _check_dir_for_configs() to eliminate duplicated detection logic
- Fix --all flag in multi-language mode (module_root wasn't available during resolution)
- Add Java project_root directory override in apply_language_config()
- Update all tests to use build-tool detection mocks and directory-based Java paths
- Add 5 new monorepo discovery tests (subdir Java, subdir JS, all-three, skip-hidden, root-wins)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
3a9f13f to
cdaa526
Compare

Summary
Adds multi-language orchestration to the Codeflash CLI. The optimizer now discovers all language configs in a project (Python, Java, JS/TS), and runs a full optimization pass for each language automatically.
This PR is part of a 3-repo change set, together with codeflash-cc-plugin and optimize-me (see Related PRs). The CLI orchestration loop is the core: the cc-plugin delegates to it, and optimize-me validates it end-to-end.
What Changed
Config Discovery
- find_all_config_files() walks CWD→root, discovers pyproject.toml (Python), codeflash.toml (Java), package.json (JS/TS). Closest config wins per language.
- LanguageConfig dataclass holds config dict, path, and language enum per discovered config.
- normalize_toml_config() shared helper for consistent config normalization.
Orchestration Loop
- main() iterates over discovered configs, deep-copies args per language, calls apply_language_config() then optimizer.run_with_args() for each.
- --file flag filters to the matching language only.
- --all and no-flags run all discovered languages.
Auto-Detection
- detect_unconfigured_languages() compares configs vs git diff to find languages with changed files but no config.
- auto_configure_language() creates on-the-fly configs by detecting project roots (pom.xml, package.json).
Language-Agnostic Git Diff
- get_git_diff() now uses get_supported_extensions() from the registry instead of the singleton. The coarse filter lets through all supported files; per-file language detection happens downstream.
Tests
- test_multi_language_orchestration.py (orchestration loop, apply_language_config, summary logging, CLI path routing, unconfigured detection)
- test_multi_config_discovery.py (find_all_config_files, LanguageConfig)
- test_git_utils.py (language-agnostic diff filtering)
- test_registry.py
All tests pass locally.
E2E Validation
Full E2E session with local backend services — Fibonacci (java-test-project), encodedLength (aerospike), isPalindrome + mergeSorted (optimize-me Java), retryWithBackoff (optimize-me JS) all passed.
Related PRs
codeflash-cc-plugin: Unified multi-language hook
The cc-plugin is the entry point that triggers Codeflash from Claude Code. Previously it had 3 separate per-language code paths that each invoked the CLI differently. With the orchestration loop introduced in this PR, the cc-plugin was simplified to a single codeflash --subagent call: it only needs to detect that any config exists and find the binary. All multi-language discovery and dispatch is now handled by find_all_config_files() and the orchestration loop in main.py from this PR.
optimize-me: Mixed-language test fixture
The optimize-me repo was extended with Java and JS/TS subprojects to serve as an E2E validation target for the orchestration loop. Each subproject has its own codeflash config file, so running Codeflash at the repo root exercises find_all_config_files() discovering multiple configs and the loop running separate optimization passes per language. The E2E validation results above were produced by running this PR's code against the optimize-me fixture.
Generated with Claude Code