feat: add command step, loop step, and swe-tdd workflow #537

Draft
dataforxyz wants to merge 1 commit into endorhq:main from dataforxyz:feat/command-loop-swe-tdd

@dataforxyz
Contributor

Summary

  • Adds command step type for running shell commands in workflows (with timeout, args, and output capture)
  • Adds loop step type for iterating sub-steps until a condition is met (with configurable max iterations)
  • Adds condition evaluation engine for steps.<id>.outputs.<name> == <value> expressions
  • Introduces the swe-tdd workflow: a test-driven development workflow that writes tests first, then implements and iterates until tests pass
  • Adds findStep() method to WorkflowManager for non-throwing recursive step lookup
  • Fixes schema validation to recursively collect step IDs inside loop steps
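To illustrate the condition expressions listed above, here is a minimal sketch of how a `steps.<id>.outputs.<name> == <value>` check could be evaluated. It is not the code from `condition.ts`: the names `evaluateCondition` and `StepOutputs`, the quote stripping, and the treatment of missing steps (`==` yields false, `!=` yields true) are assumptions made for illustration.

```typescript
// Hypothetical shape for collected outputs: stepId -> outputName -> value.
type StepOutputs = Record<string, Record<string, string>>;

// Matches: steps.<id>.outputs.<name> (== | !=) <value>, tolerating extra whitespace.
const CONDITION_RE =
  /^\s*steps\.([\w-]+)\.outputs\.([\w-]+)\s*(==|!=)\s*(.+?)\s*$/;

function evaluateCondition(expr: string, outputs: StepOutputs): boolean {
  const match = CONDITION_RE.exec(expr);
  if (match === null) {
    throw new Error(`Unsupported condition: ${expr}`);
  }
  const [, stepId, outputName, operator, rawValue] = match;

  // Strip one pair of matching surrounding quotes, if present.
  const expected = rawValue.replace(/^(['"])(.*)\1$/, "$2");
  const actual = outputs[stepId]?.[outputName];

  // Assumption: a missing step or output makes `==` false and `!=` true.
  if (actual === undefined) {
    return operator === "!=";
  }
  return operator === "==" ? actual === expected : actual !== expected;
}
```

All values are compared as strings here, so an exit code would need to be stored as `"0"` rather than `0`. Whether the actual evaluator coerces types is not stated in this PR description.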

Key changes

  • packages/agent/src/lib/command-runner.ts - New command step executor that runs commands through sh -c for proper shell handling
  • packages/agent/src/lib/condition.ts - Condition evaluator for loop until expressions
  • packages/agent/src/lib/step-executor.ts - Step dispatcher that routes agent/command/loop steps to their executors
  • packages/cli/src/lib/workflows/swe-tdd.yml - Full TDD workflow with test writing, implementation, test-fix loop, and code review stages
  • packages/schemas/src/workflow/schema.ts - Schema additions for command and loop step types
  • packages/core/src/files/workflow.ts - findStep() for safe recursive lookups
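As a rough illustration of the `sh -c` approach described for the command runner, the sketch below runs a command string through the shell with a timeout and captures its output. It uses Node's `spawnSync` for brevity. The `runCommand` name, the `CommandResult` shape, and the 30-second default are hypothetical rather than taken from `command-runner.ts`, and argument resolution is left out.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical result shape; the PR's actual output names may differ.
interface CommandResult {
  exitCode: number | null; // null when the process was killed (e.g. on timeout)
  stdout: string;
  stderr: string;
  timedOut: boolean;
}

function runCommand(command: string, timeoutMs = 30_000): CommandResult {
  // `sh -c` lets the shell interpret pipes, redirects, globs and `&&` chains.
  const result = spawnSync("sh", ["-c", command], {
    encoding: "utf8",
    timeout: timeoutMs,
  });

  // When the timeout fires, Node reports an error whose code is ETIMEDOUT.
  const error = result.error as NodeJS.ErrnoException | undefined;

  return {
    exitCode: result.status,
    stdout: result.stdout ?? "",
    stderr: result.stderr ?? "",
    timedOut: error?.code === "ETIMEDOUT",
  };
}
```

Passing the whole command as one string to `sh -c` keeps shell features working, at the cost of requiring care with any untrusted input interpolated into that string.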

Test plan

  • Unit tests for command runner (9 tests) - argument resolution, timeout, exit code capture, shell execution
  • Unit tests for condition evaluator (11 tests) - equality, inequality, missing steps, whitespace
  • Unit tests for step executor (5 tests) - dispatch, loop iteration, max iterations
  • All existing workflow tests pass (49 tests in core, 876 total in core package)
  • Manual testing of swe-tdd workflow end-to-end

🤖 Generated with Claude Code

dataforxyz marked this pull request as draft on February 24, 2026 01:44
dataforxyz force-pushed the feat/command-loop-swe-tdd branch from 91f8d7b to d869d4a on February 24, 2026 02:05
Add command step type for running shell commands directly in workflows,
loop step type for iterating agent steps with conditions, and the
swe-tdd workflow that uses these to implement test-driven development.

Includes placeholder resolution, condition evaluation, step executor
refactoring, and comprehensive test coverage.
dataforxyz force-pushed the feat/command-loop-swe-tdd branch from 052f102 to 7dc8c13 on February 26, 2026 04:24
ereslibre self-assigned this on Feb 26, 2026
Collaborator

ereslibre left a comment

Thank you for your contribution @dataforxyz!

Some comments regarding the change :)

type: 'command' was already implemented in #526.

Could you update this PR to remove this part, or is something missing from the current implementation?

The workflow itself looks great; it would be a great addition as a whole!

filename: changes.md

- id: test_fix_loop
  type: loop

Although this looks interesting, I think using it in practice could be a bit cumbersome.

For example, can until refer only to lexically preceding steps, or can it also access data from the previous iteration?

It would be interesting to understand if we can get this same feature through prompting. Something along the lines of 'Fix the code so that all tests pass, or the failures are reduced. Ensure tests pass, retry this loop at most 5 times'. Would something like that work?

In that case we might not need to add a new loop building block.
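To make the question about until concrete, here is a minimal sketch of one possible loop semantics. It assumes each sub-step overwrites its own outputs on every iteration, so until (and any later sub-step) only ever sees the most recent iteration. The names `runLoop`, `SubStep`, and `Outputs` are hypothetical and do not come from the PR's `step-executor.ts`.

```typescript
// Hypothetical shapes for illustration only.
type Outputs = Record<string, Record<string, string>>;

interface SubStep {
  id: string;
  run: (outputs: Outputs, iteration: number) => Record<string, string>;
}

function runLoop(
  steps: SubStep[],
  until: (outputs: Outputs) => boolean,
  maxIterations: number,
): { iterations: number; satisfied: boolean; outputs: Outputs } {
  const outputs: Outputs = {};
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    for (const step of steps) {
      // Assumption: each run replaces the step's previous outputs, so later
      // steps and `until` only see data from the latest iteration.
      outputs[step.id] = step.run(outputs, iteration);
    }
    // `until` is checked once per iteration, after all sub-steps have run.
    if (until(outputs)) {
      return { iterations: iteration, satisfied: true, outputs };
    }
  }
  // The cap was reached without the condition ever holding.
  return { iterations: maxIterations, satisfied: false, outputs };
}
```

Under these assumptions, until cannot compare the current iteration with an earlier one. Supporting that would require keeping a per-iteration history of outputs, which is a design decision this sketch does not make.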

- id: fix_code
  type: agent
  name: 'Fix Failing Tests'
  if: steps.run_tests.outputs.exit_code != 0

The if building block looks great; it would be a great addition!
