Conversation
Pull request overview
Adds AGENTS.md guidance files intended to be injected into agentic tooling context to steer code, tests, docs, and examples toward existing BayBE conventions.
Changes:
- Introduce root-level AI-agent coding guide (`AGENTS.md`) covering architecture, typing, imports, CI, and workflow.
- Add test-suite conventions for pytest structure/fixtures/parametrization (`tests/AGENTS.md`).
- Add docs and examples conventions for Sphinx/MyST and runnable scripts (`docs/AGENTS.md`, `examples/AGENTS.md`).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| `AGENTS.md` | Project-wide agent guidance for BayBE coding patterns, tooling, and PR workflow |
| `tests/AGENTS.md` | Conventions for writing/organizing pytest tests and fixtures |
| `docs/AGENTS.md` | Conventions for Sphinx/MyST docs authoring and syntax |
| `examples/AGENTS.md` | Conventions for executable examples and CI smoke-test behavior |
I would rather add this to contributing.md than to a tests-specific agents.md, though I don't think that's a major issue.
I also usually add these test principles to get better results (they assume you have pre-commit hooks to run the tests, which you should):
Testing Conventions
Summary:
- Test through public interfaces: call functions, assert on return values and file system side effects
- Mock only external boundaries: the [INSERT BOUNDARY POINT HERE] is the boundary; provide a [SPECIFIC MOCKER HERE, WITH DEFINED NARROW SCOPE] that emits [PROJECT SPECIFIC DETAILS]
- One test, one behavior: don't combine assertions about different concerns
- Tests as specs: names like `test_qc_fail_triggers_retry`, `test_missing_smiles_fails_validation`

Run tests: `python -m pytest backend/tests/`
Pre-commit hook runs tests automatically before each commit. On a fresh clone, activate it once: `git config core.hooksPath .githooks`
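As an illustration of the "tests as specs" and "one test, one behavior" points above, here is a minimal sketch. The function `validate_record` and its `smiles` field are hypothetical stand-ins, not code from BayBE or from the commenter's project. Plain `assert` statements are used so the snippet runs without pytest installed; under pytest one would typically use `pytest.raises` instead of the `try`/`except` block.

```python
# Hypothetical function under test; not part of any project discussed in this thread.
def validate_record(record: dict) -> dict:
    """Return the record unchanged, or raise if the required 'smiles' key is absent."""
    if "smiles" not in record:
        raise ValueError("missing required field: smiles")
    return record


# The test name states the behavior, and the test asserts on that behavior only.
def test_missing_smiles_fails_validation():
    try:
        validate_record({"name": "ethanol"})
    except ValueError as err:
        assert "smiles" in str(err)
    else:
        raise AssertionError("expected ValueError for a record without 'smiles'")


# A second, separate behavior gets its own test instead of a combined assertion.
def test_valid_record_is_returned_unchanged():
    record = {"name": "ethanol", "smiles": "CCO"}
    assert validate_record(record) == record
```

Both tests go through the public function only and check its return value or raised error, which matches the "test through public interfaces" bullet.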
The total test suite runs 20-30 minutes, so we can't run it at a high frequency such as on pre-commit. I would probably rather instruct the agent to run the tests relevant to the respective feature.
Otherwise all looks good to me. There may be some compliance issues with the anti-patterns, as some of them are a bit underspecified. For example, "No conftest pollution - prefer local fixtures": "pollution" would be clearer with before-and-after examples.
Ah @Scienfitz, good point on too many tests. In that case I have CI run the full suite, and locally I use tags to select a subset that runs on the pre-commit hook. This usually also naturally prompts the agent to run any task-specific tests, as a reminder.
You can also just add a hook that only injects a warning/reminder about running tests into the chat context.
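One way to "run tests relevant to the respective feature" is to derive candidate test files from the changed source files. The sketch below assumes a purely hypothetical naming convention (`baybe/<module>.py` maps to `tests/test_<module>.py`); the real test layout may differ, so treat the mapping rule as a placeholder.

```python
from pathlib import PurePosixPath


def relevant_test_paths(changed_files, package="baybe", tests_dir="tests"):
    """Map changed files to candidate test files.

    Assumed (hypothetical) convention: package/<...>/<module>.py -> tests/test_<module>.py.
    Changed test files are selected as they are; everything else is ignored.
    """
    selected = []
    for name in changed_files:
        path = PurePosixPath(name)
        if path.suffix != ".py":
            continue  # skip docs, configs, and other non-Python files
        if path.parts[0] == tests_dir:
            candidate = str(path)
        elif path.parts[0] == package:
            candidate = f"{tests_dir}/test_{path.stem}.py"
        else:
            continue
        if candidate not in selected:  # keep order, drop duplicates
            selected.append(candidate)
    return selected
```

The returned paths could be passed as arguments to `python -m pytest` by a hook or by the agent, while CI continues to run the full suite.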
seems like you have a no fallback rule.
I have run into issues with this before, here's my section about it:
## Zero Fallback Principle
Execution MUST abort immediately on any missing dependency, malformed data field, absent required column/key, unexpected enum value, or structural schema mismatch. Do **not** continue in a degraded or "best effort" mode. No silent defaults. No guessing. Expensive downstream computation must be prevented when prerequisites are not perfectly satisfied.
## Validation Philosophy
- **Per-Template Strict Validation**: Each template defines exact allowed factor choices; models cannot select factors outside their template.
- **Validation at Inference Time**: PromptSwarm validates outputs using the universal `ConditionRecommendationModel` with the template's `_allowed_choices` passed as validation context.
- **Zero Fallback**: Any validation failure TERMINATES that template's inference (circuit breaker pattern in promptswarm).
- **No Partial Results**: Invalid outputs are not written to the WFM database; no partial pipelines or continued processing.
Then I also have a hook that autodetects when Claude is trying to add a fallback and stops him:
https://github.com/merckgroup/condition_rec_benchmarking/blob/main/.claude/hooks/check-fallback.sh
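The zero-fallback principle quoted above can be sketched as a validator that raises on the first structural problem instead of substituting defaults. All names here (`Solvent`, `REQUIRED_KEYS`, `RecordValidationError`, `validate_or_abort`) are hypothetical and only illustrate the idea; they are not taken from the linked repository or from BayBE.

```python
from enum import Enum


class Solvent(Enum):
    """Hypothetical closed set of allowed values."""

    WATER = "water"
    ETHANOL = "ethanol"


REQUIRED_KEYS = ("smiles", "solvent")  # hypothetical required fields


class RecordValidationError(ValueError):
    """Raised to abort processing; callers are expected not to swallow it."""


def validate_or_abort(record: dict) -> dict:
    """Return a validated record or raise immediately. No defaults, no guessing."""
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise RecordValidationError(f"missing required keys: {missing}")
    try:
        solvent = Solvent(record["solvent"])
    except ValueError as err:
        # An unexpected enum value aborts instead of falling back to a default.
        raise RecordValidationError(f"unexpected solvent: {record['solvent']!r}") from err
    return {"smiles": record["smiles"], "solvent": solvent}
```

Because the function either returns a fully validated record or raises, expensive downstream work cannot start from partially valid input.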
I've extracted some generalizable principles out of this and added them to the file under the respective sections (590b521).
We'll discuss whether we can do something with the actual hook as well. In any case, thanks for your input 🙌
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
To comply with agent instruction.
Hey, nice work on the recent updates -- the fail-fast language in sections 2/8/16 reads well, and the admonition rewrite is much clearer. I've been reviewing this alongside how I structure similar files in my repos and wanted to share a few thoughts. Happy to help implement any of these if useful, but also easy for you to pick up directly since they're mostly structural.

CLAUDE.md + symlinks: Have you considered making the canonical file

To keep other tools working, symlink from the root:

Single source of truth, three consumers. The subdirectory files (

Complementary content from

I'm not suggesting replacing your work -- the AGENTS.md has valuable depth (validation patterns, deprecation strategy, the naming table, fixture architecture) that the

Anti-pattern specificity: As m-aebrer mentioned, rules like "prefer local fixtures" are easier for agents to comply with when there's a concrete example. Even a one-liner showing the preferred pattern would help.

Let me know if any of this is useful -- happy to put together changes on a separate branch or contribute directly here, whichever you'd prefer.
Scienfitz left a comment
@LeanAndMean many thanks for the input. I'm turning your comment into several threads because otherwise it will be nearly impossible to converse about the several sub-suggestions made therein. Feel free to comment in the respective threads.
via Kevin: Rename to CLAUDE.md + symlink for other tools
Consider making CLAUDE.md the canonical file (auto-loaded by Claude Code, supports subdirectory scoping). Symlink AGENTS.md and .github/copilot-instructions.md to it so all three tools share a single source of truth.
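A minimal sketch of the symlink layout described in this suggestion, using only the three file names mentioned in the thread. The helper name `link_agent_files` is hypothetical, and the snippet assumes a platform where creating symlinks is permitted (on Windows this may need extra privileges).

```python
import os
from pathlib import Path


def link_agent_files(repo_root: Path) -> list[Path]:
    """Symlink AGENTS.md and .github/copilot-instructions.md to CLAUDE.md."""
    canonical = repo_root / "CLAUDE.md"
    if not canonical.is_file():
        raise FileNotFoundError(canonical)

    links = [
        repo_root / "AGENTS.md",
        repo_root / ".github" / "copilot-instructions.md",
    ]
    # Check everything first so that a conflict leaves the repository untouched.
    for link in links:
        if link.exists() or link.is_symlink():
            raise FileExistsError(link)

    for link in links:
        link.parent.mkdir(parents=True, exist_ok=True)
        # Relative targets keep the links valid when the repository is moved or cloned.
        target = os.path.relpath(canonical, start=link.parent)
        link.symlink_to(target)
    return links
```

Calling `link_agent_files(Path("."))` from the repository root would create both links; whether they are needed at all depends on which file names the tools in use already recognize.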
@LeanAndMean not sure I understood your suggestion, because that was already exactly the idea of the AGENTS.md files. Afaik tools like claude or opencode already recognize AGENTS.md (including auto-context-ingestion and subdirectory scoping) and do not really need CLAUDE.md or any other more specifically named file.
via Kevin: Add a commands/validation section
The current file covers conventions but doesn't tell the agent how to validate its work. Add a commands section covering install, test (pytest --fast, pytest -k "test_name"), lint, typecheck, and tox environments (including tox -e mypy-py310 and tox -p for parallel).
I like the idea. Maybe we could put that in CONTRIBUTING (or expand it) and link it in AGENTS.
via Kevin: Add an architecture-to-filepath mapping
Map each core domain concept to its file path (e.g. Campaign → baybe/campaign.py, SearchSpace → baybe/searchspace/). Gives agents "what is this" and "where to find it" in one pass.
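To make the mapping idea concrete, the sketch below keeps the concept-to-path pairs in one dictionary and renders them as a markdown table for the guidance file. Only the two entries named in the suggestion are included; the helper `render_mapping_table` is a hypothetical name, and a real table would need to be filled in from the repository itself.

```python
# Only these two entries come from the thread; further concepts are left out on purpose.
CONCEPT_PATHS = {
    "Campaign": "baybe/campaign.py",
    "SearchSpace": "baybe/searchspace/",
}


def render_mapping_table(mapping: dict) -> str:
    """Render a concept -> path mapping as a two-column markdown table."""
    lines = ["| Concept | Path |", "|---|---|"]
    for concept, path in mapping.items():
        lines.append(f"| `{concept}` | `{path}` |")
    return "\n".join(lines)


print(render_mapping_table(CONCEPT_PATHS))
```

Generating the table from a single mapping keeps the "what is this" and "where to find it" information in one place, which is the point of the suggestion.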
via Kevin: Distill key design principles for agents
Highlight the 2–3 principles agents are most likely to violate: comp-rep boundary, lazy imports ("non-negotiable"), and the serialization pattern. Complements the fuller treatment already in the file.
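For readers unfamiliar with the lazy-import principle mentioned here, a generic sketch follows. It shows the general technique of importing a dependency inside the function that needs it; the standard-library `json` module stands in for a heavy optional dependency, and this is not a description of how BayBE itself implements the rule.

```python
def dumps_pretty(obj) -> str:
    """Serialize obj to indented JSON, importing the dependency only when called."""
    # Function-level import: importing the enclosing module stays cheap, and the
    # dependency is loaded on first use. `json` is only a stand-in for a heavy package.
    import json

    return json.dumps(obj, indent=2, sort_keys=True)
```

The trade-off is that an import error surfaces at call time instead of at module import time, which is one reason such a rule benefits from being stated explicitly for agents.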
via Kevin: Reduce root file token footprint
The root file is injected into every conversation. Consider trimming it (~270 lines → ~161) or moving depth into scoped subdirectory files to save token budget.
I've already reduced the token footprint severely. The files are currently at:
| File | Lines | Words | Token estimate |
|---|---|---|---|
| `AGENTS.md` (root) | 276 | 1,898 | ~2,500 |
| `docs/AGENTS.md` | 37 | 229 | ~300 |
| `examples/AGENTS.md` | 29 | 167 | ~230 |
| `tests/AGENTS.md` | 72 | 384 | ~550 |
| Total | 414 | 2,678 | ~3,600 |
Isn't that negligible already?
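Numbers like the ones in this table can be approximated with a few lines of Python. The sketch below counts lines and words and applies an assumed ratio of 1.3 tokens per word, which is roughly what the root-file row implies (1,898 words to ~2,500 tokens). The ratio is a heuristic, not a real tokenizer, and `footprint` is a hypothetical helper name.

```python
TOKENS_PER_WORD = 1.3  # assumed heuristic; actual tokenizers vary by model and content


def footprint(text: str) -> dict:
    """Return line count, word count, and a rough token estimate for a text."""
    words = len(text.split())
    return {
        "lines": len(text.splitlines()),
        "words": words,
        "tokens_estimate": round(words * TOKENS_PER_WORD),
    }


sample = "one two three four five\nsix seven eight nine ten"
print(footprint(sample))
```

Applied to each guidance file (for example via `Path("AGENTS.md").read_text()`), this gives a quick way to track the footprint as rules are added.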
via Kevin and Drew: Add concrete examples to anti-pattern rules
Rules like "prefer local fixtures" are easier for agents to follow with a one-liner showing the preferred pattern. Even minimal examples help.
`AGENTS.md` files contain content intended for agentic operators. They are recognized by most coding frameworks (most importantly `claude` and `opencode`) and are injected into the context whenever an agent reads a folder that contains such a file. They lead to more consistent generated code that is generally more in line with what has already been done, without having to state this explicitly over and over again.

The content here is meant as a start, not as complete. We can continue to add rules as we evolve.
The content has been produced in the following manner: