feat: add UBERON brain anatomy matching for all species #1825

Open
bendichter wants to merge 4 commits into add-brain-area-anatomy from add-uberon-anatomy

@bendichter
Member

Summary

  • Adds UBERON ontology-based brain region matching, extending anatomy extraction to all species (not just mouse)
  • For mice: tries Allen CCF first, falls back to UBERON per-token
  • For other species: tries UBERON directly
  • Synonym scope is a settable parameter (frozenset[str]), defaulting to EXACT only
  • Bundles ~2,400 nervous system descendant terms from UBERON as a compact JSON (~479KB)
  • Generation script parses the OBO file directly with no library dependencies
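
The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the code in brain_areas.py: the function name `match_locations`, the two lookup tables, the comma-based tokenization, and the identifiers in the tables are all assumptions made for the example.

```python
from __future__ import annotations

# Hypothetical stand-ins for the bundled lookup tables. The identifiers are
# illustrative only; the real tables are loaded from the packaged JSON files.
ALLEN_CCF = {"ca1": "MBA:382"}
UBERON = {"brain": "UBERON:0000955", "cerebellum": "UBERON:0002037"}


def match_locations(location: str, is_mouse: bool) -> list[str]:
    """Match each token of a location string, preserving first-seen order."""
    matches: list[str] = []
    # Assumption: tokens are comma-separated and matched case-insensitively.
    for token in (part.strip().lower() for part in location.split(",")):
        if not token:
            continue
        # Mice try Allen CCF first; every other species skips straight to UBERON.
        hit = ALLEN_CCF.get(token) if is_mouse else None
        if hit is None:
            # Per-token fallback: only the tokens Allen CCF missed go to UBERON.
            hit = UBERON.get(token)
        if hit is not None and hit not in matches:
            matches.append(hit)
    return matches
```

With these toy tables, a mouse location of "CA1, cerebellum" resolves the first token through Allen CCF and the second through UBERON, while the same string for another species only yields the UBERON match.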

New files

  • dandi/data/generate_uberon_structures.py — downloads and parses uberon.obo, extracts nervous system descendants
  • dandi/data/uberon_brain_structures.json — bundled lookup data (2,408 structures, 9,717 synonyms)
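
For readers unfamiliar with OBO, a dependency-free parse of term stanzas plus a descendant walk might look like the sketch below. It is not the contents of generate_uberon_structures.py: the helper names, the choice to follow `is_a` and `part_of` edges, and the sample stanzas (which are simplified and do not reproduce UBERON's actual relationships or synonym scopes) are assumptions for illustration.

```python
import re
from collections import defaultdict

# Toy OBO excerpt; simplified for illustration, not real UBERON content.
SAMPLE_OBO = '''\
format-version: 1.2

[Term]
id: UBERON:0001016
name: nervous system

[Term]
id: UBERON:0000955
name: brain
synonym: "encephalon" RELATED []
relationship: part_of UBERON:0001016 ! nervous system

[Term]
id: UBERON:0002037
name: cerebellum
is_a: UBERON:0000955 ! brain

[Term]
id: UBERON:0000948
name: heart
'''

SYNONYM_RE = re.compile(r'^synonym: "(?P<text>.*)" (?P<scope>EXACT|NARROW|BROAD|RELATED)\b')
# Assumption: descendants are reached through is_a and part_of edges.
PARENT_RE = re.compile(r"^(?:is_a:|relationship: part_of) (?P<parent>UBERON:\d+)")


def parse_terms(text):
    """Return {term_id: {"name", "synonyms", "parents"}} for every [Term] stanza."""
    terms, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # Start a new record for [Term]; ignore other stanza types.
            current = {"name": "", "synonyms": [], "parents": []} if line == "[Term]" else None
        elif current is None:
            continue  # header lines or lines inside a non-term stanza
        elif line.startswith("id: "):
            terms[line[4:]] = current
        elif line.startswith("name: "):
            current["name"] = line[6:]
        elif m := SYNONYM_RE.match(line):
            current["synonyms"].append((m["text"], m["scope"]))
        elif m := PARENT_RE.match(line):
            current["parents"].append(m["parent"])
    return terms


def descendants(terms, root):
    """Collect every term reachable from root by walking parent edges in reverse."""
    children = defaultdict(list)
    for term_id, term in terms.items():
        for parent in term["parents"]:
            children[parent].append(term_id)
    seen, stack = set(), [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen
```

Calling `descendants(parse_terms(SAMPLE_OBO), "UBERON:0001016")` on the toy excerpt keeps the brain and cerebellum terms and drops the heart term, which is the kind of filtering the bundled JSON relies on.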

Modified files

  • dandi/metadata/brain_areas.py — UBERON loading, lookup, and matching functions; locations_to_mouse_anatomy() with Allen→UBERON fallback
  • dandi/metadata/util.py: _extract_brain_anatomy() now handles all species
  • dandi/tests/test_brain_areas.py — tests for UBERON matching, synonym scope control, and Allen/UBERON fallback
  • .pre-commit-config.yaml, pyproject.toml — codespell exclusions for new JSON

Test plan

  • All 56 brain area tests pass locally
  • All 179 metadata tests pass (including existing brain anatomy integration tests)
  • Pre-commit passes (black, isort, codespell, flake8)
  • CI passes on all platforms

🤖 Generated with Claude Code

For mice, location tokens are first matched against Allen CCF, then
fall back to UBERON.  For all other species, UBERON is tried directly.
Synonym scope (EXACT, RELATED, NARROW, BROAD) is a settable parameter,
defaulting to EXACT only.

- Add generate_uberon_structures.py to parse the UBERON OBO file and
  produce a bundled JSON of ~2,400 nervous-system descendants
- Add UBERON lookup/matching functions to brain_areas.py
- Update _extract_brain_anatomy in util.py to handle non-mouse species
- Add comprehensive tests for UBERON matching and Allen/UBERON fallback

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@bendichter added the minor label (Increment the minor version when merged) on Mar 29, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@bendichter added and removed the enhancement (New feature or request) and minor (Increment the minor version when merged) labels on Mar 29, 2026
@codecov

codecov bot commented Mar 29, 2026

Codecov Report

❌ Patch coverage is 86.70886% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.50%. Comparing base (3c7ba2d) to head (2dc05ec).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| dandi/metadata/brain_areas.py | 85.00% | 12 Missing ⚠️ |
| dandi/data/generate_uberon_structures.py | 0.00% | 7 Missing ⚠️ |
| dandi/metadata/util.py | 66.66% | 2 Missing ⚠️ |

Additional details and impacted files
@@                    Coverage Diff                     @@
##           add-brain-area-anatomy    #1825      +/-   ##
==========================================================
+ Coverage                   75.34%   75.50%   +0.16%     
==========================================================
  Files                          87       88       +1     
  Lines                       12259    12413     +154     
==========================================================
+ Hits                         9237     9373     +136     
- Misses                       3022     3040      +18     

| Flag | Coverage Δ |
| --- | --- |
| unittests | 75.50% <86.70%> (+0.16%) ⬆️ |

bendichter and others added 2 commits March 29, 2026 12:44
Instead of passing a flat set of scopes, use a max_synonym_scope
parameter (default "EXACT").  Matching tries tiers in precision order:
EXACT > NARROW > BROAD > RELATED, up to the specified maximum.
Term names are always tried first before any synonym tier.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Thread the scope parameter through so callers can control how
permissive UBERON synonym matching is.  Defaults to EXACT.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
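
The tiered matching described in the two commits above could be sketched along these lines. Only the parameter name `max_synonym_scope`, its "EXACT" default, and the tier order come from the commit messages; the function name `match_term`, the record layout, and the sample entries (whose synonym scopes are made up for the example) are assumptions.

```python
from __future__ import annotations

# Precision order from the commit message: EXACT > NARROW > BROAD > RELATED.
SCOPE_ORDER = ("EXACT", "NARROW", "BROAD", "RELATED")

# Hypothetical record layout; synonym scopes here are invented for illustration.
STRUCTURES = [
    {"id": "UBERON:0000955", "name": "brain",
     "synonyms": {"RELATED": ["encephalon"]}},
    {"id": "UBERON:0002037", "name": "cerebellum",
     "synonyms": {"EXACT": ["example exact synonym"]}},
]


def match_term(token: str, structures: list[dict],
               max_synonym_scope: str = "EXACT") -> str | None:
    """Return the ID of the first structure matching token, or None."""
    key = token.strip().lower()
    # Term names are always tried before any synonym tier.
    for structure in structures:
        if structure["name"].lower() == key:
            return structure["id"]
    # Walk tiers from most to least precise, stopping at the requested maximum.
    # An unknown scope raises ValueError from tuple.index().
    last = SCOPE_ORDER.index(max_synonym_scope)
    for scope in SCOPE_ORDER[: last + 1]:
        for structure in structures:
            synonyms = structure["synonyms"].get(scope, [])
            if key in (synonym.lower() for synonym in synonyms):
                return structure["id"]
    return None
```

Under the default, `match_term("encephalon", STRUCTURES)` finds nothing because the only matching synonym sits in the RELATED tier; passing `max_synonym_scope="RELATED"` widens the search enough to return the brain term. A single maximum keeps callers from requesting a permissive tier while skipping the more precise ones.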

Labels: enhancement (New feature or request), minor (Increment the minor version when merged)
