Documents the breaking API changes to migrate gReLU model zoo from wandb to HuggingFace as the default backend.
Key decisions:
- New HuggingFace-native API with full repo IDs
- Legacy wandb functions moved to grelu.resources.wandb
- Lineage functions use HF model card metadata
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
11 tasks covering dependency updates, code refactoring, tests, and documentation updates for the model zoo migration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add TDD-style tests for the new HuggingFace API before implementation. Tests cover list_models, list_datasets, download_model, download_dataset, load_model, get_datasets_by_model, get_base_models, and verify utility functions continue to work. All tests use mocking since functions don't exist yet. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace wandb-based model zoo functions with HuggingFace Hub API. This provides a simpler, more standard interface for accessing gReLU models and datasets hosted on HuggingFace.
New functions:
- list_models(), list_datasets(): List repos in gReLU collection
- download_model(), download_dataset(): Download files from HF
- load_model(): Download and load a LightningModel
- get_model_info(), get_dataset_info(): Get repository metadata
- get_datasets_by_model(), get_base_models(): Parse model card links
- get_models_by_dataset(): Find models using a dataset
Legacy wandb functions remain available via grelu.resources.wandb.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
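The lineage lookups could be sketched roughly as below. This assumes the links live in the standard `datasets` and `base_model` fields of a HuggingFace model card's metadata; the PR does not show how the links are actually stored, and the helper names here are hypothetical.

```python
def _as_list(value):
    """Normalize a metadata value that may be missing, a string, or a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def lineage_from_card_metadata(metadata):
    """Extract dataset and base-model repo IDs from a model card metadata dict.

    Assumption: `metadata` is the card's YAML front matter already parsed
    into a dict, with the standard `datasets` and `base_model` keys.
    """
    return {
        "datasets": _as_list(metadata.get("datasets")),
        "base_models": _as_list(metadata.get("base_model")),
    }
```

Accepting both a single string and a list keeps the helper tolerant of cards that declare one base model as a scalar rather than a one-element list.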
Update all tutorials to use the new HuggingFace-based resource functions:
- 1_inference.ipynb: load_model with repo_id for borzoi-model
- 2_finetune.ipynb: download_dataset for tutorial-2-data
- 3_train.ipynb: download_dataset with filename for microglia-scatac data
- 4_design.ipynb: load_model with repo_id for human-atac-catlas-model
- 5_variant.ipynb: load_model and download_dataset for variant tutorial
- 7_simulations.ipynb: load_model for catlas and enformer models
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Helpful error messages when users try old wandb-style API, guiding them to HuggingFace API or legacy wandb submodule. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
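A minimal sketch of how such a deprecation stub might look. The old function name `legacy_download`, the exception type, and the message wording are placeholders for illustration; the PR's actual stubs may differ.

```python
def _removed_wandb_function(old_name: str, new_name: str):
    """Build a stub that raises a helpful error for a removed wandb-style function."""

    def stub(*args, **kwargs):
        raise RuntimeError(
            f"{old_name}() is no longer available in grelu.resources. "
            f"Use grelu.resources.{new_name}() with a HuggingFace repo ID, "
            f"or import the legacy function from grelu.resources.wandb."
        )

    stub.__name__ = old_name
    return stub


# Hypothetical old name, mapped to a new function named in this PR.
legacy_download = _removed_wandb_function("legacy_download", "download_model")
```

Raising immediately, regardless of the arguments passed, means old call sites fail with a message that names both migration paths instead of an AttributeError.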
4 tasks: tests, stub functions, load_model detection, verification. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…age in tutorials
- visualize.py: Fix strand mapping that returned NaN for non-string values. Now handles "+"/"-" strings, 1/-1 integers, and "."/"*" (unstranded)
- tutorial 2: Use download_dataset path directly (returns file, not dir)
- tutorial 5: Specify filename parameter for variants.txt
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
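The strand handling described in this commit could look roughly like the sketch below. The function name and the choice of output values ("+", "-", or None for unstranded) are assumptions; the actual code in visualize.py is not shown here.

```python
def normalize_strand(value):
    """Map a strand annotation to "+", "-", or None (unstranded).

    Accepts "+"/"-" strings, 1/-1 integers, and "."/"*" for unstranded.
    Raises ValueError for anything else instead of silently yielding NaN.
    """
    if value in ("+", 1):
        return "+"
    if value in ("-", -1):
        return "-"
    if value in (".", "*"):
        return None
    raise ValueError(f"Unrecognized strand value: {value!r}")
```

Failing loudly on unknown values is a design choice for this sketch: a dictionary lookup applied to mixed-type input is the kind of mapping that produces NaN without any warning.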
…cess
- Update DEFAULT_HF_COLLECTION with correct collection slug
- Pass token=False to api.get_collection() to avoid 403 errors when user has cached credentials that don't have org access
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
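As an illustration of the token=False fix, the sketch below fetches a public collection without sending cached credentials. The helper name is hypothetical, and `api` is assumed to be a `huggingface_hub.HfApi` instance (or any object with a compatible `get_collection` method).

```python
def list_collection_repo_ids(api, collection_slug, item_type="model"):
    """Return repo IDs of one item type from a public HuggingFace collection.

    `api` is expected to behave like huggingface_hub.HfApi.
    """
    # token=False asks huggingface_hub to send no credentials at all, so a
    # cached token lacking org access cannot trigger a 403 on public data.
    collection = api.get_collection(collection_slug, token=False)
    return [
        item.item_id
        for item in collection.items
        if item.item_type == item_type
    ]
```

With huggingface_hub installed, this might be called as `list_collection_repo_ids(HfApi(), DEFAULT_HF_COLLECTION)`, passing `item_type="dataset"` to list datasets instead.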
Users can now see all available files in a HuggingFace repo when calling these functions, making it easier to find the right filename parameter for load_model() or download_dataset(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These tests are already in test_resources.py where they belong. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace 36 mock-based tests with 12 simple integration tests
- Tests call real HuggingFace API using dedicated test repos:
  - Genentech/test-model for model download/load tests
  - Genentech/test-data for dataset download tests
  - Genentech/human-atac-catlas-model for lineage tests
- Consolidate 8 deprecation tests into 2
- Simpler, more maintainable test code (520 → 139 lines)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove DEFAULT_WANDB_HOST import and wandb login code that is no longer needed after HuggingFace migration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update test_lightning.py to use tests/files/test.fa instead of downloading hg38 genome. This prevents network failures during test collection in GitHub Actions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add test_genome.fa with chr10 and chr21 (2000bp each) for GC matching
- Add genome_file property to CustomGenome class for compatibility
- Update test_get_gc_matched_intervals to use bundled test genome
- Avoids network dependency on hg38 download in CI
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
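A small fixture like this could be generated with a script along the following lines. This is only a sketch: the function name, the random sequence content, the seed, and the 60-character line width are assumptions, and the bundled test_genome.fa may have been produced differently.

```python
import random


def write_test_genome(path, chroms=("chr10", "chr21"), length=2000, seed=0, width=60):
    """Write a small FASTA file with one random sequence per chromosome."""
    rng = random.Random(seed)  # fixed seed keeps the fixture reproducible
    with open(path, "w") as f:
        for chrom in chroms:
            seq = "".join(rng.choice("ACGT") for _ in range(length))
            f.write(f">{chrom}\n")
            # Wrap sequence lines at a fixed width, as is conventional for FASTA.
            for start in range(0, length, width):
                f.write(seq[start:start + width] + "\n")
```

Calling `write_test_genome("tests/files/test_genome.fa")` would produce a file of a few kilobytes, small enough to commit alongside the tests.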
Default to False to avoid PyTorch 2.6+ UnpicklingError when loading checkpoints containing numpy arrays. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Addresses #180
This PR migrates the gReLU model zoo from Weights & Biases to HuggingFace as the default backend. This is a breaking change for v1.1.0.
Key Changes
- New HuggingFace API in grelu.resources:
  - list_models() / list_datasets() - browse the model zoo
  - load_model(repo_id, filename) - load models from HuggingFace
  - download_model() / download_dataset() - download files
  - get_model_info() / get_dataset_info() - get metadata including file lists
  - get_datasets_by_model() / get_models_by_dataset() / get_base_models() - lineage queries
- Legacy wandb support preserved at grelu.resources.wandb
- Deprecation errors guide users from the old API to the new one.
- Updated pretrained models (BorzoiPretrainedModel, EnformerPretrainedModel) to download from HuggingFace
- Modified some tests to use a mock genome file instead of downloading hg38 (which takes a long time and requires network access).
Migration Guide
Test Plan
Links