Pull request overview
Adds reusable utilities to support synthetic data analysis workflows (Malawi/Togo and future datasets) and improves observability in the CIDER preprocessing pipeline when filtering removes all rows.
Changes:
- Added `notebooks/analysis_helper.py` with helper functions for describing data, plotting, preprocessing/featurization, and simple modeling evaluation utilities.
- Added an info log in `preprocess_data` to surface when date filtering yields zero rows for a given schema.
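For illustration only, the zero-row logging described above might look roughly like the sketch below. It does not reproduce the actual `preprocess_data` code: the function name `filter_by_date`, the logger name, the `date` key, and the `"cdr"` schema name are all hypothetical, and a plain list of records stands in for whatever data structure the pipeline really uses.

```python
import logging
from datetime import date

# Hypothetical logger name; the real module may configure logging differently.
logger = logging.getLogger("cider_sketch")


def filter_by_date(rows, schema_name, start, end, date_key="date"):
    """Keep rows whose date lies in [start, end]; log when nothing survives."""
    kept = [row for row in rows if start <= row[date_key] <= end]
    if not kept:
        # Surface the empty result at INFO level so a later failure on
        # empty data can be traced back to the filter that caused it.
        logger.info(
            "Date filter %s to %s left zero rows for schema %r "
            "(%d rows before filtering)",
            start, end, schema_name, len(rows),
        )
    return kept


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    records = [{"date": date(2020, 1, 5), "amount": 10}]
    # The date range excludes every record, so the info message is emitted.
    filter_by_date(records, "cdr", date(2021, 1, 1), date(2021, 12, 31))
```

Logging the schema name and the pre-filter row count alongside the date range is a design choice in this sketch; it makes it easier to tell an overly narrow date range apart from input that was empty to begin with.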
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `src/cider/featurizer/core.py` | Adds logging for empty post-filter datasets inside `preprocess_data`. |
| `notebooks/analysis_helper.py` | Introduces notebook-focused helper functions for analysis, plotting, preprocessing/featurization, and k-fold evaluation. |
Pull request overview
Copilot reviewed 2 out of 3 changed files in this pull request and generated 5 comments.
Main reviews:
poornimaramesh left a comment
Looks good, thanks! We can return to this when we start up work again -- I think we should think about how to present / set this up so it can be reused with any arbitrary data.
I'd prefer to leave this as is for now.
Feel free to merge at will.
- `analysis_helper.py`: file consisting of functions to run analysis.
- `featurizer.core.py`: logging added, since multiple errors came from preprocessing that filtered out all data, leaving empty processed data.