
Vw/synthetic data analysis #72

Merged
vivwqy merged 11 commits into main from vw/synthetic-data-analysis on Mar 25, 2026



@vivwqy (Collaborator) commented Mar 12, 2026

  • Cleaned up the synthetic data analysis notebook, which contains the Malawi and Togo analysis results. The notebook code is meant to make it easy to replicate the analysis on new datasets.
  • Added an analysis_helper.py file containing functions for running the analysis.
  • Added a small log statement to featurizer/core.py, since several errors came from preprocessing steps that filtered out all rows and left the processed data empty.

@vivwqy vivwqy requested a review from poornimaramesh March 12, 2026 06:24
@vivwqy vivwqy self-assigned this Mar 12, 2026
Copilot AI review requested due to automatic review settings March 12, 2026 06:24

Copilot AI left a comment


Pull request overview

Adds reusable utilities to support synthetic data analysis workflows (Malawi/Togo and future datasets) and improves observability in the CIDER preprocessing pipeline when filtering removes all rows.

Changes:

  • Added notebooks/analysis_helper.py with helper functions for describing data, plotting, preprocessing/featurization, and simple modeling evaluation utilities.
  • Added an info log in preprocess_data to surface when date filtering yields zero rows for a given schema.
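The empty-result log described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/cider/featurizer/core.py: the function name `filter_by_date`, the logger name, the row layout (a list of dicts with a `datetime.date` under `"day"`), and the half-open `[start, end)` interval are all assumptions made for the example.

```python
import logging
from datetime import date

# Hypothetical logger name; the real module's logger may differ.
logger = logging.getLogger("cider.featurizer.sketch")


def filter_by_date(rows, start, end, schema_name):
    """Keep rows whose "day" lies in the half-open range [start, end).

    Assumes each row is a dict holding a datetime.date under "day"; the real
    preprocess_data operates on its own schema-specific tables.
    """
    kept = [row for row in rows if start <= row["day"] < end]
    if not kept:
        # Report the empty result here, so a later featurization error is
        # easy to trace back to the date filter.
        logger.info(
            "No rows left for schema %r after filtering to [%s, %s)",
            schema_name, start, end,
        )
    return kept


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    rows = [{"day": date(2020, 1, 5)}, {"day": date(2020, 2, 1)}]
    # A range that excludes every row triggers the info log and returns [].
    print(filter_by_date(rows, date(2021, 1, 1), date(2022, 1, 1), "calls"))
```

Logging at the point where the data becomes empty, rather than only failing later, is what makes the "filtered all data" cause visible in the logs.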

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 6 comments.

File | Description
--- | ---
src/cider/featurizer/core.py | Adds logging for empty post-filter datasets inside preprocess_data.
notebooks/analysis_helper.py | Introduces notebook-focused helper functions for analysis, plotting, preprocessing/featurization, and k-fold evaluation.
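For readers unfamiliar with the k-fold evaluation mentioned for analysis_helper.py, here is a minimal stdlib sketch of the technique. It is not the helper's actual implementation: `kfold_indices`, `kfold_mean_score`, `fit_mean`, and `mae` are hypothetical names, and the folds are contiguous and unshuffled, with the first `n_samples % k` folds one sample larger (the same sizing scikit-learn's `KFold` uses with `shuffle=False`).

```python
def kfold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds.

    The first n_samples % k folds get one extra sample, so fold sizes differ
    by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


def kfold_mean_score(X, y, fit, score, k=5):
    """Fit on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train, test in kfold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / len(scores)


def fit_mean(X, y):
    # Baseline "model": always predict the training-set mean.
    return sum(y) / len(y)


def mae(model, X, y):
    # Mean absolute error of the constant prediction on the held-out fold.
    return sum(abs(v - model) for v in y) / len(y)


if __name__ == "__main__":
    print(kfold_mean_score(list(range(6)), [0, 0, 0, 3, 3, 3], fit_mean, mae, k=2))
```

Because the folds are not shuffled, data sorted by the target (as in this toy example) gives a pessimistic score; shuffling the indices first is the usual fix.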



Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 5 comments.



@vivwqy vivwqy requested a review from poornimaramesh March 17, 2026 06:03
@vivwqy (Collaborator, Author) commented Mar 17, 2026

Main revisions based on the reviews:

  1. Updated featurizer/core.py:
      • the preprocessing and featurize functions
      • added handling and logging for several cases with missing data or possible errors
  2. Updated analysis_helper.py based on the reviews.
  3. Updated the synthetic_data_analysis and demo_pipeline notebooks to use the updated functions.
  4. Set the Python version for mypy in the pre-commit config so that it can handle match-case statements.

@poornimaramesh (Collaborator) left a comment


Looks good, thanks! We can return to this when we start work again; I think we should consider how to present and set this up so it can be reused with arbitrary data.
I'd prefer to leave this as is for now.

Feel free to merge at will.

@vivwqy vivwqy merged commit 4f86660 into main Mar 25, 2026
2 checks passed
@vivwqy vivwqy deleted the vw/synthetic-data-analysis branch March 25, 2026 03:57

3 participants