Changes to optional columns, and corresponding featurizing functions #69

Merged
poornimaramesh merged 7 commits into main from mobile_aid_kenya on Feb 25, 2026

@poornimaramesh
Collaborator

Reviewer:
Estimate:


Ticket

Fixes: Issue 63

Description

Update optional and required columns, as part of the fixes for the MobileAid program

Changes

1. Updates for featurizing CDR data
   a. Update the expected raw CDR schema for MobileAid Kenya.
      Required input columns:
        caller_id
        recipient_id
        timestamp
        duration
        caller_antenna_id
        transaction_type
   b. Make featurizer functions that reference caller_antenna_id / recipient_antenna_id optional, since these columns may not exist in the raw data.

2. Updates for featurizing mobile money data
   a. Update the expected raw mobile money schema for MobileAid Kenya.
      Required input columns:
        timestamp
        caller_id
        recipient_id
        amount
        transaction_type
   b. Make featurizer functions that reference caller / recipient balance optional, since these columns may not exist in the raw data.

3. Update the corresponding synthetic data generation functions, tests, and demo pipeline notebook.
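
The pattern in items 1b and 2b can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the CIDER implementation: the function name featurize_cdr_rows, the dict-row input format, and the feature names are all assumptions made for the example. Only the column names come from the lists above.

```python
from collections import defaultdict

# Required raw CDR columns, copied from the list in this description.
CDR_REQUIRED_COLUMNS = {
    "caller_id", "recipient_id", "timestamp",
    "duration", "caller_antenna_id", "transaction_type",
}


def featurize_cdr_rows(rows):
    """Hypothetical per-caller featurizer over dict rows (not the CIDER API)."""
    if not rows:
        return {}
    columns = set(rows[0])
    missing = CDR_REQUIRED_COLUMNS - columns
    if missing:
        raise ValueError(f"Missing required CDR columns: {sorted(missing)}")

    # Optional column: only compute the dependent feature when it is present.
    has_recipient_antenna = "recipient_antenna_id" in columns

    totals = defaultdict(lambda: {"n_transactions": 0, "total_duration": 0})
    recipient_antennas = defaultdict(set)
    for row in rows:
        caller = row["caller_id"]
        totals[caller]["n_transactions"] += 1
        totals[caller]["total_duration"] += row["duration"]
        # Skip null antenna IDs rather than counting them as a location.
        if has_recipient_antenna and row["recipient_antenna_id"] is not None:
            recipient_antennas[caller].add(row["recipient_antenna_id"])

    features = {caller: dict(values) for caller, values in totals.items()}
    if has_recipient_antenna:
        for caller in features:
            features[caller]["n_unique_recipient_antennas"] = len(
                recipient_antennas[caller]
            )
    return features
```

With reduced input the antenna-based feature is simply absent from the output rather than raising, which is why the final feature count differs between the two modes.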

How has this been tested?

I've added a new variable, keep_optional_columns, to the demo pipeline notebook; it is used when generating synthetic data.
If it is set to False, the generated data keeps only the required columns for CDR and mobile money (as listed above).
The subsequent preprocessing and featurization steps should then adapt automatically to the reduced set of columns, i.e. they should run without errors. At the end of the featurization step (step 4), there should be 741 columns with keep_optional_columns=False and 831 columns with keep_optional_columns=True.

Also run make tests and ensure that all tests pass.
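
For readers who want to see what the keep_optional_columns switch does in principle, here is a small sketch. It is an assumption-laden stand-in, not the notebook code: generate_synthetic_cdr is a hypothetical name, the value ranges and the "call" / "text" transaction types are invented for illustration, and recipient_antenna_id is treated as the only optional CDR column.

```python
import random

# Required CDR columns, as listed in the description above.
CDR_REQUIRED_COLUMNS = [
    "caller_id", "recipient_id", "timestamp",
    "duration", "caller_antenna_id", "transaction_type",
]


def generate_synthetic_cdr(n_rows, keep_optional_columns=True, seed=0):
    """Hypothetical generator: returns a list of dict rows."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    rows = []
    for _ in range(n_rows):
        row = {
            "caller_id": f"user_{rng.randrange(10)}",
            "recipient_id": f"user_{rng.randrange(10)}",
            "timestamp": f"2026-01-{rng.randrange(1, 29):02d}T00:00:00",
            "duration": rng.randrange(1, 600),
            "caller_antenna_id": f"antenna_{rng.randrange(5)}",
            "transaction_type": rng.choice(["call", "text"]),  # assumed values
            "recipient_antenna_id": f"antenna_{rng.randrange(5)}",  # optional
        }
        if not keep_optional_columns:
            # Keep only the required columns, dropping everything optional.
            row = {name: row[name] for name in CDR_REQUIRED_COLUMNS}
        rows.append(row)
    return rows
```

Running downstream steps once with each setting is then a cheap way to confirm that nothing hard-codes an optional column.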

Checklist

Fill with x for completed.

  • I have run pre-commit hooks locally
  • I have resolved merge conflicts
  • I have updated the automated tests (if applicable)
  • I have updated affected documentation (if applicable)

Copilot AI review requested due to automatic review settings February 25, 2026 09:37

Copilot AI left a comment

Pull request overview

This PR updates the CIDER featurizer to make certain columns optional, specifically for the MobileAid Kenya deployment. The changes enable the system to work with CDR and mobile money data that may lack antenna location information and balance data, which are not available in all data sources.

Changes:

  • Updated CDR and mobile money schemas to make antenna IDs and balance fields optional
  • Modified featurizer functions to conditionally compute features based on available columns
  • Enhanced synthetic data generation to support optional columns via keep_optional_columns parameter
  • Added docstrings to schema classes and enums for improved documentation

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 17 comments.

| File | Description |
| --- | --- |
| src/cider/schemas.py | Changed Pydantic config from v2 to v1 style; made balance fields optional in MobileMoneyTransactionData; made recipient_id required; added docstrings |
| src/cider/featurizer/schemas.py | Changed Pydantic config from v2 to v1 style; made balance fields optional in MobileMoneyDataWithDirection |
| src/cider/featurizer/dependencies.py | Updated swap_caller_and_recipient and identify_mobile_money_transaction_direction to handle missing antenna and balance columns |
| src/cider/featurizer/core.py | Added conditional logic to skip antenna-based and balance-based features when columns are missing; added dropna calls before processing optional columns; added check for missing AntennaData |
| src/cider/utils.py | Added check_optional_columns parameter to validate_dataframe; updated _get_column_types to handle optional fields; updated synthetic data functions to support keep_optional_columns parameter |
| tests/test_utils.py | Parameterized tests to cover both keep_optional_columns=True and False scenarios |
| .pre-commit-config.yaml | Added --ignore-nested-classes flag to interrogate to accommodate nested Config classes |
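
As a rough illustration of how required and optional columns could be derived from a schema and then validated, consider the sketch below. It uses a stdlib dataclass as a stand-in for the Pydantic schema classes, and every name in it is hypothetical: MobileMoneyRow, split_columns, validate_columns, and the balance field names caller_balance and recipient_balance are assumptions. The meaning given to check_optional_columns here (also require the optional columns when True) is likewise a guess, not the documented behavior of validate_dataframe.

```python
from dataclasses import dataclass
from typing import Optional, Union, get_args, get_origin, get_type_hints


@dataclass
class MobileMoneyRow:
    """Stand-in schema; the real classes in src/cider/schemas.py use Pydantic."""
    timestamp: str
    caller_id: str
    recipient_id: str
    amount: float
    transaction_type: str
    caller_balance: Optional[float] = None     # hypothetical field name
    recipient_balance: Optional[float] = None  # hypothetical field name


def split_columns(schema):
    """Return (required, optional) field names based on Optional[...] hints."""
    required, optional = [], []
    for name, hint in get_type_hints(schema).items():
        # Optional[X] is Union[X, None], so look for NoneType in the union.
        is_optional = get_origin(hint) is Union and type(None) in get_args(hint)
        (optional if is_optional else required).append(name)
    return required, optional


def validate_columns(columns, schema, check_optional_columns=False):
    """Raise ValueError if an expected column is missing (assumed semantics)."""
    required, optional = split_columns(schema)
    expected = required + optional if check_optional_columns else required
    missing = [name for name in expected if name not in columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return True
```

Deriving both lists from the type hints keeps the schema as the single source of truth, so making a field optional does not require touching the validation code.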
Comments suppressed due to low confidence (1)

src/cider/featurizer/core.py:2638

  • The featurize_cdr_data function now handles missing AntennaData by checking if it's in preprocessed_data (line 2638). However, featurize_all_data still assumes all other data types (MobileDataUsageData, MobileMoneyTransactionData, RechargeData) are always present. The preprocess_data function can skip schemas that are not in the input data_dict (see lines 2373-2379), which could cause KeyError exceptions here if those schemas are missing. Consider adding similar checks for the other schema types, or ensuring all required schemas are present before calling this function.
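
One way to address the concern above is to check for each schema before featurizing it. The following is a minimal sketch under stated assumptions: featurize_available_data is a hypothetical helper, preprocessed_data is assumed to be a dict keyed by schema name, and the per-schema featurizers are passed in as plain callables. The actual signature of featurize_all_data is not shown in this PR.

```python
def featurize_available_data(preprocessed_data, featurizers):
    """Hypothetical guard: run each featurizer only if its schema is present.

    preprocessed_data: dict mapping schema name -> preprocessed data
    featurizers: dict mapping schema name -> callable taking that data
    Returns (features, skipped), where skipped lists the missing schemas.
    """
    features = {}
    skipped = []
    for schema_name, featurize in featurizers.items():
        if schema_name not in preprocessed_data:
            # Record the gap instead of raising KeyError on lookup.
            skipped.append(schema_name)
            continue
        features[schema_name] = featurize(preprocessed_data[schema_name])
    return features, skipped
```

Returning the skipped schema names lets the caller decide whether a missing input is acceptable or should be reported as an error.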


@poornimaramesh poornimaramesh merged commit 8f5f939 into main Feb 25, 2026
1 of 2 checks passed
@poornimaramesh poornimaramesh deleted the mobile_aid_kenya branch February 25, 2026 11:02
@poornimaramesh poornimaramesh restored the mobile_aid_kenya branch February 25, 2026 11:02

Development

Successfully merging this pull request may close these issues.

[TASK] Move featurizer module code into a single script on a separate branch

2 participants