
FeatureKit upgrade#7

Merged
terbed merged 9 commits into main from feature_upgrade
Aug 13, 2025

@terbed terbed commented Aug 13, 2025

Summary

This pull request introduces significant improvements to the feature engineering pipeline in FinMLKit, focusing on enhanced composability, efficiency, and extensibility. The main highlights are the addition of the ExternalFunction transform for integrating third-party libraries, improved caching and short-circuiting in mathematical operations, and new documentation and tutorials to support these capabilities.

Feature pipeline enhancements:

  • Added ExternalFunction transform to allow wrapping external Python callables (including by import path) as pipeline steps, with full support for serialization, multiple outputs, and NumPy/pandas compatibility. This enables seamless integration of third-party libraries like TA-Lib into feature pipelines.
  • Improved mathematical operation transforms (AddOp, ScalarOp, UnaryOp, MinMaxOp) to support output caching and short-circuiting: if a required output column is already present in the DataFrame, computation is skipped and the cached column is reused. This optimizes performance in complex, dependency-rich pipelines.
  • All mathematical operation transforms now store their op_name for better introspection and debugging.
  • Added JSON serialization for FeatureKit, supporting configuration export and import to improve transparency and reproducibility.
  • Added a computational graph with topological execution to improve efficiency.
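
The short-circuiting described for the mathematical operation transforms can be pictured with a minimal sketch. It does not use FinMLKit's actual classes: `CachedAddOp`, its `output_name` property, the `calls` counter, and the plain dict standing in for a DataFrame are all hypothetical names chosen for illustration. The sketch also assumes the transform itself writes its result back into the frame, which may not match the real pipeline.

```python
from typing import Dict, List

# Stand-in for a DataFrame (assumption): column name -> list of values.
Frame = Dict[str, List[float]]


class CachedAddOp:
    """Hypothetical transform: adds two columns, reusing a cached output column."""

    op_name = "add"  # stored for introspection, mirroring the op_name idea above

    def __init__(self, left: str, right: str) -> None:
        self.left = left
        self.right = right
        self.calls = 0  # counts real computations, for demonstration only

    @property
    def output_name(self) -> str:
        return f"{self.op_name}({self.left},{self.right})"

    def __call__(self, frame: Frame) -> List[float]:
        # Short-circuit: if the output column already exists, reuse it.
        if self.output_name in frame:
            return frame[self.output_name]
        self.calls += 1
        result = [a + b for a, b in zip(frame[self.left], frame[self.right])]
        frame[self.output_name] = result  # cache the column for later dependents
        return result


frame: Frame = {"open": [1.0, 2.0], "close": [1.5, 2.5]}
op = CachedAddOp("open", "close")
first = op(frame)   # computes and caches the column
second = op(frame)  # served from the cached column, no recomputation
```

In this sketch the second call returns the cached list and `op.calls` stays at 1, which is the saving that matters when several downstream features depend on the same intermediate column.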

Documentation and tutorials:

  • Added a comprehensive tutorial (feature_pipelines.rst) covering Compose, FeatureKit, the computation graph, caching, reproducibility, and integration with external libraries using ExternalFunction.
  • Registered the new tutorial in the documentation index for discoverability.
  • Expanded the README.md to document new capabilities: computational graph, optimized/caching-aware execution, reproducibility via JSON serialization, and external library integration.
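
As a rough illustration of the computational graph and topological execution that the tutorial and README describe, the sketch below orders a small set of features so that each one runs after its dependencies. It uses Python's standard-library `graphlib` (Python 3.9+), which is an assumption made for the example; the PR does not say how FeatureKit builds or sorts its graph, and the feature names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical feature graph: each feature maps to the features it depends on.
dependencies = {
    "close": set(),
    "sma_5": {"close"},
    "sma_20": {"close"},
    "sma_diff": {"sma_5", "sma_20"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())

# Position of each feature in the execution order, for easy inspection.
position = {name: i for i, name in enumerate(order)}
```

Executing features in this order guarantees that shared inputs such as `close` are available before any feature that needs them, which is what makes reuse of cached intermediate columns possible.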

Other improvements:

  • Fixed naming in the SMA transform to use output_name for consistency.
  • Updated type imports in transforms.py for clarity and future extensibility.

These changes make it easier to build, debug, and extend sophisticated feature pipelines, while ensuring efficient execution and reproducibility.

Related Issues

  • Closes #

Testing

  • NUMBA_DISABLE_JIT=1 pytest -q
  • numba enabled tests

Documentation

  • Added/updated docstrings
  • Updated README or other docs

Checklist

  • Tests added or updated
  • Documentation added or updated
  • Linting and type checks pass

Copilot AI review requested due to automatic review settings August 13, 2025 17:24

Copilot AI left a comment

Pull Request Overview

This PR significantly enhances the FeatureKit framework in FinMLKit with major improvements to feature engineering capabilities, focusing on composability, performance optimization, and reproducibility. The key enhancements include adding support for external library integration, implementing caching-aware execution pipelines, and introducing comprehensive serialization capabilities.

Key changes:

  • Added ExternalFunction transform for seamless integration of third-party libraries (e.g., TA-Lib, NumPy) with full serialization support
  • Implemented intelligent caching and short-circuiting in mathematical operation transforms to optimize performance in complex pipelines
  • Added JSON serialization/deserialization for Features and FeatureKit with complete configuration export/import capabilities
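
One way to see why referencing a callable by import path helps serialization: the path is a plain string, so a step's configuration can round-trip through JSON and the callable can be resolved again on load. The sketch below is an assumption-laden illustration, not the ExternalFunction API; the `resolve` helper and the `func`/`input`/`output` field names are hypothetical, and `math.sqrt` stands in for a third-party function.

```python
import importlib
import json
from typing import Any, Callable, Dict


def resolve(import_path: str) -> Callable[..., Any]:
    """Resolve a dotted path such as 'math.sqrt' to the callable it names."""
    module_name, _, attr = import_path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


# Hypothetical step configuration; the field names are illustrative only.
config: Dict[str, Any] = {"func": "math.sqrt", "input": "x", "output": "sqrt_x"}

# The configuration is plain data, so it survives a JSON round trip.
restored = json.loads(json.dumps(config))

# Rebuild the callable from its import path and apply it to one value.
func = resolve(restored["func"])
row = {"x": 9.0}
row[restored["output"]] = func(row[restored["input"]])
```

A callable passed directly as an object has no such textual form, so a design along these lines would likely need the import path (or some other stable identifier) to export and re-import a pipeline faithfully.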

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| finmlkit/feature/transforms.py | Added ExternalFunction transform class for external library integration |
| finmlkit/feature/kit.py | Major expansion with serialization framework, computation graph, and enhanced FeatureKit capabilities |
| finmlkit/feature/base.py | Enhanced mathematical operation transforms with caching and op_name storage |
| tests/features/test_*.py | Comprehensive test suite for new serialization, external functions, and caching features |
| docs/source/tutorials/feature_pipelines.rst | New tutorial covering advanced FeatureKit capabilities |
| examples/QuickStartGuide.ipynb | Updated with demonstrations of new features and capabilities |
| README.md | Updated to document new computational graph, caching, and external integration features |


terbed and others added 3 commits August 13, 2025 19:37
remove unused import statement

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…_ops.py


Remove unused import statement

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@terbed terbed merged commit ba5a9c5 into main Aug 13, 2025
4 checks passed
@terbed terbed deleted the feature_upgrade branch August 13, 2025 17:54