[feat]: add Categorical BDPO #25

Open
cmj2002 wants to merge 10 commits into master from categorical_bdpo

cmj2002 (Collaborator) commented Feb 22, 2026

No description provided.

cmj2002 requested a review from typoverflow on February 22, 2026 14:40
cmj2002 changed the title from "[feat]: add Categorical bdpo" to "[feat]: add Categorical BDPO" on Feb 22, 2026
typoverflow requested a review from Copilot on February 22, 2026 18:04
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds a Categorical BDPO (Behavior-Regularized Diffusion Policy Optimization) variant to the codebase. It is a distributional version of the BDPO algorithm that represents values as categorical distributions (C51-style) over a symlog-transformed support. The implementation follows the structure of the existing BDPO agent, but the temporal value function (vt) now predicts a categorical distribution instead of a scalar, while q0 remains scalar.
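For readers unfamiliar with the symlog transformation mentioned above, here is a minimal NumPy sketch of the idea. It is not taken from this PR: the function names, the support bounds `v_min`/`v_max`, and the choice to decode by applying `symexp` to the expected atom are all assumptions made for illustration.

```python
import numpy as np


def symlog(x):
    """Compress large magnitudes while keeping the sign: sign(x) * log(1 + |x|)."""
    return np.sign(x) * np.log1p(np.abs(x))


def symexp(x):
    """Inverse of symlog: sign(x) * (exp(|x|) - 1)."""
    return np.sign(x) * np.expm1(np.abs(x))


def categorical_value(logits, v_min=-5.0, v_max=5.0):
    """Decode a scalar value from logits over atoms evenly spaced in symlog space.

    v_min / v_max are hypothetical bounds, not values from the PR.
    """
    atoms = np.linspace(v_min, v_max, logits.shape[-1])
    # Numerically stable softmax over the atom dimension.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Expected atom in symlog space, mapped back to the original value scale.
    return symexp((probs * atoms).sum(axis=-1))
```

Working in symlog space lets a fixed, modest number of atoms cover returns that span several orders of magnitude, which is one common motivation for pairing it with a categorical critic.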

Changes:

  • Added CategoricalBDPOAgent with categorical value distributions for the temporal critic (vt) while keeping scalar values for q0
  • Introduced CategoricalCriticWithDiscreteTime module for time-conditioned categorical value prediction
  • Added configuration files and integration into the existing training pipeline
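To make the idea of a time-conditioned categorical critic concrete, the following is a rough NumPy sketch. It is not the `CategoricalCriticWithDiscreteTime` module from this PR; the class name, the learned embedding table for the discrete time index, the layer sizes, and the single hidden layer are all hypothetical choices for illustration.

```python
import numpy as np


class TinyTimeConditionedCategoricalCritic:
    """Hypothetical critic: (obs, action, discrete time t) -> logits over value atoms."""

    def __init__(self, obs_dim, act_dim, num_steps, num_atoms,
                 embed_dim=8, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = obs_dim + act_dim + embed_dim
        # One embedding row per discrete time index 0..num_steps.
        self.time_embed = rng.normal(0.0, 0.1, (num_steps + 1, embed_dim))
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, num_atoms))
        self.b2 = np.zeros(num_atoms)

    def __call__(self, obs, action, t):
        # Look up the embedding for each integer time index and append it to the input.
        x = np.concatenate([obs, action, self.time_embed[t]], axis=-1)
        h = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        return h @ self.w2 + self.b2  # unnormalized logits, one per atom
```

The output has one logit per atom rather than a single scalar, so a softmax over the last axis gives the predicted value distribution for each (obs, action, t) triple.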

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 14 comments.

| File | Description |
| --- | --- |
| `flowrl/agent/offline/bdpo/categorical_bdpo.py` | New agent implementation with categorical value distributions, symlog transformation, and C51-style projection |
| `flowrl/module/critic.py` | Added CategoricalCriticWithDiscreteTime module for time-conditioned categorical predictions |
| `flowrl/config/offline/algo/categorical_bdpo.py` | Configuration dataclasses for the new algorithm |
| `flowrl/config/offline/__init__.py` | Registered CategoricalBDPOConfig in the config registry |
| `flowrl/agent/offline/__init__.py` | Exported CategoricalBDPOAgent for use in training scripts |
| `examples/offline/main_d4rl.py` | Added categorical_bdpo to supported agents |
| `examples/offline/config/d4rl/algo/categorical_bdpo.yaml` | Default hyperparameters for D4RL tasks |
| `scripts/d4rl/categorical_bdpo.sh` | Experiment script with task-specific hyperparameters for D4RL benchmarks |
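As background for the "C51-style projection" mentioned in the file summary, here is a generic NumPy sketch of that projection step. It illustrates the standard technique only and is not the code in `categorical_bdpo.py`; the function name and argument layout are assumptions. It takes a target distribution whose atoms have been shifted (for example by a Bellman backup) and redistributes its mass onto a fixed, evenly spaced support.

```python
import numpy as np


def project_categorical(target_values, target_probs, atoms):
    """Project mass located at `target_values` onto the fixed support `atoms`.

    target_values: (B, N) locations of the target atoms after the backup.
    target_probs:  (B, N) probability mass at each target location.
    atoms:         (N,) evenly spaced fixed support.
    """
    n = len(atoms)
    v_min, v_max = atoms[0], atoms[-1]
    delta = (v_max - v_min) / (n - 1)
    # Clip targets into the support, then convert to fractional atom indices.
    tz = np.clip(target_values, v_min, v_max)
    b = np.clip((tz - v_min) / delta, 0, n - 1)
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)
    # Linear interpolation weights; when b is an integer, all mass goes to `lower`.
    w_upper = b - lower
    w_lower = 1.0 - w_upper
    out = np.zeros_like(target_probs, dtype=float)
    rows = np.arange(target_probs.shape[0])[:, None]
    # np.add.at accumulates correctly when several targets land on the same atom.
    np.add.at(out, (rows, lower), target_probs * w_lower)
    np.add.at(out, (rows, upper), target_probs * w_upper)
    return out
```

When no clipping occurs, the projection preserves both total mass and the mean of the target distribution. With a symlog support, the `target_values` would presumably be computed in symlog space before projecting, but that detail is an assumption and is not stated in the review summary.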

cmj2002 and others added 8 commits February 23, 2026 17:19
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
3 participants