Non-deterministic results from PreProcess.reduce(extend=True) due to internal MCMC sampling #136

@PalashMendhe

Describe the bug
When running PreProcess.reduce(extend=True) on the same model multiple times, the function returns different sets of removed reactions. This non-determinism prevents reproducibility for downstream analysis.

To Reproduce
Steps to reproduce the behavior:

  1. Go to your root directory.
  2. Create a test file.
  3. Run the following minimal script, which uses e_coli_core.json to reproduce the error:
import numpy as np
import cobra.io
from dingo.preprocess import PreProcess


def test_extend_nondeterminism():
    ecoli = cobra.io.load_json_model("ext_data/e_coli_core.json")

    # Run A: seed NumPy, then reduce a fresh copy of the model.
    np.random.seed(42)
    preprocessor_A = PreProcess(ecoli.copy(), tol=1e-6, verbose=False)
    removed_A, _ = preprocessor_A.reduce(extend=True)

    # Run B: same seed, same model, same parameters.
    np.random.seed(42)
    preprocessor_B = PreProcess(ecoli.copy(), tol=1e-6, verbose=False)
    removed_B, _ = preprocessor_B.reduce(extend=True)

    # Both runs should remove the same reactions; report any difference.
    set_A, set_B = frozenset(removed_A), frozenset(removed_B)
    if set_A != set_B:
        print(f"Reactions removed ONLY in Run A: {sorted(set_A - set_B)}")
        print(f"Reactions removed ONLY in Run B: {sorted(set_B - set_A)}")


if __name__ == "__main__":
    test_extend_nondeterminism()

Expected behavior
Given the same metabolic model and the same algorithm parameters, reduce(extend=True) should always return the same set of removed reactions.
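
This expectation could be turned into a regression check roughly as follows. It is only a sketch: `differing_items` is a hypothetical helper, not part of dingo, and it assumes the callable returns an iterable of hashable items such as reaction IDs.

```python
def differing_items(run, repeats=3):
    """Call run() `repeats` times and return the items not shared by every run.

    Hypothetical helper for illustration only. An empty list means all runs
    returned the same set of items.
    """
    results = [frozenset(run()) for _ in range(repeats)]
    common = frozenset.intersection(*results)  # items present in every run
    union = frozenset.union(*results)          # items present in any run
    return sorted(union - common)
```

With the script above, it might be called as `differing_items(lambda: PreProcess(ecoli.copy(), tol=1e-6, verbose=False).reduce(extend=True)[0])`. A non-empty result lists the reactions whose removal depends on the run.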

Proposed fix
Add an optional steady_states parameter to reduce().
When steady_states is provided by the caller, the internal sampling step is skipped entirely and the provided matrix is used directly for correlation estimation. When steady_states=None (the default), the current internal sampling behaviour is preserved as a fallback for convenience, but a UserWarning is emitted so the user is informed that results will not be reproducible.
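
A minimal sketch of the proposed control flow, not dingo's actual implementation. The function name `correlation_from_steady_states`, its extra parameters, and the random draw standing in for the internal MCMC sampler are all assumptions made for illustration. It also assumes the matrix has one row per reaction and one column per sample.

```python
import warnings

import numpy as np


def correlation_from_steady_states(steady_states=None, n_reactions=4, n_samples=200):
    """Hypothetical stand-in for the correlation step inside reduce()."""
    if steady_states is None:
        # Fallback path: warn, then sample internally (not reproducible).
        warnings.warn(
            "steady_states not provided; sampling internally, "
            "so results may differ between runs.",
            UserWarning,
            stacklevel=2,
        )
        # Placeholder for the internal MCMC sampler (an assumption of this sketch).
        steady_states = np.random.default_rng().normal(size=(n_reactions, n_samples))
    steady_states = np.asarray(steady_states, dtype=float)
    # Assumed layout: rows are reactions, columns are samples.
    # np.corrcoef treats each row as one variable.
    return np.corrcoef(steady_states)


# Caller-supplied samples skip the sampling step, so repeated calls agree.
fixed_samples = np.random.default_rng(0).normal(size=(4, 200))
first = correlation_from_steady_states(fixed_samples)
second = correlation_from_steady_states(fixed_samples)
print(np.array_equal(first, second))
```

Keeping `steady_states=None` as the default leaves existing calls working. Callers who need reproducibility pass a matrix they have sampled and stored themselves.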

Desktop

  • OS: WSL / Ubuntu
  • Browser: Chrome
