
[ENH] handling of random_state in clone #279

@fkiraly


Opening an issue to discuss API design around a requirement where independent, yet random-state-fixed copies of an estimator need to be obtained.

An example would be the bootstrap clones discussed in sktime/sktime#5823: these should be statistically independent, yet pseudo-random fixed (reproducible).

Currently, clone copies the random_seed 1:1, which means:

  • if random_seed=None, the copies are statistically independent, but not pseudo-random fixed (each run gives different values)
  • if random_seed is set, the copies are pseudo-random fixed, but value-identical rather than statistically independent
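To make the two cases concrete, here is a minimal sketch using a hypothetical ToyEstimator and a toy clone function (neither is the skbase implementation). The toy uses random_state as the parameter name, as in the issue title, and only supports int or None seeds. It assumes clone constructs a new instance with parameters copied 1:1, as described above.

```python
import copy
import random


class ToyEstimator:
    """Hypothetical stand-in for an estimator with a random_state parameter."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def get_params(self):
        return {"random_state": self.random_state}

    def sample(self, n=3):
        # draw n pseudo-random numbers from a generator seeded with random_state;
        # random.Random(None) seeds from OS entropy or the system time
        rng = random.Random(self.random_state)
        return [rng.random() for _ in range(n)]


def clone(estimator):
    """Toy clone: a new instance with constructor parameters copied 1:1."""
    params = copy.deepcopy(estimator.get_params())
    return type(estimator)(**params)


# random_state set: clones are reproducible, but value-identical
fixed = ToyEstimator(random_state=42)
print(clone(fixed).sample() == clone(fixed).sample())  # True

# random_state=None: clones differ from each other, but also across runs
free = ToyEstimator(random_state=None)
print(clone(free).sample() == clone(free).sample())
```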

Neither meets the requirement above, which needs copies that are both pseudo-random fixed and statistically independent (not value-identical).

In light of the rework of random_seed functionality (see #268), it is worth discussing what this should look like from the API perspective.

A key problem arises if multiple clones are needed: either the number of clones needs to be known in advance, or the clones need to be sampled in a chain, so that their seeds are derived deterministically from one another while still giving rise to statistically independent pseudo-random copies.
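A minimal sketch of the chain idea, using only the Python standard library: a parent generator seeded with the original random_state emits one integer seed per clone. The helpers seed_chain and derive_seeds are hypothetical and not part of skbase. Drawing child seeds from a single Mersenne Twister stream is a simple heuristic; numpy.random.SeedSequence.spawn is designed specifically for producing independent child streams and may be the more principled choice.

```python
import itertools
import random


def seed_chain(random_state):
    """Hypothetical helper: yield child seeds one at a time from a parent generator."""
    parent = random.Random(random_state)
    while True:
        yield parent.getrandbits(32)


def derive_seeds(random_state, n_clones):
    """Hypothetical helper: return the first n_clones seeds of the chain."""
    return list(itertools.islice(seed_chain(random_state), n_clones))


seeds_5 = derive_seeds(42, 5)
# reproducible: same parent seed, same child seeds
print(derive_seeds(42, 5) == seeds_5)  # True
# prefix-stable: the k-th seed does not depend on how many clones are requested
print(derive_seeds(42, 3) == seeds_5[:3])  # True
```

Because the k-th seed depends only on the parent seed and k, clones can be drawn one by one without knowing their total number up front.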

Further, we cannot change the default behaviour of clone and its current parameters, as it is an interface point of high importance.

Options I can think of:

  • something like clone(deep=True, random_seed="exact_copy", n_clones=None)
  • a new method clone_random(deep=True, n_clones=1)
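A rough sketch of how the two options could behave, on a hypothetical ToyObject rather than the actual skbase BaseObject; neither signature exists today. This issue does not specify what random_seed values other than "exact_copy" mean, so treating them as "derive child seeds from the object's own random_state" is an assumption, as is returning a list whenever n_clones is given.

```python
import copy
import random


class ToyObject:
    """Hypothetical stand-in for a BaseObject with a random_state parameter."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def get_params(self, deep=True):
        # deep is accepted for signature parity only: no nested objects here
        return {"random_state": self.random_state}

    def _exact_copy(self, deep):
        # current behaviour: constructor parameters copied 1:1
        return type(self)(**copy.deepcopy(self.get_params(deep=deep)))

    def _derived_clones(self, deep, n_clones):
        # chain-derive one child seed per clone from this object's random_state
        parent = random.Random(self.random_state)
        clones = []
        for _ in range(n_clones):
            params = copy.deepcopy(self.get_params(deep=deep))
            params["random_state"] = parent.getrandbits(32)
            clones.append(type(self)(**params))
        return clones

    # Option 1: extend clone; the default keeps the current 1:1 copy
    def clone(self, deep=True, random_seed="exact_copy", n_clones=None):
        n = 1 if n_clones is None else n_clones
        if random_seed == "exact_copy":
            clones = [self._exact_copy(deep) for _ in range(n)]
        else:
            # assumption: any other value (e.g. "derive") requests derived seeds
            clones = self._derived_clones(deep, n)
        # assumption: a single object if n_clones is None, else a list
        return clones[0] if n_clones is None else clones

    # Option 2: a separate method, leaving clone untouched
    def clone_random(self, deep=True, n_clones=1):
        # assumption: always returns a list of length n_clones
        return self._derived_clones(deep, n_clones)


obj = ToyObject(random_state=42)
print(obj.clone().random_state)  # 42
seeds_a = [c.random_state for c in obj.clone_random(n_clones=3)]
seeds_b = [c.random_state for c in obj.clone(random_seed="derive", n_clones=3)]
print(seeds_a == seeds_b)  # True
```

In this sketch both options share the same seed derivation, so the choice between them is purely about API surface: option 1 keeps a single entry point, option 2 leaves the signature of clone untouched. If the original has random_state=None, the derived clones are independent but not reproducible, as in the first bullet above.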

FYI @ericjb, @jmwhyte, @tpvasconcelos - since we all discussed either clone or random_seed recently.

Labels: API design (API design & software architecture)
