Description
Opening an issue to discuss API design around a requirement where independent, yet random-state-fixed copies of an estimator need to be obtained.
An example would be the bootstrap clones discussed here: sktime/sktime#5823 - these should be statistically independent pseudo-random.
Currently, clone copies the random_seed 1:1, which results in:
- if random_seed=None, results in independent copies, but not pseudo-random fixed ones (each run gives different values)
- if random_seed is set, results in value-identical copies, which are pseudo-random fixed but not statistically independent
Neither meets the requirement above, because the copies would need to be both pseudo-random fixed and statistically independent (not value-identical).
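To make the two cases concrete, here is a minimal sketch using only the standard library. ToyEstimator, its clone method, and draw are hypothetical stand-ins, not the actual estimator API; the only thing they mimic is that clone copies random_seed 1:1.

```python
import random


class ToyEstimator:
    """Hypothetical stand-in for an estimator with a random_seed parameter."""

    def __init__(self, random_seed=None):
        self.random_seed = random_seed

    def clone(self):
        # Mimics the behaviour described above: random_seed is copied 1:1.
        return ToyEstimator(random_seed=self.random_seed)

    def draw(self, n=3):
        # Stand-in for any randomized computation, e.g. a bootstrap sample.
        rng = random.Random(self.random_seed)
        return [rng.random() for _ in range(n)]


# Case 1: random_seed is set -> clones are reproducible but value-identical.
seeded = ToyEstimator(random_seed=42)
clone_a, clone_b = seeded.clone(), seeded.clone()
clones_identical = clone_a.draw() == clone_b.draw()

# Case 2: random_seed=None -> each draw is seeded from OS entropy, so the
# clones behave independently, but results change from run to run.
unseeded = ToyEstimator(random_seed=None)
clone_c, clone_d = unseeded.clone(), unseeded.clone()
draws_c, draws_d = clone_c.draw(), clone_d.draw()
```

Here clones_identical is True, while draws_c and draws_d will almost surely differ from each other and from any previous run.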
In light of the rework of random_seed functionality (see #268), it is worth discussing what this should look like from the API perspective.
A key problem arises if multiple clones are needed: either the number of clones needs to be known in advance, or the seeds at least need to be sampled in a chain, so that seeds derived from the original one give rise to pseudo-random independent copies.
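One way to picture the chained sampling is to draw all child seeds from a single generator seeded with the original seed. The sketch below is an assumption about how this could be done, not a proposal for the actual implementation; derive_seeds is a hypothetical helper, and Python's random module is used only to keep the example dependency-free.

```python
import random


def derive_seeds(master_seed, n_clones):
    """Draw n_clones child seeds in a chain from one master seed (hypothetical helper)."""
    rng = random.Random(master_seed)
    # Each call advances the same generator, so the i-th child seed depends
    # only on master_seed and i, not on how many seeds are requested in total.
    return [rng.getrandbits(32) for _ in range(n_clones)]


seeds_three = derive_seeds(42, 3)
seeds_five = derive_seeds(42, 5)

# Reproducible: the same master seed always yields the same child seeds.
reproducible = derive_seeds(42, 3) == seeds_three
# Chained: asking for more clones extends the sequence instead of changing it.
prefix_stable = seeds_five[:3] == seeds_three
```

For stronger statistical guarantees between the child streams, NumPy's SeedSequence and its spawn method are designed for this purpose and could replace the plain generator used here.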
Further, we cannot change the default behaviour of clone and its current parameters, as it is an interface point of high importance.
Options I can think of:
- something like clone(deep=True, random_seed="exact_copy", n_clones=None)
- a new method clone_random(deep=True, n_clones=1)
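As a rough illustration of the second option, the sketch below adds a clone_random method to a toy class while leaving clone untouched. Everything here is hypothetical: ToyEstimator is not a real estimator class, the deep argument is omitted for brevity, and returning a list even for n_clones=1 is an assumption that the discussion would need to settle.

```python
import random


class ToyEstimator:
    """Hypothetical estimator, used only to illustrate the proposed method."""

    def __init__(self, random_seed=None):
        self.random_seed = random_seed

    def clone(self):
        # Default behaviour stays as it is: random_seed is copied 1:1.
        return type(self)(random_seed=self.random_seed)

    def clone_random(self, n_clones=1):
        # Assumption: always returns a list of n_clones estimators.
        # Child seeds are drawn in a chain from the estimator's own seed;
        # with random_seed=None the chain starts from OS entropy instead,
        # giving independent but non-reproducible clones.
        rng = random.Random(self.random_seed)
        return [
            type(self)(random_seed=rng.getrandbits(32))
            for _ in range(n_clones)
        ]

    def draw(self, n=3):
        # Stand-in for any randomized computation.
        rng = random.Random(self.random_seed)
        return [rng.random() for _ in range(n)]


est = ToyEstimator(random_seed=42)
clones = est.clone_random(n_clones=3)
rerun = est.clone_random(n_clones=3)

# Pseudo-random fixed: a second call reproduces the same clone seeds.
same_seeds = [c.random_seed for c in clones] == [r.random_seed for r in rerun]
```

Keeping this as a separate method means existing callers of clone see no change; the first option would instead need the default random_seed="exact_copy" to preserve the same guarantee.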
FYI @ericjb, @jmwhyte, @tpvasconcelos - since we all discussed either clone or random_seed recently.