add MAPE to regression metrics (fixes #691) #822
TomAugspurger merged 4 commits into dask:main from jameslamb:feat/mape
Conversation
tests/metrics/test_regression.py
```diff
-@pytest.fixture(params=["mean_squared_error", "mean_absolute_error", "r2_score"])
+@pytest.fixture(params=["mean_squared_error", "mean_absolute_error", "mean_absolute_percentage_error", "r2_score"])
```
```diff
-@pytest.fixture(params=["mean_squared_error", "mean_absolute_error", "mean_absolute_percentage_error", "r2_score"])
+@pytest.fixture(
+    params=[
+        "mean_squared_error",
+        "mean_absolute_error",
+        "mean_absolute_percentage_error",
+        "r2_score",
+    ]
+)
```
Looks like black==19.10b0 isn't happy about the line length here.
Also, perhaps, it'd be nice if the correctness of the method was sanity-checked against its sklearn counterpart, just as it's done for some of the other metrics a bit further down in the same test file.
> Looks like black==19.10b0 isn't happy about the line length here.
Ah ok. If dask-ml has chosen to pin to older versions of linters, then I think the non-conda option documented at https://ml.dask.org/contributing.html#style will be unreliable, since it references `black` without a pin (line 29 in f5e5bb4).

Once I switched to the conda instructions there, I got the expected diff. Updated in 1142fcc.
> Also, perhaps, it'd be nice if the correctness of the method was sanity-checked against its sklearn counterpart, just as it's done for some of the other metrics a bit further down in the same test file.
Can you clarify what you want me to change? As far as I can tell, that is exactly what happens by adding `mean_absolute_percentage_error` to the `metric_pairs` fixture. Every metric in that fixture is tested against its scikit-learn equivalent by the check at `dask-ml/tests/metrics/test_regression.py`, line 37 in f5e5bb4.
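To illustrate the pattern being discussed, here is a minimal, self-contained sketch of how a `metric_pairs`-style fixture checks an implementation against a trusted reference. The function names and data are hypothetical stand-ins (using a plain NumPy MAE for both sides), not dask-ml's actual test code:

```python
import numpy as np

def mae_impl(y_true, y_pred):
    # stand-in for the metric implementation under test
    return float(np.mean(np.abs(y_true - y_pred)))

def mae_reference(y_true, y_pred):
    # stand-in for the trusted reference, e.g. the sklearn counterpart
    return float(np.abs(y_true - y_pred).mean())

# each pair couples an implementation with its reference
metric_pairs = [(mae_impl, mae_reference)]

rng = np.random.default_rng(0)
y_true, y_pred = rng.random(50), rng.random(50)

# parity check: every implementation must agree with its reference
for impl, reference in metric_pairs:
    assert abs(impl(y_true, y_pred) - reference(y_true, y_pred)) < 1e-12
```

Adding a new metric to such a fixture automatically subjects it to the same parity check, which is why no separate test was needed here.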
> Ah ok. If dask-ml has chosen to pin to older versions of linters, then I think the non-conda option documented at https://ml.dask.org/contributing.html#style will be unreliable
You're absolutely right! I've got a PR over at #813 that has been waiting to be reviewed (for a couple of weeks now) and then merged. It should improve the static-checking situation.
> Every metric in that fixture is tested against its scikit-learn equivalent by
Indeed - ignore me about this one, please! I got confused and thought we should probably introduce extra tests like the `test_mean_squared_log_error` one.
I'll bring up the question of whether the setup.py versions of the linters should be pinned, too, in #813.
I'm grateful for the feedback. I just pushed b05213b to attempt to address it.
Looks good, thanks!
This PR proposes adding `mean_absolute_percentage_error()` ("MAPE"), as originally suggested in #691. It follows the implementation from scikit-learn (https://github.com/scikit-learn/scikit-learn/blob/9cfacf1540a991461b91617c779c69753a1ee4c0/sklearn/metrics/_regression.py#L280), including the use of `np.finfo(np.float64).eps` in the denominator to prevent divide-by-zero errors.

**Notes for reviewers**

This PR adds a bit of test coverage by adding `mean_absolute_percentage_error()` to the `metric_pairs` fixture in tests. It would automatically get more specific coverage (like for combinations of `multioutput` and `compute`) if #820 is accepted.

Thanks for your time and consideration.
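The epsilon-clamped denominator mentioned above can be sketched as follows. This is a hypothetical NumPy-only illustration of the approach, not dask-ml's or scikit-learn's actual code:

```python
import numpy as np

def mape(y_true, y_pred, eps=np.finfo(np.float64).eps):
    """Mean absolute percentage error.

    The denominator is clamped to machine epsilon so that zeros in
    y_true do not cause a divide-by-zero (sketch of the approach
    described in the PR, not the actual implementation).
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return float(np.mean(np.abs(y_pred - y_true) / np.maximum(np.abs(y_true), eps)))

# errors of 0%, 0%, and 25% average to 1/12
print(mape([1.0, 2.0, 4.0], [1.0, 2.0, 3.0]))
```

Note that when `y_true` contains zeros, the clamped denominator makes the result finite but extremely large, so MAPE remains a poor choice for targets near zero.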