| nvidia-ml-py | ||
| protobuf<=3.20.3 | ||
| tensorboard==2.10.1 | ||
| tensorboard==1.15 |
There was a problem hiding this comment.
hold up, this can't be right, this is a tensorboard version for tensorflow v1 from 2019.
can we have a version that is more modern?
There was a problem hiding this comment.
We can test a different version, but 2.10.1 fails end-to-end testing, and the failure was immortalized in Jira as PAD-91:
https://hpe-aiatscale.atlassian.net/browse/PAD-91
103_run_mlde_validation_suite_against_rocm_on_grenoble/determined-...
tests/nightly/test_pytorch2.py::test_pytorch2_hf_language_modeling_distributed FAILED [100%]
The reason for the failure is:
ImportError: TensorBoard logging requires TensorBoard version 1.15 or above
There was a problem hiding this comment.
I have a strong opinion we cannot pin 1.15 here because it's 5 years old and likely to conflict with other dependencies and have CVEs.
2.10.1 is also technically above 1.15...
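To sanity-check that last point, here is a minimal sketch of a numeric version comparison. The helper names `version_tuple` and `meets_minimum` are hypothetical and are not taken from this PR or from any library. It also models one possible explanation as an assumption only: if whatever gets imported as `tensorboard` exposes no `__version__` at all, a guard of this shape would reject it no matter which version is pinned. Nothing in the log above confirms that this is the cause.

```python
import re


def version_tuple(version: str) -> tuple:
    """Turn '2.10.1' into (2, 10, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum="1.15"):
    """Return True if `installed` is at least `minimum`.

    `installed=None` models a module with no __version__ attribute
    (an assumption for illustration, not something the CI log shows).
    """
    if installed is None:
        return False
    have, need = version_tuple(installed), version_tuple(minimum)
    # Pad with zeros so that '1.15' and '1.15.0' compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


print(meets_minimum("2.10.1"))  # True
print(meets_minimum("1.14.0"))  # False
print(meets_minimum(None))      # False
```

If this comparison is representative, 2.10.1 passes on version ordering alone. It may therefore be worth checking what `tensorboard.__version__` actually resolves to in the failing environment before settling on a pin.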
VERSION
Outdated
| @@ -1 +1 @@ | |||
| 0.30.1 | |||
| 0.31.1 | |||
There was a problem hiding this comment.
we've just released 0.30.0; if it'll land (and get into bumpenvs) by EOD today, you can keep it at 0.30.1. otherwise it'll probably be 0.30.2
Tensorboard and GPU kernel build fixes (PAD-91 and PAD-134)
Checklist
`bumpenvs` procedure in the determined repo. See README.