[BigDL2.0] autoestimator_pytorch cannot save model to an HDFS path on k8s #22

@Le-Zheng

Description

http://10.112.231.51:18888/view/BigDL-2.0-NB/job/BigDL-NB-K8s-ExampleTests/152/console

(pid=244, ip=172.30.27.4) /opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/model/base_pytorch_model.py:180: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:141.)
(pid=244, ip=172.30.27.4)   return torch.from_numpy(inp)
(pid=244, ip=172.30.27.4)
  0%|          | 0/16 [00:00<?, ?it/s]
/usr/local/envs/pytf1/lib/python3.7/site-packages/torch/autograd/__init__.py:132: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
(pid=244, ip=172.30.27.4)   allow_unreachable=True)  # allow_unreachable flag
(pid=244, ip=172.30.27.4)
Loss: 0.6922382116317749:   0%|          | 0/16 [00:00<?, ?it/s]
Loss: 0.4504893720149994:   6%|▋         | 1/16 [00:00<00:00, 50.22it/s]
(pid=244, ip=172.30.27.4)
Loss: 0.27864789962768555:  12%|█▎        | 2/16 [00:00<00:00, 82.55it/s]
Loss: 0.18915259838104248:  19%|█▉        | 3/16 [00:00<00:00, 106.19it/s]
Loss: 0.112899050116539:  25%|██▌       | 4/16 [00:00<00:00, 124.31it/s]  
Loss: 0.09547075629234314:  31%|███▏      | 5/16 [00:00<00:00, 138.47it/s]
Loss: 0.029641583561897278:  38%|███▊      | 6/16 [00:00<00:00, 150.55it/s]
Loss: 0.056755051016807556:  44%|████▍     | 7/16 [00:00<00:00, 160.61it/s]
Loss: 0.019430123269557953:  50%|█████     | 8/16 [00:00<00:00, 170.19it/s]
Loss: 0.002557608764618635:  56%|█████▋    | 9/16 [00:00<00:00, 178.60it/s]
Loss: 0.004579346626996994:  62%|██████▎   | 10/16 [00:00<00:00, 185.35it/s]
Loss: 0.0019340637372806668:  69%|██████▉   | 11/16 [00:00<00:00, 192.40it/s]
Loss: 0.00223898165859282:  75%|███████▌  | 12/16 [00:00<00:00, 198.61it/s]  
Loss: 0.005255652591586113:  81%|████████▏ | 13/16 [00:00<00:00, 200.80it/s]
Loss: 0.00018203322542831302:  88%|████████▊ | 14/16 [00:00<00:00, 206.26it/s]
Loss: 0.055765699595212936:  94%|█████████▍| 15/16 [00:00<00:00, 212.25it/s]  
Loss: 0.055765699595212936: 100%|██████████| 16/16 [00:00<00:00, 225.74it/s]
(pid=245, ip=172.30.27.4) /opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/model/base_pytorch_model.py:180: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:141.)
(pid=245, ip=172.30.27.4)   return torch.from_numpy(inp)
(pid=245, ip=172.30.27.4)
  0%|          | 0/16 [00:00<?, ?it/s]
/usr/local/envs/pytf1/lib/python3.7/site-packages/torch/autograd/__init__.py:132: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
(pid=245, ip=172.30.27.4)   allow_unreachable=True)  # allow_unreachable flag
(pid=245, ip=172.30.27.4)
Loss: 0.6456587314605713:   0%|          | 0/16 [00:00<?, ?it/s]
(pid=244, ip=172.30.27.4) 2021-11-04 00:35:35,556	ERROR function_runner.py:254 -- Runner Thread raised error.
(pid=244, ip=172.30.27.4) Traceback (most recent call last):
(pid=244, ip=172.30.27.4)   File "/usr/local/envs/pytf1/lib/python3.7/site-packages/ray/tune/function_runner.py", line 248, in run
(pid=244, ip=172.30.27.4)     self._entrypoint()
(pid=244, ip=172.30.27.4)   File "/usr/local/envs/pytf1/lib/python3.7/site-packages/ray/tune/function_runner.py", line 316, in entrypoint
(pid=244, ip=172.30.27.4)     self._status_reporter.get_checkpoint())
(pid=244, ip=172.30.27.4)   File "/usr/local/envs/pytf1/lib/python3.7/site-packages/ray/tune/function_runner.py", line 576, in _trainable_func
(pid=244, ip=172.30.27.4)     output = fn()
(pid=244, ip=172.30.27.4)   File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-orca-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/search/ray_tune/ray_tune_search_engine.py", line 325, in train_func
(pid=244, ip=172.30.27.4)   File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/search/utils.py", line 72, in put_ckpt_hdfs
(pid=244, ip=172.30.27.4)     if remote_ckpt_basename not in get_remote_list(remote_dir):
(pid=244, ip=172.30.27.4)   File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/search/utils.py", line 46, in get_remote_list
(pid=244, ip=172.30.27.4)     s_output, _ = process(args)
(pid=244, ip=172.30.27.4) TypeError: cannot unpack non-iterable NoneType object
(pid=244, ip=172.30.27.4) Exception in thread Thread-2:
(pid=244, ip=172.30.27.4) Traceback (most recent call last):
(pid=244, ip=172.30.27.4)   File "/usr/local/envs/pytf1/lib/python3.7/threading.py", line 926, in _bootstrap_inner
(pid=244, ip=172.30.27.4)     self.run()
(pid=244, ip=172.30.27.4)   File "/usr/local/envs/pytf1/lib/python3.7/site-packages/ray/tune/function_runner.py", line 267, in run
(pid=244, ip=172.30.27.4)     raise e
(pid=244, ip=172.30.27.4)   File "/usr/local/envs/pytf1/lib/python3.7/site-packages/ray/tune/function_runner.py", line 248, in run
(pid=244, ip=172.30.27.4)     self._entrypoint()
(pid=244, ip=172.30.27.4)   File "/usr/local/envs/pytf1/lib/python3.7/site-packages/ray/tune/function_runner.py", line 316, in entrypoint
(pid=244, ip=172.30.27.4)     self._status_reporter.get_checkpoint())
(pid=244, ip=172.30.27.4)   File "/usr/local/envs/pytf1/lib/python3.7/site-packages/ray/tune/function_runner.py", line 576, in _trainable_func
(pid=244, ip=172.30.27.4)     output = fn()
(pid=244, ip=172.30.27.4)   File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-orca-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/search/ray_tune/ray_tune_search_engine.py", line 325, in train_func
(pid=244, ip=172.30.27.4)   File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/search/utils.py", line 72, in put_ckpt_hdfs
(pid=244, ip=172.30.27.4)     if remote_ckpt_basename not in get_remote_list(remote_dir):
(pid=244, ip=172.30.27.4)   File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/search/utils.py", line 46, in get_remote_list
(pid=244, ip=172.30.27.4)     s_output, _ = process(args)
(pid=244, ip=172.30.27.4) TypeError: cannot unpack non-iterable NoneType object
(pid=244, ip=172.30.27.4)
(pid=245, ip=172.30.27.4)
Loss: 0.4749995172023773:   6%|▋         | 1/16 [00:00<00:00, 48.86it/s]
Loss: 0.3644247055053711:  12%|█▎        | 2/16 [00:00<00:00, 81.42it/s]
Loss: 0.19700123369693756:  19%|█▉        | 3/16 [00:00<00:00, 105.65it/s]
Loss: 0.15083497762680054:  25%|██▌       | 4/16 [00:00<00:00, 123.93it/s]
Loss: 0.1125955805182457:  31%|███▏      | 5/16 [00:00<00:00, 138.76it/s] 
Loss: 0.07053384184837341:  38%|███▊      | 6/16 [00:00<00:00, 150.92it/s]
Loss: 0.04681260883808136:  44%|████▍     | 7/16 [00:00<00:00, 161.47it/s]
Loss: 0.02035798318684101:  50%|█████     | 8/16 [00:00<00:00, 170.66it/s]
Loss: 0.012909774668514729:  56%|█████▋    | 9/16 [00:00<00:00, 178.95it/s]
Loss: 0.0078040556982159615:  62%|██████▎   | 10/16 [00:00<00:00, 186.17it/s]
Loss: 0.04752806946635246:  69%|██████▉   | 11/16 [00:00<00:00, 192.78it/s]  
Loss: 0.019220085814595222:  75%|███████▌  | 12/16 [00:00<00:00, 198.82it/s]
Loss: 0.010350744239985943:  81%|████████▏ | 13/16 [00:00<00:00, 200.81it/s]
Loss: 0.0005109629710204899:  88%|████████▊ | 14/16 [00:00<00:00, 206.25it/s]
(pid=244, ip=172.30.27.4)
(pid=244, ip=172.30.27.4) /bin/sh: hdfs: command not found
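
Reading the log, the last line (`/bin/sh: hdfs: command not found`) suggests the `hdfs` CLI is not on the PATH of the Ray worker process, and the `TypeError` at `s_output, _ = process(args)` looks like a consequence of the shell-out helper returning `None` when the command fails. The sketch below shows one way a listing helper could fail early with a readable message instead. It is not the BigDL implementation: `list_remote_dir`, `HdfsCliError`, and the `hdfs_bin` parameter are hypothetical names, and it assumes a Hadoop client whose `hdfs dfs -ls` supports the `-C` (paths only) option.

```python
import shutil
import subprocess


class HdfsCliError(RuntimeError):
    """Raised when the hdfs command is missing or exits with an error."""


def list_remote_dir(remote_dir, hdfs_bin="hdfs"):
    """Return the entry names under remote_dir using `hdfs dfs -ls -C`.

    Hypothetical helper for illustration only; not BigDL's get_remote_list.
    """
    # Check for the CLI up front so a missing binary produces a clear error
    # rather than a None result that later fails to unpack.
    if shutil.which(hdfs_bin) is None:
        raise HdfsCliError(
            f"'{hdfs_bin}' not found on PATH; install the Hadoop client "
            "in the worker image or extend PATH"
        )

    result = subprocess.run(
        [hdfs_bin, "dfs", "-ls", "-C", remote_dir],
        capture_output=True,  # requires Python 3.7+
        text=True,
    )
    if result.returncode != 0:
        raise HdfsCliError(result.stderr.strip())

    # With -C each output line is a full path; keep only the last component.
    return [line.rsplit("/", 1)[-1] for line in result.stdout.splitlines() if line]
```

With a guard like this, the trial would stop with an error that names the missing `hdfs` binary, which points at the worker image or its PATH on k8s rather than at the checkpoint code.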
