FusedAdam requires cuda extensions #11
I built the apex module following the explained procedure, but when I try to train the model on CIFAR-10, I get:
/lustre03/project/6054857/mehranag/vdvae/data.py:147: FutureWarning: arrays to stack must be passed as a "sequence" type such as list or tuple. Support for non-sequence iterables such as generators is deprecated as of NumPy 1.16 and will raise an error in the future.
trX = np.vstack(data['data'] for data in tr_data)
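(Side note: the FutureWarning above is a separate NumPy issue unrelated to apex. np.vstack expects a real sequence, so wrapping the generator expression in a list silences it. A minimal sketch with stand-in data, not vdvae's actual batches:)

```python
import numpy as np

# Stand-in for the CIFAR-10 batch dicts loaded in data.py.
tr_data = [{'data': np.ones((2, 3))}, {'data': np.zeros((2, 3))}]

# List comprehension instead of a bare generator: no FutureWarning.
trX = np.vstack([data['data'] for data in tr_data])
print(trX.shape)  # (4, 3)
```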
Traceback (most recent call last):
File "train.py", line 144, in <module>
main()
File "train.py", line 140, in main
train_loop(H, data_train, data_valid_or_test, preprocess_fn, vae, ema_vae, logprint)
File "train.py", line 59, in train_loop
optimizer, scheduler, cur_eval_loss, iterate, starting_epoch = load_opt(H, vae, logprint)
File "/lustre03/project/6054857/mehranag/vdvae/train_helpers.py", line 180, in load_opt
optimizer = AdamW(vae.parameters(), weight_decay=H.wd, lr=H.lr, betas=(H.adam_beta1, H.adam_beta2))
File "/home/mehranag/anaconda3/envs/env/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/optimizers/fused_adam.py", line 79, in __init__
raise RuntimeError('apex.optimizers.FusedAdam requires cuda extensions')
RuntimeError: apex.optimizers.FusedAdam requires cuda extensions
I understand that this is an apex-related issue since I get the following error when trying to run examples/simple/distributed in the apex repo:
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ImportError("/lib64/libm.so.6: version `GLIBC_2.29' not found (required by /home/mehranag/anaconda3/envs/env/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/amp_C.cpython-36m-x86_64-linux-gnu.so)",)
final loss = tensor(0.5392, device='cuda:0', grad_fn=<MseLossBackward>)
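The key part of that warning is the GLIBC_2.29 ImportError: the compiled amp_C extension was linked against a newer glibc than the one on the cluster, so apex silently falls back to the Python path even though the build flags were passed. As a rough diagnostic (assuming a standard glibc toolchain on the node), you can compare the system glibc version against what the extension requires:

```shell
# Print the system glibc version; if it is below 2.29, the compiled
# amp_C extension cannot load, regardless of the apex build flags.
ldd --version | head -n 1

# To see which GLIBC versions the extension itself requires, run
# (substituting the .so path from the ImportError message):
# strings /path/to/amp_C.cpython-36m-x86_64-linux-gnu.so | grep GLIBC_
```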
I have tried many things to fix this issue but no luck. I have two questions:
- Does anybody know why I get "FusedAdam requires cuda extensions" even though I built apex with the --global-option="--cpp_ext" --global-option="--cuda_ext" options?
- How can I avoid using apex? I am only trying to test some things on cifar10 and don't need the distributed training feature, especially considering that I'm getting these weird errors!
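On the second question: the traceback suggests that train_helpers.py resolves AdamW to apex's FusedAdam, so one way to avoid apex entirely is to swap in PyTorch's built-in torch.optim.AdamW, which accepts the same lr/weight_decay/betas keyword arguments. A minimal sketch of the swap (the Linear model and hyperparameter values below are placeholders, not vdvae's actual ones):

```python
import torch

# Stand-in for the VAE; in the repo this would be vae.parameters().
model = torch.nn.Linear(4, 2)

# PyTorch's native AdamW takes the same keywords used in load_opt,
# so no cuda extensions are needed.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-4,            # placeholder for H.lr
    weight_decay=0.0,   # placeholder for H.wd
    betas=(0.9, 0.9),   # placeholder for (H.adam_beta1, H.adam_beta2)
)

# One dummy step to show the optimizer runs without apex, on CPU.
loss = model(torch.randn(3, 4)).pow(2).mean()
loss.backward()
optimizer.step()
```

Since this setup only trains on CIFAR-10 on a single machine, losing the fused kernel costs a bit of speed but changes nothing about the optimization itself.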