
Still 'CUDA out of memory' with 4 TITAN X (pascal) when training model in PASCAL VOC dataset #61

@rrryan2016

Description


Hey, thanks for releasing your code!

OS: Ubuntu 16.04
CUDA: 8.0.44
GPU: TITAN X Pascal (11.2 GB memory) x 4

I intend to train a model on PASCAL VOC 2012, and I run:

CUDA_VISIBLE_DEVICES=0,1,2,3 python train_autodeeplab.py --backbone resnet --lr 0.007 --workers 4 --epochs 40 --batch_size 1 --eval_interval 1 --dataset pascal

The error message is shown below:

Namespace(arch_lr=0.003, arch_weight_decay=0.001, backbone='resnet', base_size=320, batch_size=1, checkname='deeplab-resnet', crop_size=320, cuda=True, dataset='pascal', epochs=40, eval_interval=1, freeze_bn=False, ft=False, gpu_ids=0, loss_type='ce', lr=0.007, lr_scheduler='cos', momentum=0.9, nesterov=False, no_cuda=False, no_val=False, out_stride=16, resize=512, resume=None, seed=1, start_epoch=0, sync_bn=True, test_batch_size=1, use_balanced_weights=False, use_sbd=False, weight_decay=0.0003, workers=4)
Number of images in train: 1464
Number of images in val: 1449
cuda finished
Using cos LR Scheduler!
Starting Epoch: 0 Total Epoches: 40
  0%|          | 0/1464 [00:00<?, ?it/s]
=>Epoches 0, learning rate = 0.0070, previous best = 0.0000
/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead. warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.")
/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/functional.py:1961: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. "See the documentation of nn.Upsample for details.".format(mode))
Traceback (most recent call last):
  File "train_autodeeplab.py", line 324, in <module>
    main()
  File "train_autodeeplab.py", line 317, in main
    trainer.training(epoch)
  File "train_autodeeplab.py", line 116, in training
    output = self.model(image)
  File "/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/auto_deeplab.py", line 214, in forward
    level4_new_2 = self.cells[count] (self.level_4[-2], self.level_8[-1], weight_cells)
  File "/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 68, in forward
    s = sum(self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states) if h is not None)
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 68, in <genexpr>
    s = sum(self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states) if h is not None)
  File "/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 22, in forward
    return sum(w * op(x) for w, op in zip(weights, self._ops))
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 22, in <genexpr>
    return sum(w * op(x) for w, op in zip(weights, self._ops))
RuntimeError: CUDA error: out of memory

I guessed I might be failing to use multiple GPUs, so I even changed a line of the code to self.model = torch.nn.DataParallel(self.model, device_ids=[0, 1, 2, 3]), but the same error message shows again.
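If I understand DataParallel correctly, it splits the input batch across device_ids and replicates the whole model on every GPU, so with batch_size 1 there is nothing to split and per-GPU memory is not reduced. Here is the minimal sketch I used to check the wrapping (a tiny stand-in module, not the actual AutoDeeplab network):

```python
import torch
import torch.nn as nn

# Tiny stand-in for the real network, just to exercise the wrapper.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())

if torch.cuda.is_available():
    model = model.cuda()
    # DataParallel scatters the *batch* dimension across the listed GPUs
    # and replicates the full model on each one; a batch of 1 therefore
    # runs entirely on the first device.
    model = nn.DataParallel(model, device_ids=list(range(torch.cuda.device_count())))

x = torch.randn(2, 3, 32, 32)
if torch.cuda.is_available():
    x = x.cuda()
out = model(x)
print(tuple(out.shape))
```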

What can I do to resolve it, please?
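In case it helps with diagnosis, here is a small helper (assuming only the standard torch.cuda API) that I could run to see how much memory is actually allocated on each of the four cards:

```python
import torch

def gpu_memory_report():
    """Return one summary line per visible GPU (empty list without CUDA)."""
    lines = []
    for i in range(torch.cuda.device_count()):
        total = torch.cuda.get_device_properties(i).total_memory
        used = torch.cuda.memory_allocated(i)
        lines.append(f"GPU {i}: {used / 2**20:.0f} / {total / 2**20:.0f} MiB allocated")
    return lines

print(gpu_memory_report())
```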

Thanks in advance!
