Training bottleneck #45

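Training has a large bottleneck. I trained for 12 hours with cProfile running; the functions with the highest cumulative time are listed below. They show that **Batch.fill takes up about 85% of training time** (33,142 s of the 39,088 s total cumulative time). Most of that appears to be spent in `get_data` (22,054 s) and `data_into_batch` (10,920 s).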

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002 39087.736 39087.736 Train.py:1(<module>)
        1    3.653    3.653 39086.782 39086.782 Train.py:16(main)
   145058    1.010    0.000 35739.204    0.246 Train.py:44(run_net)
   145058  129.018    0.001 33142.496    0.228 Batch.py:34(fill)
 10049055   21.451    0.000 22053.800    0.002 Data.py:39(get_data)
 10049055 1976.519    0.000 22032.350    0.002 Segment_Data.py:275(get_data)
 64993847 9822.780    0.000 15071.341    0.000 dataset.py:397(__getitem__)
  9283661  755.755    0.000 10920.047    0.001 Batch.py:49(data_into_batch)
114014974/112709452 5766.228    0.000 5781.245    0.000 {torch._C.cat}
129980296 2802.223    0.000 4964.670    0.000 group.py:160(__getitem__)
   145057    2.188    0.000 3308.168    0.023 Batch.py:111(backward)
 65565909  579.216    0.000 3170.871    0.000 _utils.py:37(_cuda)
   145057  261.668    0.002 2595.698    0.018 Batch.py:97(forward)
 64993847  277.916    0.000 2158.638    0.000 selections.py:27(select)
   145057   12.529    0.000 1901.331    0.013 clip_grad.py:2(clip_grad_norm)
  7542964 1797.248    0.000 1797.248    0.000 {method 'norm' of 'torch._C.CudaFloatTensorBase' objects}
 65420851 1795.412    0.000 1795.412    0.000 {method 'copy_' of 'torch._C.CudaFloatTensorBase' objects}
 64993847 1425.956    0.000 1740.737    0.000 dataset.py:313(__init__)
 56137097  229.232    0.000 1557.593    0.000 __init__.py:267(type)
259975388 1315.414    0.000 1348.605    0.000 dataset.py:217(shape)
 64993847  392.916    0.000 1277.835    0.000 selections.py:250(__getitem__)
10009001/290115   39.002    0.000 1142.344    0.004 module.py:205(__call__)
 56137099  285.072    0.000 1116.109    0.000 _utils.py:5(_type)
   145058    2.763    0.000 1107.541    0.008 SqueezeNet.py:76(forward)
   435174    4.181    0.000 1074.084    0.002 container.py:62(forward)
 27850982   50.400    0.000 1012.754    0.000 tensor.py:37(float)
  1160464   13.646    0.000  965.766    0.001 SqueezeNet.py:25(forward)
 64993847  125.658    0.000  907.826    0.000 fromnumeric.py:1837(product)
   145057    1.007    0.000  895.441    0.006 variable.py:116(backward)
   145057  892.171    0.006  892.171    0.006 {method 'run_backward' of 'torch._C._EngineBase' objects}
 74279245  829.433    0.000  829.433    0.000 {method 'reduce' of 'numpy.ufunc' objects}
 64993847  288.611    0.000  794.311    0.000 selections.py:429(_handle_simple)
 18567321  724.864    0.000  724.864    0.000 {method 'copy_' of 'torch._C.CudaDoubleTensorBase' objects}
  1305522    8.819    0.000  622.831    0.000 variable.py:839(cat)
 27850944   30.437    0.000  619.597    0.000 tensor.py:29(cpu)
  1305522    3.550    0.000  612.082    0.000 tensor.py:308(forward)
 64993847  186.837    0.000  539.964    0.000 selections.py:244(__init__)
   145057   70.703    0.000  509.208    0.004 adadelta.py:27(step)
 64993847  320.490    0.000  325.345    0.000 selections.py:147(__init__)
 27850944  311.320    0.000  311.320    0.000 {method 'copy_' of 'torch._C.FloatTensorBase' objects}
1281151179/1281151177  261.754    0.000  288.256    0.000 {isinstance}
 64993847  249.225    0.000  281.225    0.000 filters.py:207(get_filters)
  9283661  279.629    0.000  279.629    0.000 {method 'copy_' of 'torch._C.CudaByteTensorBase' objects}
 64993847  177.023    0.000  269.627    0.000 selections.py:406(_expand_ellipsis)
 64993847  268.112    0.000  268.112    0.000 base.py:81(is_empty_dataspace)
 27850981   20.463    0.000  259.458    0.000 tensor.py:312(__div__)
 27850981  238.995    0.000  238.995    0.000 {method 'div' of 'torch._C.CudaFloatTensorBase' objects}
 18567374  220.069    0.000  220.069    0.000 {method 'zero_' of 'torch._C.FloatTensorBase' objects}
  9283648    6.272    0.000  208.434    0.000 {method 'mean' of 'numpy.ndarray' objects}
 18567309   13.199    0.000  205.147    0.000 tensor.py:273(__sub__)
  3771508   11.772    0.000  202.183    0.000 conv.py:235(forward)
  9283648  101.293    0.000  202.162    0.000 _methods.py:53(_mean)
 64986860  159.096    0.000  192.368    0.000 group.py:36(__init__)
 18567309  191.948    0.000  191.948    0.000 {method 'sub' of 'torch._C.CudaFloatTensorBase' objects}
```
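For anyone who wants to produce a comparable listing, here is a minimal sketch of collecting a profile and sorting it by cumulative time with the standard-library `cProfile` and `pstats` modules. The `fill` and `run_net` functions are hypothetical stand-ins, not the code in `Batch.py` or `Train.py`, and `profile_cumulative` is a helper name made up for this example.

```python
import cProfile
import io
import pstats


def fill(n=20000):
    """Hypothetical stand-in for Batch.fill: CPU-bound data preparation."""
    return sum(i * i for i in range(n))


def run_net(steps=5):
    """Hypothetical stand-in for the training loop; calls fill once per step."""
    total = 0
    for _ in range(steps):
        total += fill()
    return total


def profile_cumulative(func, top=10):
    """Run func under cProfile; return (result, report sorted by cumtime)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)

    # Write the report to a string instead of stdout so it can be saved or parsed.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


result, report = profile_cumulative(run_net)
print(report)
```

A whole script can be profiled the same way from the command line with `python -m cProfile -s cumtime Train.py`. Note that `cumtime` includes time spent in callees, so the rows of the listing overlap and their percentages do not sum to 100%.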
