Training has a large bottleneck. I trained for 12 hours with cProfile running; the functions with the highest cumulative time are listed below. Batch.fill accounts for about 85% of total training time (33,142 s of the 39,087 s recorded).
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.002 0.002 39087.736 39087.736 Train.py:1(<module>)
1 3.653 3.653 39086.782 39086.782 Train.py:16(main)
145058 1.010 0.000 35739.204 0.246 Train.py:44(run_net)
145058 129.018 0.001 33142.496 0.228 Batch.py:34(fill)
10049055 21.451 0.000 22053.800 0.002 Data.py:39(get_data)
10049055 1976.519 0.000 22032.350 0.002 Segment_Data.py:275(get_data)
64993847 9822.780 0.000 15071.341 0.000 dataset.py:397(__getitem__)
9283661 755.755 0.000 10920.047 0.001 Batch.py:49(data_into_batch)
114014974/112709452 5766.228 0.000 5781.245 0.000 {torch._C.cat}
129980296 2802.223 0.000 4964.670 0.000 group.py:160(__getitem__)
145057 2.188 0.000 3308.168 0.023 Batch.py:111(backward)
65565909 579.216 0.000 3170.871 0.000 _utils.py:37(_cuda)
145057 261.668 0.002 2595.698 0.018 Batch.py:97(forward)
64993847 277.916 0.000 2158.638 0.000 selections.py:27(select)
145057 12.529 0.000 1901.331 0.013 clip_grad.py:2(clip_grad_norm)
7542964 1797.248 0.000 1797.248 0.000 {method 'norm' of 'torch._C.CudaFloatTensorBase' objects}
65420851 1795.412 0.000 1795.412 0.000 {method 'copy_' of 'torch._C.CudaFloatTensorBase' objects}
64993847 1425.956 0.000 1740.737 0.000 dataset.py:313(__init__)
56137097 229.232 0.000 1557.593 0.000 __init__.py:267(type)
259975388 1315.414 0.000 1348.605 0.000 dataset.py:217(shape)
64993847 392.916 0.000 1277.835 0.000 selections.py:250(__getitem__)
10009001/290115 39.002 0.000 1142.344 0.004 module.py:205(__call__)
56137099 285.072 0.000 1116.109 0.000 _utils.py:5(_type)
145058 2.763 0.000 1107.541 0.008 SqueezeNet.py:76(forward)
435174 4.181 0.000 1074.084 0.002 container.py:62(forward)
27850982 50.400 0.000 1012.754 0.000 tensor.py:37(float)
1160464 13.646 0.000 965.766 0.001 SqueezeNet.py:25(forward)
64993847 125.658 0.000 907.826 0.000 fromnumeric.py:1837(product)
145057 1.007 0.000 895.441 0.006 variable.py:116(backward)
145057 892.171 0.006 892.171 0.006 {method 'run_backward' of 'torch._C._EngineBase' objects}
74279245 829.433 0.000 829.433 0.000 {method 'reduce' of 'numpy.ufunc' objects}
64993847 288.611 0.000 794.311 0.000 selections.py:429(_handle_simple)
18567321 724.864 0.000 724.864 0.000 {method 'copy_' of 'torch._C.CudaDoubleTensorBase' objects}
1305522 8.819 0.000 622.831 0.000 variable.py:839(cat)
27850944 30.437 0.000 619.597 0.000 tensor.py:29(cpu)
1305522 3.550 0.000 612.082 0.000 tensor.py:308(forward)
64993847 186.837 0.000 539.964 0.000 selections.py:244(__init__)
145057 70.703 0.000 509.208 0.004 adadelta.py:27(step)
64993847 320.490 0.000 325.345 0.000 selections.py:147(__init__)
27850944 311.320 0.000 311.320 0.000 {method 'copy_' of 'torch._C.FloatTensorBase' objects}
1281151179/1281151177 261.754 0.000 288.256 0.000 {isinstance}
64993847 249.225 0.000 281.225 0.000 filters.py:207(get_filters)
9283661 279.629 0.000 279.629 0.000 {method 'copy_' of 'torch._C.CudaByteTensorBase' objects}
64993847 177.023 0.000 269.627 0.000 selections.py:406(_expand_ellipsis)
64993847 268.112 0.000 268.112 0.000 base.py:81(is_empty_dataspace)
27850981 20.463 0.000 259.458 0.000 tensor.py:312(__div__)
27850981 238.995 0.000 238.995 0.000 {method 'div' of 'torch._C.CudaFloatTensorBase' objects}
18567374 220.069 0.000 220.069 0.000 {method 'zero_' of 'torch._C.FloatTensorBase' objects}
9283648 6.272 0.000 208.434 0.000 {method 'mean' of 'numpy.ndarray' objects}
18567309 13.199 0.000 205.147 0.000 tensor.py:273(__sub__)
3771508 11.772 0.000 202.183 0.000 conv.py:235(forward)
9283648 101.293 0.000 202.162 0.000 _methods.py:53(_mean)
64986860 159.096 0.000 192.368 0.000 group.py:36(__init__)
18567309 191.948 0.000 191.948 0.000 {method 'sub' of 'torch._C.CudaFloatTensorBase' objects}
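
For anyone who wants to reproduce a listing like this, here is a minimal sketch using the standard library's `cProfile` and `pstats` modules. The `fill` and `run_net` functions below are hypothetical stand-ins, not the real code in Batch.py or Train.py, and `profile_report` and `share` are helper names made up for this example. The last line recomputes the Batch.fill share from the two cumtime figures in the listing.

```python
import cProfile
import io
import pstats


def fill(n):
    # Hypothetical stand-in for a slow batch-filling step.
    return sum(i * i for i in range(n))


def run_net(steps):
    # Hypothetical stand-in for a training loop that calls fill() every step.
    return [fill(2000) for _ in range(steps)]


def profile_report(func, *args, top=10):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


def share(cumtime, total):
    """Fraction of the total run time spent in a function and its callees."""
    return cumtime / total


if __name__ == "__main__":
    print(profile_report(run_net, 50))
    # cumtime figures copied from the listing: Batch.py:34(fill) and Train.py:16(main).
    print(f"fill share: {share(33142.496, 39086.782):.1%}")
```

Sorting by cumulative time is what makes a caller such as `fill` stand out even when its own `tottime` is small: most of its cost sits in the functions it calls.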