Methods for speeding up the DataLoader

My server has plenty of memory but few CPU cores, so I/O can become the bottleneck during training.

I noticed that you set `pin_memory=False` in the PyTorch DataLoader, and I didn't see any change in run time when toggling it.
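A comparison of this kind can be sketched roughly as follows. This is only an illustration: the synthetic tensors stand in for the real dataset, and the shapes, batch size, and the `time_one_epoch` helper are assumptions, not code from the repository.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def time_one_epoch(pin_memory, num_workers=0, batch_size=64):
    """Time one pass over a synthetic dataset (hypothetical helper)."""
    # Synthetic stand-in data; the shapes here are assumptions.
    dataset = TensorDataset(
        torch.randn(1024, 3, 32, 32),
        torch.randint(0, 10, (1024,)),
    )
    use_cuda = torch.cuda.is_available()
    # Pinned memory only helps host-to-GPU copies, so skip it on CPU-only hosts.
    pin = pin_memory and use_cuda
    loader = DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=num_workers,
        pin_memory=pin,
    )
    device = torch.device("cuda" if use_cuda else "cpu")

    start = time.perf_counter()
    seen = 0
    for x, y in loader:
        # non_blocking copies can only overlap with compute when the source is pinned.
        x = x.to(device, non_blocking=pin)
        y = y.to(device, non_blocking=pin)
        seen += x.shape[0]
    if use_cuda:
        # Wait for pending asynchronous copies before stopping the clock.
        torch.cuda.synchronize()
    return time.perf_counter() - start, seen


if __name__ == "__main__":
    for flag in (False, True):
        elapsed, seen = time_one_epoch(pin_memory=flag)
        print(f"pin_memory={flag}: {elapsed:.3f}s for {seen} samples")
```

If the two timings are close, the host-to-GPU copy is probably not the limiting step, which would be consistent with disk reads or CPU-side preprocessing being the bottleneck instead.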

Since this experiment is quite I/O-heavy, have you tried any methods to speed up data loading?