Can you please suggest how to distribute the model across devices during inference? I did not manage to load the model on a single 40GB GPU.
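
For context, this is the kind of sharded loading I was hoping to use (a minimal sketch, not something I have verified; the checkpoint name and memory limits are placeholders). My understanding is that passing `device_map="auto"` to `from_pretrained` (with `accelerate` installed) splits the weights across the available GPUs and offloads the remainder to CPU RAM:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; substitute the actual model in question.
model_id = "some-org/some-large-model"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" (via accelerate) shards the weights across all
# visible GPUs, spilling any remainder to CPU RAM if they don't fit.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision roughly halves memory vs fp32
    device_map="auto",
    # Optional per-device caps, e.g. to leave headroom on each 40GB card:
    # max_memory={0: "38GiB", 1: "38GiB", "cpu": "120GiB"},
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Is this the recommended approach, or should I be using something else (e.g. tensor parallelism) for inference at this scale?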