benchmark_memcpy.py problem #17
```python
import time

import torch


def benchmark_transfer(src_cache, dst_cache, description):
    # NUM_LAYERS is assumed to be defined at module level in benchmark_memcpy.py
    start_time = time.time()
    for src, dst in zip(src_cache, dst_cache):
        dst[0].copy_(src[0], non_blocking=True)
        dst[1].copy_(src[0], non_blocking=True)  # copies src[0] again; should this be src[1]?
    torch.cuda.synchronize()  # Wait for all queued async copies before stopping the clock
    elapsed = (time.time() - start_time) / NUM_LAYERS
    print(f"{description} Average Latency: {elapsed * 1000:.2f} milliseconds")
```
Should the second `src[0]` be `src[1]`? As written, `dst[1]` is filled from `src[0]`, so `src[1]` is never copied.
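Assuming `src[1]` is what was intended, a minimal sketch of the corrected loop might look like the following. The `num_layers` and `sync` parameters are hypothetical additions that replace the global `NUM_LAYERS` and the hard-coded `torch.cuda.synchronize()`, so the function can also be exercised without a GPU; `time.perf_counter()` is swapped in for `time.time()` because it is a monotonic, high-resolution clock.

```python
import time
from typing import Callable, Sequence


def benchmark_transfer(
    src_cache: Sequence,
    dst_cache: Sequence,
    description: str,
    num_layers: int,
    sync: Callable[[], None] = lambda: None,
) -> float:
    """Copy both buffers of every layer; return average per-layer latency in seconds.

    `num_layers` and `sync` are hypothetical parameters standing in for the
    global NUM_LAYERS and the hard-coded torch.cuda.synchronize() call.
    """
    start = time.perf_counter()  # monotonic, high-resolution clock
    for src, dst in zip(src_cache, dst_cache):
        dst[0].copy_(src[0], non_blocking=True)
        dst[1].copy_(src[1], non_blocking=True)  # second buffer now comes from src[1]
    sync()  # e.g. torch.cuda.synchronize: wait for queued async copies to finish
    elapsed = (time.perf_counter() - start) / num_layers
    print(f"{description} Average Latency: {elapsed * 1000:.2f} milliseconds")
    return elapsed
```

With PyTorch on a CUDA device this would be called as `benchmark_transfer(src_cache, dst_cache, "transfer", NUM_LAYERS, sync=torch.cuda.synchronize)`, which keeps the original behavior apart from the indexing fix.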
Thanks in advance for your reply ❤