benchmark_memcpy.py problem #17

```python
def benchmark_transfer(src_cache, dst_cache, description):
    start_time = time.time()
    for src, dst in zip(src_cache, dst_cache):
        dst[0].copy_(src[0], non_blocking=True)
        dst[1].copy_(src[0], non_blocking=True)
    torch.cuda.synchronize()  # Ensure CUDA operations are synchronized
    elapsed = (time.time() - start_time) / NUM_LAYERS
    print(f"{description} Average Latency: {elapsed * 1000:.2f} milliseconds")
```
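Should the second `src[0]` be `src[1]`? As written, both `dst[0]` and `dst[1]` are copied from `src[0]`, so `src[1]` is never transferred.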
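
If the second copy is indeed meant to read from `src[1]`, the loop might look like the sketch below. This is only my guess at the intended behavior, not a confirmed fix. Some parts are assumptions that are not in the original script: the `synchronize` parameter (pass `torch.cuda.synchronize` for GPU tensors), dividing by `len(src_cache)` in place of `NUM_LAYERS`, and the `FakeTensor` class, a hypothetical stand-in that lets the example run without PyTorch or a GPU.

```python
import time


def benchmark_transfer(src_cache, dst_cache, description, synchronize=None):
    """Copy both tensors of every layer from src_cache to dst_cache and time it.

    Assumption: each cache entry is a 2-element sequence of tensor-like
    objects exposing copy_(other, non_blocking=...), as torch.Tensor does.
    """
    start_time = time.perf_counter()
    for src, dst in zip(src_cache, dst_cache):
        dst[0].copy_(src[0], non_blocking=True)
        dst[1].copy_(src[1], non_blocking=True)  # src[1] instead of src[0]
    if synchronize is not None:
        # e.g. torch.cuda.synchronize, so asynchronous copies have finished
        synchronize()
    # Assumption: NUM_LAYERS in the original equals len(src_cache).
    elapsed = (time.perf_counter() - start_time) / max(len(src_cache), 1)
    print(f"{description} Average Latency: {elapsed * 1000:.2f} milliseconds")
    return elapsed


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor, used only for this demo."""

    def __init__(self, values):
        self.values = list(values)

    def copy_(self, other, non_blocking=False):
        # Overwrite our contents in place, mimicking Tensor.copy_.
        self.values[:] = other.values
        return self


# One "layer" holding two distinct tensors, so a src[0]/src[1] mix-up is visible.
src_cache = [(FakeTensor([1, 2]), FakeTensor([3, 4]))]
dst_cache = [(FakeTensor([0, 0]), FakeTensor([0, 0]))]
benchmark_transfer(src_cache, dst_cache, "demo")
```

With the original `src[0]` in the second copy, `dst_cache[0][1]` would end up holding the contents of `src_cache[0][0]`. The number of bytes moved is the same either way if both tensors have the same shape, so the measured latency may not change, but the destination would contain the wrong data.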
Thanks in advance for your reply ❤
