fix: don't unmarshal unnecessary items on batch lookup #1943
Arpit529Srivastava wants to merge 1 commit into cilium:main from
Conversation
Signed-off-by: arpit529srivastava <arpitsrivastava529@gmail.com>
@ti-mo PTAL, let me know if you need any changes.
@ti-mo a gentle ping, thanks
dylandreimerink
left a comment
Sorry for taking so long to review this PR, our review queues are a bit on the long side at the moment.
To me the code seems correct, but I question whether it makes sense to have it.
In the end this is an optimization, and any optimization should be supported by a benchmark showing the before and after. The issue linked from this PR is #1078 (comment). When I run that benchmark locally before and after, I see essentially no change.
To my understanding, this is because our existing iteration benchmarks use the zero-copy path, which does not benefit from this optimization.
So this potentially only optimizes the non-zero-copy path, and only on the last batch read from a map, when the map size is not a multiple of the batch size. That is a very specific set of circumstances, which begs the question of whether adding all of this code is worth the gain.
At the very least, we should have a benchmark that shows the improvement.
In addition, I ran the tests locally with coverage, and it seems the non-zero-copy case is not covered by the existing tests. So I would also like to see a test added, or an existing test modified, to cover non-zero-copy batch lookups and assert that this new logic works and keeps working.
hi @dylandreimerink, thanks for the detailed review. yeah, that makes sense. i'll add a benchmark that focuses on the non-zero-copy case with partial batches. since I'm on macOS, I can't run the benchmarks locally. would you be able to run them once I push the code?
Yes of course, no problem.
previously, `BatchLookup` and `BatchLookupAndDelete` unmarshaled the full key/value buffers based on the requested batch size, even when the kernel returned fewer items. this caused wasted cpu cycles and extra memory allocations for empty entries.

this change updates `batchLookupCmd`, `batchLookup` and `batchLookupPerCPU` to use the actual count returned by the kernel and slice the buffers accordingly before unmarshaling, so only valid results are processed.

fixes: #1080