Conversation
Claude review of PR #470 (d19f0fc)
- Must fix
- Suggestions
- Nits
- CLAUDE.md improvements
- Resolved from prior review
Force-pushed from 314af18 to 02938a3.
qdeslandes left a comment:
Alright, first pass of review done, with a few things to fix. I'll do a second, deeper pass once those are solved. That being said, it's a very welcome addition! :D
Force-pushed from 02938a3 to 1aae907.
Force-pushed from 1aae907 to 550c254.
Force-pushed from 550c254 to d6883e1.
I let Claude search for optimizations overnight, and it proposed a different representation that has much better cache locality.
Force-pushed from ef58a62 to a34983a.
yaakov-stein left a comment:
Claude has a few valid nits/suggestions and I have one last comment on the tests. Overall LGTM once those points are taken care of!
I'm not necessarily opposed to the change, as it simplifies some parts of the code, but I'm confused by the claim that this has better cache locality. Shouldn't the cache locality here be much worse? Whenever we need to check for equality, we need to load a non-contiguous piece of memory, and we can't take advantage of spatial locality anymore. Can you explain which cases you saw the ~40% speedup on?
This was surprising to me as well, so you're right, I should have explained. The benchmark was (a) generate and insert 1 million random IPs, then (b) use `bf_hashset_foreach` to read them back.
Massive thanks for your continued reviews @yaakov-stein. I'm uploading the latest version with small changes. If they look good, I'll reopen this PR, copying over the few comments that are still relevant. This was an experiment for me in how much I can rely on the PR review bot; it turns out it's pretty good, but the GitHub UX starts degrading massively after 50+ comments.
Force-pushed from a34983a to 7998ded.
Hm, that makes sense. I think the right approach here would be to call out this behavior and mention the tradeoffs in a PR note for Quentin. To me, it seems surprising that we'd optimize a non-classic hashset operation (foreach) at the expense of the classic hashset operations (contains, add, delete). I'd personally expect a hashset to be optimized for the standard hashset operations, but I understand where you're coming from. If you split this out into a new PR, let's include a note on how and why this decision was made.
Force-pushed from 7998ded to d19f0fc.
```c
if (needed <= set->cap)
    return 0;

new_cap = _bf_round_next_power_of_2(set->cap ? set->cap : _BF_HASHSET_INIT_CAP);
```
Claude: must-fix: `bf_hashset_reserve` computes `needed` from `count` but then derives `new_cap` solely from `set->cap` (or `_BF_HASHSET_INIT_CAP`), ignoring `needed` entirely. For example, `bf_hashset_reserve(&set, 100)` on an empty set computes `needed = 200` but sets `new_cap = _bf_round_next_power_of_2(16) = 16`, resizing to only 16 slots. The unit test `assert_true(bf_hashset_cap(&set) >= 200)` would catch this at runtime.
new_cap should be derived from needed, e.g.:

```c
new_cap = _bf_round_next_power_of_2(needed);
```

```c
value |= value >> 2;
value |= value >> 4;
value |= value >> 8;
value |= value >> 16;
```
Claude: must-fix: `_bf_round_next_power_of_2` only shifts up to `>> 16`, which covers 32-bit values but not 64-bit `size_t`. For any input where the highest set bit is at position 32 or above, the function returns a non-power-of-two result. This matters because `_BF_HASHSET_MAX_CAP` is `SIZE_MAX / 16 + 1` (well above 2^32 on 64-bit), and once the `bf_hashset_reserve` bug above is fixed, `needed` values above 2^32 would produce corrupt table capacities.
Add `value |= value >> 32;` after this line. The same pre-existing issue exists in cgen/program.c:66.
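A minimal standalone sketch of how the two fixes might combine: a round-up helper safe for 64-bit `size_t` and a reserve capacity derived from `needed`. All names here (`INIT_CAP`, `round_next_power_of_2`, `reserve_cap`) are stand-ins invented for the example, not the actual code from this PR, and the decrement/increment framing around the shift cascade is one common formulation:

```c
#include <stddef.h>
#include <stdint.h>

#define INIT_CAP 16 /* stand-in for _BF_HASHSET_INIT_CAP */

/* Round up to the next power of 2; the ">> 32" step extends the shift
 * cascade to the full 64-bit size_t range. */
static size_t round_next_power_of_2(size_t value)
{
    value--;
    value |= value >> 1;
    value |= value >> 2;
    value |= value >> 4;
    value |= value >> 8;
    value |= value >> 16;
#if SIZE_MAX > UINT32_MAX
    value |= value >> 32; /* the shift the reviewed version was missing */
#endif
    return value + 1;
}

/* Capacity for a reserve request: derived from needed, not from the
 * current capacity, so that reserving for needed = 200 on an empty
 * set yields 256 slots rather than 16. */
static size_t reserve_cap(size_t needed)
{
    return round_next_power_of_2(needed < INIT_CAP ? INIT_CAP : needed);
}
```

With this shape, `reserve_cap(200)` rounds up to 256, which satisfies the unit test's `>= 200` expectation.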
```c
 * size_t. */
#define _BF_HASHSET_MAX_CAP (SIZE_MAX / 16 + 1)

static inline size_t _bf_round_next_power_of_2(size_t value)
```
Claude: suggestion: `_bf_round_next_power_of_2` is defined identically in both hashset.c and cgen/program.c:66 (and both are missing the `>> 32` shift). Consider extracting it into helper.h as a shared `static inline` function so the fix is applied in one place and the copies don't diverge.
Currently, `bf_set` uses `bf_list` under the hood. This is suboptimal performance-wise. Here I am changing `bf_set` to be backed by `bf_hashset`.

- `lib: helper: add FNV-1a hash function` - imported from another PR; a simple hash function.
- `lib: add bf_hashset` - the main commit. Implements a simple hashset (tombstones on deletion, grows 2x when the load factor exceeds 50%, never shrinks, linear probing on collision). See https://en.wikipedia.org/wiki/Linear_probing (the "special flag value" strategy in the "Deletion" section).
- `lib: set: use bf_hashset for elems field` - changes `bf_set`'s inner structure.

See #460 for the previous attempt.
Fixes #418
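The commit messages above describe the scheme in words; as a rough, self-contained illustration, here is a minimal linear-probing hashset with FNV-1a hashing, tombstone deletion, and 2x growth past a 50% load factor. All names are invented for the example (the real `bf_hashset` API differs); only the FNV-1a constants are the standard 64-bit offset basis and prime:

```c
#include <stdint.h>
#include <stdlib.h>

enum slot_state { EMPTY, TOMBSTONE, USED };

struct slot { uint64_t key; enum slot_state state; };
struct hashset { struct slot *slots; size_t cap, count; };

/* FNV-1a, 64-bit: xor each byte into the hash, then multiply by the prime. */
static uint64_t fnv1a(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

static void hs_grow(struct hashset *hs);

/* Insert a key; returns 1 if inserted, 0 if already present. */
static int hs_add(struct hashset *hs, uint64_t key)
{
    if ((hs->count + 1) * 2 > hs->cap) /* keep load factor <= 50% */
        hs_grow(hs);
    size_t mask = hs->cap - 1;
    size_t i = fnv1a(&key, sizeof(key)) & mask;
    size_t first_free = SIZE_MAX;
    while (hs->slots[i].state != EMPTY) {
        if (hs->slots[i].state == USED && hs->slots[i].key == key)
            return 0;
        if (hs->slots[i].state == TOMBSTONE && first_free == SIZE_MAX)
            first_free = i;            /* reuse the first tombstone */
        i = (i + 1) & mask;            /* linear probing */
    }
    if (first_free == SIZE_MAX)
        first_free = i;
    hs->slots[first_free] = (struct slot){ .key = key, .state = USED };
    hs->count++;
    return 1;
}

static int hs_contains(const struct hashset *hs, uint64_t key)
{
    if (!hs->cap)
        return 0;
    size_t mask = hs->cap - 1;
    size_t i = fnv1a(&key, sizeof(key)) & mask;
    while (hs->slots[i].state != EMPTY) { /* tombstones don't stop the probe */
        if (hs->slots[i].state == USED && hs->slots[i].key == key)
            return 1;
        i = (i + 1) & mask;
    }
    return 0;
}

/* Delete by leaving a tombstone so later probes still walk past the slot. */
static int hs_delete(struct hashset *hs, uint64_t key)
{
    if (!hs->cap)
        return 0;
    size_t mask = hs->cap - 1;
    size_t i = fnv1a(&key, sizeof(key)) & mask;
    while (hs->slots[i].state != EMPTY) {
        if (hs->slots[i].state == USED && hs->slots[i].key == key) {
            hs->slots[i].state = TOMBSTONE;
            hs->count--;
            return 1;
        }
        i = (i + 1) & mask;
    }
    return 0;
}

/* Double the capacity and rehash live entries (tombstones are dropped). */
static void hs_grow(struct hashset *hs)
{
    struct slot *old = hs->slots;
    size_t old_cap = hs->cap;
    hs->cap = old_cap ? old_cap * 2 : 8;
    hs->slots = calloc(hs->cap, sizeof(*hs->slots));
    hs->count = 0;
    for (size_t i = 0; i < old_cap; i++)
        if (old[i].state == USED)
            hs_add(hs, old[i].key);
    free(old);
}
```

Growth rehashes only live entries, so tombstones accumulated by deletions are cleaned up for free whenever the table resizes; that is one reason the "never shrinks" policy stays simple.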