PoC: Reduce large struct allocations to <= 13/17/21 KiB for ML-DSA-44/65/87 #1005

Draft
mkannwischer wants to merge 22 commits into main from lowram

Contributor

@mkannwischer mkannwischer commented Mar 27, 2026

This is a proof of concept of how we could get the large-struct allocations (via MLD_ALLOC) of {keygen,sign,verify} to <= 13/17/21 KiB in REDUCE_RAM mode for ML-DSA-44/65/87. In addition, as before, we need a small amount of stack memory: I measured 2.5 KiB on my machine, so the overall memory consumption should be comfortably below 16/20/24 KiB.

Warning: This is a quick-and-dirty PoC, and I do not recommend relying on it. The CBMC proofs are a work in progress. This PR will definitely not be merged in one piece; it will instead land in smaller steps.

Contributor

oqs-bot commented Mar 27, 2026

CBMC Results (ML-DSA-87)

⚠️ Attention Required

Proof Status Current Previous Change
mld_attempt_signature_generation - 240s -
mld_compute_pack_z - 7s -
mld_compute_t0_t1_tr_from_sk_components - 25s -
pack_pk - 5s -
pack_sig_c_h - 3s -
pack_sig_z - 3s -
pack_sk - 3s -
polyveck_add - 9s -
polyveck_make_hint - 8s -
polyveck_pack_t0 - 4s -
polyveck_pointwise_poly_montgomery_s2 - - -
polyveck_pointwise_poly_montgomery_t0 - - -
polyveck_power2round - 10s -
sign_keypair_internal - 5s -
sign_pk_from_sk - 8s -
sign_verify_internal - 334s -
unpack_hints - 5s -
unpack_pk - 4s -
unpack_sig - 3s -
unpack_sk - 5s -
polyveck_pointwise_poly_montgomery ⚠️ 22s 6s +267%
Full Results (179 proofs)
Proof Status Current Previous Change
**TOTAL** 1796s 2703s -33.6%
polyvec_matrix_expand 193s 175s +10%
poly_pointwise_montgomery_c 151s 158s -4%
rej_uniform_native 141s 146s -3%
mld_invntt_layer 92s 95s -3%
polyvec_matrix_expand_serial 85s 80s +6%
polyvecl_pointwise_acc_montgomery_c 85s 281s -70%
mld_ct_memcmp 73s 74s -1%
polymat_permute_bitrev_to_custom 60s 45s +33%
mld_ntt_layer 55s 56s -2%
keccak_squeezeblocks_x4 41s 42s -2%
sign_signature_internal 41s 55s -25%
polyveck_pointwise_poly_montgomery ⚠️ 22s 6s +267%
rej_uniform 22s 22s +0%
fqmul 20s 19s +5%
poly_chknorm_c 20s 19s +5%
poly_uniform_eta_4x 19s 17s +12%
poly_uniform_4x 17s 17s +0%
mld_polyvecl_permute_bitrev_to_custom_native 16s 14s +14%
polyeta_unpack 15s 17s -12%
polyvec_matrix_pointwise_montgomery 15s 12s +25%
keccakf1600x4_permute_native 14s 13s +8%
mld_check_pct 14s 9s +56%
rej_uniform_c 14s 14s +0%
polyt0_unpack 13s 15s -13%
mld_ntt_butterfly_block 12s 13s -8%
polyveck_decompose 12s 60s -80%
keccak_absorb_once_x4 11s 11s +0%
poly_add 11s 12s -8%
polyveck_shiftl 10s 7s +43%
polyveck_use_hint 10s 13s -23%
keccak_absorb 9s 8s +12%
keccakf1600_permute_native 9s 8s +12%
polyveck_invntt_tomont 9s 8s +12%
polyveck_reduce 9s 9s +0%
mld_sample_s1_s2_serial 8s 9s -11%
poly_decompose_c 8s 7s +14%
polyveck_caddq 8s 8s +0%
polyvecl_ntt 8s 11s -27%
keccakf1600_permute 7s 9s -22%
poly_invntt_tomont_c 7s 7s +0%
polyveck_unpack_t0 7s 4s +75%
polyvecl_uniform_gamma1_serial 7s 5s +40%
polyz_unpack_c 7s 11s -36%
sign 7s 8s -12%
poly_caddq_c 6s 7s -14%
poly_ntt_native 6s 4s +50%
poly_uniform_gamma1_4x 6s 4s +50%
polyveck_ntt 6s 8s -25%
polyveck_sub 6s 7s -14%
polyvecl_chknorm 6s 6s +0%
sign_open 6s 7s -14%
keccakf1600x4_extract_bytes 5s 2s +150%
make_hint 5s 3s +67%
mld_sample_s1_s2 5s 7s -29%
poly_caddq_native 5s 4s +25%
poly_challenge 5s 4s +25%
poly_power2round 5s 5s +0%
poly_uniform_eta 5s 5s +0%
polyveck_unpack_eta 5s 3s +67%
reduce32 5s 4s +25%
rej_eta_native 5s 4s +25%
sign_signature 5s 3s +67%
keccak_squeeze 4s 6s -33%
keccakf1600_xor_bytes 4s 1s +300%
keccakf1600_xor_bytes (big endian) 4s 2s +100%
mld_ct_cmask_nonzero_u8 4s 3s +33%
mld_ct_get_optblocker_u8 4s 2s +100%
mld_ct_sel_int32 4s 2s +100%
mld_h 4s 4s +0%
mld_keccakf1600_extract_bytes 4s 4s +0%
mld_value_barrier_u32 4s 2s +100%
ntt_native_x86_64 4s 4s +0%
poly_caddq_native_aarch64 4s 2s +100%
poly_chknorm_native 4s 5s -20%
poly_decompose_native 4s 4s +0%
poly_make_hint 4s 2s +100%
poly_ntt 4s 4s +0%
poly_ntt_c 4s 3s +33%
poly_uniform 4s 4s +0%
poly_uniform_gamma1 4s 4s +0%
polyt1_unpack 4s 2s +100%
polyveck_chknorm 4s 5s -20%
polyvecl_pointwise_acc_montgomery_native 4s 2s +100%
polyvecl_unpack_z 4s 1s +300%
polyw1_pack 4s 4s +0%
polyz_unpack_native 4s 4s +0%
shake128_init 4s 3s +33%
shake256_finalize 4s 3s +33%
sign_signature_pre_hash_internal 4s 2s +100%
sign_verify_pre_hash_internal 4s 4s +0%
sign_verify_pre_hash_shake256 4s 4s +0%
sys_check_capability 4s 2s +100%
caddq 3s 1s +200%
keccak_finalize 3s 4s -25%
keccakf1600_extract_bytes (big endian) 3s 1s +200%
keccakf1600x4_permute 3s 3s +0%
mld_ct_cmask_nonzero_u32 3s 3s +0%
mld_ct_get_optblocker_i64 3s 3s +0%
mld_ct_get_optblocker_u32 3s 4s -25%
ntt_native_aarch64 3s 6s -50%
poly_caddq 3s 3s +0%
poly_chknorm_native_aarch64 3s 2s +50%
poly_decompose 3s 4s -25%
poly_reduce 3s 4s -25%
poly_use_hint_c 3s 3s +0%
poly_use_hint_native 3s 4s -25%
polyeta_pack 3s 3s +0%
polyt0_pack 3s 5s -40%
polyt1_pack 3s 3s +0%
polyveck_pack_eta 3s 3s +0%
polyvecl_pack_eta 3s 4s -25%
polyvecl_permute_bitrev_to_custom 3s 4s -25%
polyvecl_pointwise_acc_montgomery 3s 2s +50%
polyvecl_uniform_gamma1 3s 3s +0%
polyz_pack 3s 2s +50%
polyz_unpack 3s 3s +0%
rej_eta_c 3s 5s -40%
shake128_absorb 3s 3s +0%
shake256_absorb 3s 2s +50%
shake256_release 3s 5s -40%
shake256_squeeze 3s 3s +0%
sign_signature_pre_hash_shake256 3s 3s +0%
sign_verify 3s 5s -40%
decompose 2s 4s -50%
fqscale 2s 4s -50%
intt_native_x86_64 2s 3s -33%
keccak_init 2s 4s -50%
keccakf1600x4_xor_bytes 2s 3s -33%
mld_ct_abs_i32 2s 2s +0%
mld_ct_cmask_neg_i32 2s 3s -33%
mld_prepare_domain_separation_prefix 2s 7s -71%
mld_value_barrier_i64 2s 2s +0%
mld_value_barrier_u8 2s 4s -50%
montgomery_reduce 2s 3s -33%
poly_chknorm 2s 1s +100%
poly_invntt_tomont 2s 3s -33%
poly_invntt_tomont_native 2s 3s -33%
poly_pointwise_montgomery 2s 3s -33%
poly_pointwise_montgomery_native 2s 3s -33%
poly_shiftl 2s 3s -33%
poly_sub 2s 3s -33%
poly_use_hint 2s 2s +0%
polyveck_pack_w1 2s 4s -50%
polyvecl_unpack_eta 2s 4s -50%
power2round 2s 3s -33%
rej_eta 2s 4s -50%
shake128_finalize 2s 2s +0%
shake128_release 2s 4s -50%
shake128_squeeze 2s 2s +0%
shake128x4_absorb_once 2s 2s +0%
shake128x4_squeezeblocks 2s 2s +0%
shake256 2s 3s -33%
shake256_init 2s 1s +100%
shake256x4_absorb_once 2s 2s +0%
shake256x4_squeezeblocks 2s 3s -33%
sign_keypair 2s 2s +0%
sign_signature_extmu 2s 5s -60%
sign_verify_extmu 2s 3s -33%
use_hint 2s 3s -33%
mld_attempt_signature_generation - 240s -
mld_compute_pack_z - 7s -
mld_compute_t0_t1_tr_from_sk_components - 25s -
pack_pk - 5s -
pack_sig_c_h - 3s -
pack_sig_z - 3s -
pack_sk - 3s -
polyveck_add - 9s -
polyveck_make_hint - 8s -
polyveck_pack_t0 - 4s -
polyveck_pointwise_poly_montgomery_s2 - - -
polyveck_pointwise_poly_montgomery_t0 - - -
polyveck_power2round - 10s -
sign_keypair_internal - 5s -
sign_pk_from_sk - 8s -
sign_verify_internal - 334s -
unpack_hints - 5s -
unpack_pk - 4s -
unpack_sig - 3s -
unpack_sk - 5s -

Contributor

oqs-bot commented Mar 27, 2026

CBMC Results (ML-DSA-44)

⚠️ Attention Required

Proof Status Current Previous Change
mld_attempt_signature_generation - 234s -
mld_compute_pack_z - 6s -
mld_compute_t0_t1_tr_from_sk_components - 13s -
pack_pk - 7s -
pack_sig_c_h - 2s -
pack_sig_z - 3s -
pack_sk - 2s -
polyveck_add - 5s -
polyveck_make_hint - 2s -
polyveck_pack_t0 - 4s -
polyveck_pointwise_poly_montgomery_s2 - - -
polyveck_pointwise_poly_montgomery_t0 - - -
polyveck_power2round - 14s -
sign_keypair_internal - 5s -
sign_pk_from_sk - 7s -
sign_verify_internal - 127s -
unpack_hints - 6s -
unpack_pk - 2s -
unpack_sig - 2s -
unpack_sk - 5s -
polyveck_invntt_tomont ⚠️ 20s 3s +567%
Full Results (179 proofs)
Proof Status Current Previous Change
**TOTAL** 1462s 2066s -29.2%
polyvecl_pointwise_acc_montgomery_c 189s 233s -19%
poly_pointwise_montgomery_c 144s 162s -11%
rej_uniform_native 136s 145s -6%
mld_invntt_layer 85s 86s -1%
mld_ct_memcmp 69s 82s -16%
mld_ntt_layer 53s 57s -7%
keccak_squeezeblocks_x4 41s 42s -2%
fqmul 20s 21s -5%
polyveck_invntt_tomont ⚠️ 20s 3s +567%
rej_uniform 20s 22s -9%
poly_chknorm_c 19s 23s -17%
polyvec_matrix_expand 18s 28s -36%
polyeta_unpack 17s 18s -6%
sign_signature_internal 17s 33s -48%
polyt0_unpack 16s 17s -6%
rej_uniform_c 15s 18s -17%
poly_uniform_eta_4x 14s 18s -22%
polyz_unpack_c 14s 11s +27%
mld_ntt_butterfly_block 13s 14s -7%
poly_uniform_4x 13s 14s -7%
keccakf1600x4_permute_native 12s 12s +0%
poly_add 12s 13s -8%
polyvec_matrix_pointwise_montgomery 12s 15s -20%
keccak_absorb_once_x4 9s 10s -10%
keccakf1600_permute 8s 7s +14%
keccakf1600_permute_native 8s 7s +14%
mld_check_pct 8s 8s +0%
poly_invntt_tomont_c 8s 7s +14%
keccak_absorb 7s 9s -22%
mld_h 7s 6s +17%
polymat_permute_bitrev_to_custom 7s 16s -56%
polyvec_matrix_expand_serial 7s 11s -36%
polyveck_sub 7s 4s +75%
sign_open 7s 5s +40%
sign_signature_pre_hash_internal 7s 4s +75%
mld_polyvecl_permute_bitrev_to_custom_native 6s 10s -40%
poly_decompose_native 6s 3s +100%
poly_uniform 6s 5s +20%
polyveck_chknorm 6s 7s -14%
polyveck_decompose 6s 7s -14%
polyveck_reduce 6s 5s +20%
polyvecl_ntt 6s 3s +100%
polyz_unpack 6s 6s +0%
sign 6s 7s -14%
mld_prepare_domain_separation_prefix 5s 6s -17%
poly_reduce 5s 4s +25%
polyveck_caddq 5s 7s -29%
polyveck_ntt 5s 4s +25%
polyveck_use_hint 5s 5s +0%
polyvecl_chknorm 5s 3s +67%
decompose 4s 4s +0%
fqscale 4s 3s +33%
make_hint 4s 2s +100%
mld_ct_cmask_nonzero_u32 4s 3s +33%
mld_ct_cmask_nonzero_u8 4s 3s +33%
mld_ct_get_optblocker_u8 4s 2s +100%
mld_sample_s1_s2_serial 4s 5s -20%
montgomery_reduce 4s 3s +33%
ntt_native_aarch64 4s 3s +33%
poly_caddq_c 4s 6s -33%
poly_caddq_native 4s 4s +0%
poly_caddq_native_aarch64 4s 3s +33%
poly_challenge 4s 5s -20%
poly_decompose_c 4s 3s +33%
poly_ntt 4s 4s +0%
poly_ntt_native 4s 3s +33%
poly_power2round 4s 6s -33%
poly_shiftl 4s 4s +0%
poly_sub 4s 4s +0%
poly_use_hint_c 4s 5s -20%
polyeta_pack 4s 4s +0%
polyt0_pack 4s 4s +0%
polyveck_shiftl 4s 6s -33%
polyveck_unpack_eta 4s 3s +33%
polyveck_unpack_t0 4s 5s -20%
polyvecl_permute_bitrev_to_custom 4s 3s +33%
polyvecl_unpack_z 4s 3s +33%
rej_eta_c 4s 4s +0%
shake128_release 4s 4s +0%
shake128_squeeze 4s 3s +33%
shake128x4_absorb_once 4s 2s +100%
sign_keypair 4s 3s +33%
sign_signature_pre_hash_shake256 4s 2s +100%
sign_verify 4s 7s -43%
intt_native_x86_64 3s 5s -40%
keccak_init 3s 2s +50%
keccak_squeeze 3s 4s -25%
keccakf1600_xor_bytes 3s 4s -25%
keccakf1600x4_extract_bytes 3s 2s +50%
keccakf1600x4_xor_bytes 3s 3s +0%
mld_ct_abs_i32 3s 3s +0%
mld_ct_cmask_neg_i32 3s 3s +0%
mld_sample_s1_s2 3s 3s +0%
mld_value_barrier_i64 3s 2s +50%
poly_chknorm_native 3s 3s +0%
poly_decompose 3s 3s +0%
poly_invntt_tomont_native 3s 4s -25%
poly_pointwise_montgomery_native 3s 4s -25%
poly_uniform_eta 3s 5s -40%
poly_uniform_gamma1 3s 3s +0%
poly_uniform_gamma1_4x 3s 4s -25%
poly_use_hint 3s 2s +50%
poly_use_hint_native 3s 2s +50%
polyt1_pack 3s 2s +50%
polyt1_unpack 3s 5s -40%
polyveck_pack_eta 3s 3s +0%
polyveck_pack_w1 3s 3s +0%
polyveck_pointwise_poly_montgomery 3s 4s -25%
polyvecl_pack_eta 3s 6s -50%
polyvecl_uniform_gamma1_serial 3s 4s -25%
polyw1_pack 3s 3s +0%
polyz_pack 3s 3s +0%
reduce32 3s 3s +0%
rej_eta_native 3s 4s -25%
shake128_absorb 3s 2s +50%
shake128_init 3s 2s +50%
shake256 3s 2s +50%
shake256_squeeze 3s 2s +50%
shake256x4_absorb_once 3s 2s +50%
shake256x4_squeezeblocks 3s 1s +200%
sign_signature 3s 2s +50%
sign_signature_extmu 3s 3s +0%
sign_verify_extmu 3s 5s -40%
sign_verify_pre_hash_internal 3s 3s +0%
sign_verify_pre_hash_shake256 3s 3s +0%
use_hint 3s 4s -25%
keccak_finalize 2s 2s +0%
keccakf1600_extract_bytes (big endian) 2s 4s -50%
keccakf1600_xor_bytes (big endian) 2s 2s +0%
keccakf1600x4_permute 2s 5s -60%
mld_keccakf1600_extract_bytes 2s 3s -33%
mld_value_barrier_u32 2s 2s +0%
mld_value_barrier_u8 2s 5s -60%
ntt_native_x86_64 2s 2s +0%
poly_caddq 2s 3s -33%
poly_invntt_tomont 2s 4s -50%
poly_ntt_c 2s 3s -33%
poly_pointwise_montgomery 2s 4s -50%
polyvecl_pointwise_acc_montgomery 2s 3s -33%
polyvecl_pointwise_acc_montgomery_native 2s 7s -71%
polyvecl_uniform_gamma1 2s 3s -33%
polyvecl_unpack_eta 2s 3s -33%
polyz_unpack_native 2s 3s -33%
power2round 2s 2s +0%
rej_eta 2s 2s +0%
shake128_finalize 2s 2s +0%
shake128x4_squeezeblocks 2s 2s +0%
shake256_finalize 2s 3s -33%
shake256_release 2s 2s +0%
sys_check_capability 2s 3s -33%
mld_attempt_signature_generation - 234s -
mld_compute_pack_z - 6s -
mld_compute_t0_t1_tr_from_sk_components - 13s -
pack_pk - 7s -
pack_sig_c_h - 2s -
pack_sig_z - 3s -
pack_sk - 2s -
polyveck_add - 5s -
polyveck_make_hint - 2s -
polyveck_pack_t0 - 4s -
polyveck_pointwise_poly_montgomery_s2 - - -
polyveck_pointwise_poly_montgomery_t0 - - -
polyveck_power2round - 14s -
sign_keypair_internal - 5s -
sign_pk_from_sk - 7s -
sign_verify_internal - 127s -
unpack_hints - 6s -
unpack_pk - 2s -
unpack_sig - 2s -
unpack_sk - 5s -
caddq 1s 5s -80%
mld_ct_get_optblocker_i64 1s 3s -67%
mld_ct_get_optblocker_u32 1s 2s -50%
mld_ct_sel_int32 1s 2s -50%
poly_chknorm 1s 2s -50%
poly_chknorm_native_aarch64 1s 3s -67%
poly_make_hint 1s 2s -50%
shake256_absorb 1s 2s -50%
shake256_init 1s 2s -50%

Introduce mld_s1vec, following the same pattern as mld_polymat for
reduced RAM usage. In normal mode, it stores the full NTT'd polyvecl.
In REDUCE_RAM mode, it stores a pointer to the packed s1 data in the
secret key and unpacks + NTTs individual polynomials on demand.

This reduces signing memory in REDUCE_RAM mode:
- ML-DSA-44: 32,448 -> 28,384 (-4,064 bytes)
- ML-DSA-65: 44,768 -> 39,680 (-5,088 bytes)
- ML-DSA-87: 59,104 -> 51,968 (-7,136 bytes)

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
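
The on-demand pattern described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `s1vec_lazy` and `s1vec_get_poly` are hypothetical names, the parameters are those of ML-DSA-44 (eta = 2, so 3 bits per coefficient), and the NTT step is only indicated by a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 256             /* coefficients per polynomial */
#define L 4               /* length of s1 for ML-DSA-44 */
#define ETA 2             /* secret coefficient bound for ML-DSA-44 */
#define POLYETA_BYTES 96  /* 256 coefficients * 3 bits / 8 */

typedef struct { int32_t coeffs[N]; } poly;

/* Hypothetical lazy handle: only a pointer to the packed s1 bytes in sk. */
typedef struct { const uint8_t *packed; } s1vec_lazy;

/* Unpack one polynomial with coefficients in [-ETA, ETA], 3 bits each. */
static void polyeta_unpack(poly *r, const uint8_t *a)
{
  for (unsigned i = 0; i < N / 8; i++) {
    /* Three bytes hold eight 3-bit values, little-endian. */
    uint32_t t = (uint32_t)a[3 * i] | ((uint32_t)a[3 * i + 1] << 8) |
                 ((uint32_t)a[3 * i + 2] << 16);
    for (unsigned j = 0; j < 8; j++)
      r->coeffs[8 * i + j] = ETA - (int32_t)((t >> (3 * j)) & 7);
  }
}

/* Materialize s1[i] into a caller-provided scratch polynomial. */
static void s1vec_get_poly(poly *out, const s1vec_lazy *v, unsigned i)
{
  assert(i < L);
  polyeta_unpack(out, v->packed + (size_t)i * POLYETA_BYTES);
  /* A real implementation would apply the forward NTT to *out here. */
}
```

The caller keeps one scratch `poly` alive instead of all L of them, at the price of repeating the unpack (and NTT) each time a polynomial is needed.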
Same pattern as mld_s1vec: in normal mode stores the full NTT'd
polyveck, in REDUCE_RAM mode stores a pointer and unpacks + NTTs
on demand.

REDUCE_RAM signing memory reduction:
- ML-DSA-44: 28,384 -> 24,320 (-4,064 bytes)
- ML-DSA-65: 39,680 -> 33,568 (-6,112 bytes)
- ML-DSA-87: 51,968 -> 43,808 (-8,160 bytes)

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

Same pattern as mld_s1vec and mld_s2vec: in normal mode stores the
full NTT'd polyveck, in REDUCE_RAM mode stores a pointer and unpacks
+ NTTs on demand.

REDUCE_RAM signing memory reduction:
- ML-DSA-44: 24,320 -> 20,256 (-4,064 bytes)
- ML-DSA-65: 33,568 -> 27,456 (-6,112 bytes)
- ML-DSA-87: 43,808 -> 35,648 (-8,160 bytes)

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
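
The deltas reported in the three commits above are consistent with simple arithmetic: a polynomial holds 256 `int32_t` coefficients (1024 bytes), s1 has 4/5/7 polynomials and s2/t0 have 4/6/8 polynomials for ML-DSA-44/65/87, and each saving equals the vector size minus 32 bytes. The 32-byte size of the replacement handle is inferred from the numbers, not stated in the commit messages; `lazy_vec_saving` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

enum { POLY_BYTES = 256 * 4 }; /* 256 int32_t coefficients */
enum { HANDLE_BYTES = 32 };    /* inferred from the reported deltas (assumption) */

/* Expected saving when a vector of `len` polynomials becomes a lazy handle. */
static int32_t lazy_vec_saving(int32_t len)
{
  assert(len > 0);
  return len * POLY_BYTES - HANDLE_BYTES;
}
```

For example, `lazy_vec_saving(4)` gives 4064, matching the ML-DSA-44 figure reported for each of the three commits.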
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

Instead of allocating a full polyveck for h in attempt_signature_generation,
compute cs2, ct0, and hints one polynomial at a time using scratch polys.

This eliminates the polyveck h from the yh_u union, replacing
mld_pack_sig_c_h with incremental packing via mld_pack_sig_c,
mld_pack_sig_h_init, and mld_pack_sig_h_poly.

Sign allocation savings (normal / REDUCE_RAM):
- ML-DSA-44: -4096 / 0 bytes
- ML-DSA-65: -6144 / -1024 bytes
- ML-DSA-87: -8192 / -1024 bytes

Note: CBMC proofs are not updated yet.
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
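
Incremental hint packing of the kind this commit describes could look roughly like the sketch below. It follows the standard ML-DSA hint encoding (OMEGA index bytes followed by K running counts) with ML-DSA-44 parameters; the names `hint_pack_state`, `hint_pack_init`, and `hint_pack_poly` are hypothetical and are not the signatures of `mld_pack_sig_h_init` or `mld_pack_sig_h_poly`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 256
#define K 4      /* rows for ML-DSA-44 */
#define OMEGA 80 /* maximum number of set hint bits for ML-DSA-44 */

/* Hypothetical running state: hints written so far and next row index. */
typedef struct { unsigned count; unsigned row; } hint_pack_state;

/* Zero the hint section of the signature and reset the state. */
static void hint_pack_init(hint_pack_state *st, uint8_t h[OMEGA + K])
{
  memset(h, 0, OMEGA + K);
  st->count = 0;
  st->row = 0;
}

/* Append the hints of one polynomial.
 * Returns 0 on success, -1 if more than OMEGA hints would be written. */
static int hint_pack_poly(hint_pack_state *st, uint8_t h[OMEGA + K],
                          const int32_t hint[N])
{
  assert(st->row < K);
  for (unsigned j = 0; j < N; j++) {
    if (hint[j] != 0) {
      if (st->count == OMEGA)
        return -1;
      h[st->count++] = (uint8_t)j; /* position of a set hint bit */
    }
  }
  h[OMEGA + st->row++] = (uint8_t)st->count; /* running total after this row */
  return 0;
}
```

Because each row is consumed as soon as it is computed, only one scratch polynomial of hints has to exist at a time.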
In REDUCE_RAM mode, shrink mld_polymat from rho + row_buffer (L polys)
to rho + poly_buffer (1 poly). Replace mld_polymat_get_row with
mld_polymat_get_element that samples a single A[k][l] on demand.

Rewrite mld_polyvec_matrix_pointwise_montgomery in REDUCE_RAM mode
to use per-element access, accumulating A[k][l] * v[l] one element
at a time.

Normal mode is unchanged (full matrix, row-based access).

REDUCE_RAM allocation savings:
- ML-DSA-44: keypair -3072, sign -3072, verify -3072, pk_from_sk -3072
- ML-DSA-65: keypair -4096, sign -4096, verify -4096, pk_from_sk -4096
- ML-DSA-87: keypair -6144, sign -6144, verify -6144, pk_from_sk -6144

Note: CBMC proofs are not updated yet.

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
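
Per-element access with a single polynomial buffer can be illustrated as below. This is a toy sketch under stated assumptions: `sample_element` is a stand-in generator (ML-DSA expands rho with SHAKE128), the multiplication uses plain modular arithmetic instead of Montgomery form, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define N 256
#define Q 8380417
#define K 4
#define L 4

typedef struct { int32_t coeffs[N]; } poly;

/* Stand-in for sampling A[k][l] on demand from a seed (not SHAKE128). */
static void sample_element(poly *a, uint32_t seed, unsigned k, unsigned l)
{
  uint32_t s = seed ^ ((uint32_t)k << 8) ^ (uint32_t)l;
  for (unsigned j = 0; j < N; j++) {
    s = s * 1664525u + 1013904223u;
    a->coeffs[j] = (int32_t)(s % Q);
  }
}

/* w = sum over l of A[k][l] * v[l], coefficient-wise mod Q.
 * Only one matrix element is held in memory at any time.
 * Assumes the coefficients of v are in [0, Q). */
static void row_times_vec_lazy(poly *w, uint32_t seed, unsigned k,
                               const poly v[L])
{
  poly a; /* the single element buffer */
  assert(k < K);
  for (unsigned j = 0; j < N; j++)
    w->coeffs[j] = 0;
  for (unsigned l = 0; l < L; l++) {
    sample_element(&a, seed, k, l);
    for (unsigned j = 0; j < N; j++) {
      int64_t t = (int64_t)a.coeffs[j] * v[l].coeffs[j] + w->coeffs[j];
      w->coeffs[j] = (int32_t)(t % Q);
    }
  }
}
```

Compared with buffering a full row of L polynomials, the working set for the matrix shrinks to one polynomial, which matches the (L - 1) * 1024-byte savings listed above.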
Introduce mld_yvec type following the same pattern as mld_s1vec/s2vec/t0vec:
in normal mode it holds the full polyvecl, in REDUCE_RAM mode it stores
only the seed (rhoprime) and nonce for on-demand regeneration.

Add mld_polyvec_matrix_pointwise_montgomery_yvec which computes
w = invNTT(A * NTT(y)). In REDUCE_RAM mode it fuses y sampling with
column-by-column matrix multiplication, avoiding storage of y entirely.
In normal mode it delegates to the existing bulk path.

Also enable mld_poly_uniform_gamma1 for REDUCE_RAM builds so the
per-poly y regeneration works for all parameter sets.

REDUCE_RAM sign allocation savings:
- ML-DSA-44: 17184 -> 13120 (-4064 bytes)
- ML-DSA-65: 22336 -> 17248 (-5088 bytes)
- ML-DSA-87: 28480 -> 21344 (-7136 bytes)

Normal mode is unchanged.

Note: CBMC proofs are not updated yet.

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
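
The fused, column-by-column product might be organized along the following lines. Again a hedged sketch: `yvec_lazy`, `yvec_get_poly`, `matrix_get_element`, and `matvec_fused` are hypothetical names, `toy_sample` replaces the real SHAKE-based samplers, and the NTT and inverse NTT steps are omitted.

```c
#include <assert.h>
#include <stdint.h>

#define N 256
#define Q 8380417
#define K 4
#define L 4

typedef struct { int32_t coeffs[N]; } poly;

/* Hypothetical lazy handle: a seed and a nonce instead of L polynomials. */
typedef struct { uint32_t seed; uint16_t nonce; } yvec_lazy;

/* Toy deterministic sampler standing in for the real XOF-based ones. */
static void toy_sample(poly *p, uint32_t s)
{
  for (unsigned j = 0; j < N; j++) {
    s = s * 1664525u + 1013904223u;
    p->coeffs[j] = (int32_t)(s % Q);
  }
}

/* Regenerate y[l] from the handle; the real code would also apply the NTT. */
static void yvec_get_poly(poly *y, const yvec_lazy *h, unsigned l)
{
  assert(l < L);
  toy_sample(y, h->seed ^ (0x80000000u | ((uint32_t)h->nonce << 8) | l));
}

/* Sample matrix element A[k][l] on demand. */
static void matrix_get_element(poly *a, uint32_t rho, unsigned k, unsigned l)
{
  toy_sample(a, rho ^ (((uint32_t)k << 8) | l));
}

/* w[k] = sum over l of A[k][l] * y[l], visiting one column l at a time,
 * so each y[l] is generated once and never stored as part of a vector. */
static void matvec_fused(poly w[K], uint32_t rho, const yvec_lazy *h)
{
  poly y, a;
  for (unsigned k = 0; k < K; k++)
    for (unsigned j = 0; j < N; j++)
      w[k].coeffs[j] = 0;
  for (unsigned l = 0; l < L; l++) {
    yvec_get_poly(&y, h, l);
    for (unsigned k = 0; k < K; k++) {
      matrix_get_element(&a, rho, k, l);
      for (unsigned j = 0; j < N; j++) {
        int64_t t = (int64_t)a.coeffs[j] * y.coeffs[j] + w[k].coeffs[j];
        w[k].coeffs[j] = (int32_t)(t % Q);
      }
    }
  }
}
```

The loop order is the point of the design: iterating over columns in the outer loop lets each regenerated y[l] be used for all K rows before it is discarded.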
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

Replace mld_compute_t0_t1_tr_from_sk_components with per-row
mld_compute_t0k_t1k. Both keygen and pk_from_sk now process one
row at a time, packing t1[k] into pk and t0[k] into sk immediately.

This eliminates full polyveck allocations for t0, t1, and the
matrix from both code paths.

Note: CBMC proofs are not updated yet.

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
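
The per-row split of t into t1 and t0 rests on the Power2Round step of ML-DSA with d = 13. The sketch below shows that step applied to a single row; `split_row` is a hypothetical helper and not the signature of `mld_compute_t0k_t1k`, and the immediate packing into pk and sk is left out.

```c
#include <assert.h>
#include <stdint.h>

#define N 256
#define D 13 /* number of dropped low bits in ML-DSA */

typedef struct { int32_t coeffs[N]; } poly;

/* Split a >= 0 into a1 * 2^D + a0 with -2^(D-1) < a0 <= 2^(D-1). */
static int32_t power2round(int32_t *a0, int32_t a)
{
  int32_t a1 = (a + (1 << (D - 1)) - 1) >> D;
  *a0 = a - (a1 << D);
  return a1;
}

/* Split one row t[k] into its high part t1[k] and low part t0[k].
 * A caller can pack both results right away and reuse the buffers. */
static void split_row(poly *t1k, poly *t0k, const poly *tk)
{
  for (unsigned j = 0; j < N; j++) {
    assert(tk->coeffs[j] >= 0);
    t1k->coeffs[j] = power2round(&t0k->coeffs[j], tk->coeffs[j]);
  }
}
```

Processing one row at a time means only a few scratch polynomials are live, rather than full K-polynomial vectors for both t0 and t1.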
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

To silence linting errors.

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

Contributor

oqs-bot commented Mar 27, 2026

CBMC Results (ML-DSA-65)

⚠️ Attention Required

Proof Status Current Previous Change
mld_attempt_signature_generation - 278s -
mld_compute_pack_z - 8s -
mld_compute_t0_t1_tr_from_sk_components - 27s -
pack_pk - 3s -
pack_sig_c_h - 2s -
pack_sig_z - 3s -
pack_sk - 5s -
polyveck_add - 7s -
polyveck_make_hint - 6s -
polyveck_pack_t0 - 4s -
polyveck_pointwise_poly_montgomery_s2 - - -
polyveck_pointwise_poly_montgomery_t0 - - -
polyveck_power2round - 11s -
polyvecl_pointwise_acc_montgomery_c - 190s -
sign_keypair_internal - 6s -
sign_pk_from_sk - 7s -
sign_verify_internal - 341s -
unpack_hints - 6s -
unpack_pk - 3s -
unpack_sig - 2s -
unpack_sk - 6s -
polyvecl_chknorm ⚠️ 30s 13s +131%
Full Results (179 proofs)
Proof Status Current Previous Change
**TOTAL** 1604s 2488s -35.5%
poly_pointwise_montgomery_c 170s 160s +6%
rej_uniform_native 151s 145s +4%
polyvec_matrix_expand 125s 128s -2%
mld_invntt_layer 100s 96s +4%
mld_ct_memcmp 83s 77s +8%
mld_ntt_layer 61s 54s +13%
polyvec_matrix_expand_serial 56s 70s -20%
keccak_squeezeblocks_x4 46s 42s +10%
polyvecl_chknorm ⚠️ 30s 13s +131%
rej_uniform 25s 23s +9%
sign_signature_internal 24s 37s -35%
poly_chknorm_c 22s 22s +0%
fqmul 21s 18s +17%
poly_uniform_4x 18s 15s +20%
polymat_permute_bitrev_to_custom 18s 29s -38%
poly_uniform_eta_4x 16s 16s +0%
polyt0_unpack 16s 14s +14%
rej_uniform_c 16s 17s -6%
polyvec_matrix_pointwise_montgomery 14s 11s +27%
mld_ntt_butterfly_block 13s 13s +0%
mld_polyvecl_permute_bitrev_to_custom_native 13s 8s +62%
keccakf1600_permute_native 12s 9s +33%
keccakf1600x4_permute_native 12s 13s -8%
polyveck_shiftl 12s 8s +50%
poly_add 11s 11s +0%
keccak_absorb_once_x4 10s 9s +11%
poly_decompose_c 10s 7s +43%
polyveck_decompose 10s 14s -29%
keccakf1600_permute 9s 10s -10%
mld_check_pct 9s 8s +12%
polyveck_invntt_tomont 9s 10s -10%
polyveck_use_hint 9s 8s +12%
keccak_absorb 8s 7s +14%
poly_invntt_tomont_c 8s 10s -20%
polyveck_sub 8s 10s -20%
polyvecl_ntt 8s 8s +0%
sign 8s 8s +0%
sign_signature_pre_hash_internal 8s 4s +100%
polyt1_unpack 7s 4s +75%
polyveck_caddq 7s 7s +0%
polyveck_chknorm 7s 5s +40%
polyveck_ntt 7s 11s -36%
polyveck_reduce 7s 6s +17%
mld_prepare_domain_separation_prefix 6s 7s -14%
mld_sample_s1_s2 6s 6s +0%
poly_challenge 6s 5s +20%
polyveck_pointwise_poly_montgomery 6s 6s +0%
intt_native_x86_64 5s 3s +67%
mld_sample_s1_s2_serial 5s 6s -17%
poly_caddq_c 5s 4s +25%
poly_shiftl 5s 3s +67%
poly_uniform_gamma1_4x 5s 3s +67%
poly_use_hint_c 5s 3s +67%
polyt0_pack 5s 5s +0%
polyvecl_pack_eta 5s 2s +150%
polyvecl_pointwise_acc_montgomery_native 5s 3s +67%
polyvecl_uniform_gamma1 5s 2s +150%
polyvecl_uniform_gamma1_serial 5s 5s +0%
polyvecl_unpack_z 5s 3s +67%
rej_eta 5s 2s +150%
rej_eta_c 5s 4s +25%
rej_eta_native 5s 4s +25%
sign_signature 5s 6s -17%
sign_verify_pre_hash_internal 5s 6s -17%
use_hint 5s 4s +25%
keccakf1600_xor_bytes 4s 1s +300%
keccakf1600_xor_bytes (big endian) 4s 2s +100%
mld_h 4s 5s -20%
mld_keccakf1600_extract_bytes 4s 2s +100%
poly_caddq_native 4s 3s +33%
poly_chknorm_native 4s 3s +33%
poly_ntt 4s 4s +0%
poly_pointwise_montgomery_native 4s 4s +0%
poly_power2round 4s 4s +0%
poly_sub 4s 4s +0%
poly_uniform_eta 4s 6s -33%
poly_use_hint_native 4s 3s +33%
polyeta_unpack 4s 6s -33%
polyveck_pack_eta 4s 4s +0%
polyveck_pack_w1 4s 4s +0%
polyveck_unpack_t0 4s 3s +33%
shake128_init 4s 1s +300%
shake256_finalize 4s 2s +100%
sign_keypair 4s 2s +100%
sign_verify 4s 4s +0%
sign_verify_extmu 4s 3s +33%
fqscale 3s 3s +0%
keccak_finalize 3s 3s +0%
keccak_init 3s 1s +200%
keccak_squeeze 3s 4s -25%
make_hint 3s 2s +50%
mld_ct_cmask_neg_i32 3s 1s +200%
mld_ct_cmask_nonzero_u8 3s 3s +0%
mld_ct_sel_int32 3s 5s -40%
montgomery_reduce 3s 3s +0%
ntt_native_x86_64 3s 4s -25%
poly_caddq_native_aarch64 3s 4s -25%
poly_chknorm_native_aarch64 3s 4s -25%
poly_decompose_native 3s 5s -40%
poly_ntt_c 3s 3s +0%
poly_reduce 3s 2s +50%
poly_uniform 3s 2s +50%
poly_use_hint 3s 3s +0%
polyeta_pack 3s 2s +50%
polyt1_pack 3s 3s +0%
polyveck_unpack_eta 3s 5s -40%
polyvecl_pointwise_acc_montgomery 3s 4s -25%
polyw1_pack 3s 4s -25%
polyz_pack 3s 3s +0%
polyz_unpack_c 3s 2s +50%
power2round 3s 1s +200%
shake128_absorb 3s 2s +50%
shake128_finalize 3s 3s +0%
shake128x4_squeezeblocks 3s 2s +50%
shake256 3s 2s +50%
shake256_release 3s 3s +0%
shake256x4_absorb_once 3s 4s -25%
sign_open 3s 5s -40%
sign_signature_extmu 3s 4s -25%
sign_signature_pre_hash_shake256 3s 5s -40%
sys_check_capability 3s 1s +200%
caddq 2s 3s -33%
decompose 2s 3s -33%
keccakf1600_extract_bytes (big endian) 2s 3s -33%
keccakf1600x4_extract_bytes 2s 3s -33%
keccakf1600x4_permute 2s 4s -50%
keccakf1600x4_xor_bytes 2s 2s +0%
mld_ct_abs_i32 2s 2s +0%
mld_ct_cmask_nonzero_u32 2s 3s -33%
mld_ct_get_optblocker_u8 2s 2s +0%
mld_value_barrier_i64 2s 3s -33%
mld_value_barrier_u32 2s 3s -33%
ntt_native_aarch64 2s 3s -33%
poly_caddq 2s 3s -33%
poly_chknorm 2s 3s -33%
poly_decompose 2s 3s -33%
poly_invntt_tomont 2s 2s +0%
poly_invntt_tomont_native 2s 4s -50%
poly_make_hint 2s 3s -33%
poly_ntt_native 2s 2s +0%
poly_pointwise_montgomery 2s 4s -50%
poly_uniform_gamma1 2s 4s -50%
polyvecl_permute_bitrev_to_custom 2s 3s -33%
polyvecl_unpack_eta 2s 4s -50%
polyz_unpack 2s 2s +0%
polyz_unpack_native 2s 2s +0%
reduce32 2s 2s +0%
shake128x4_absorb_once 2s 3s -33%
shake256_squeeze 2s 4s -50%
shake256x4_squeezeblocks 2s 2s +0%
sign_verify_pre_hash_shake256 2s 5s -60%
mld_attempt_signature_generation - 278s -
mld_compute_pack_z - 8s -
mld_compute_t0_t1_tr_from_sk_components - 27s -
pack_pk - 3s -
pack_sig_c_h - 2s -
pack_sig_z - 3s -
pack_sk - 5s -
polyveck_add - 7s -
polyveck_make_hint - 6s -
polyveck_pack_t0 - 4s -
polyveck_pointwise_poly_montgomery_s2 - - -
polyveck_pointwise_poly_montgomery_t0 - - -
polyveck_power2round - 11s -
polyvecl_pointwise_acc_montgomery_c - 190s -
sign_keypair_internal - 6s -
sign_pk_from_sk - 7s -
sign_verify_internal - 341s -
unpack_hints - 6s -
unpack_pk - 3s -
unpack_sig - 2s -
unpack_sk - 6s -
mld_ct_get_optblocker_i64 1s 2s -50%
mld_ct_get_optblocker_u32 1s 1s +0%
mld_value_barrier_u8 1s 3s -67%
shake128_release 1s 1s +0%
shake128_squeeze 1s 4s -75%
shake256_absorb 1s 6s -83%
shake256_init 1s 3s -67%

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
@mkannwischer mkannwischer force-pushed the lowram branch 3 times, most recently from c06caac to b56df8e on March 28, 2026 03:26
Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

Benchmark suite Current: b56df8e Previous: bb07ee8 Ratio
ML-DSA-44 keypair 34222 cycles 34508 cycles 0.99
ML-DSA-44 sign 120003 cycles 119762 cycles 1.00
ML-DSA-44 verify 38274 cycles 38106 cycles 1.00
ML-DSA-65 keypair 58946 cycles 61327 cycles 0.96
ML-DSA-65 sign 198396 cycles 202109 cycles 0.98
ML-DSA-65 verify 63064 cycles 62771 cycles 1.00
ML-DSA-87 keypair 92076 cycles 94593 cycles 0.97
ML-DSA-87 sign 242172 cycles 240827 cycles 1.01
ML-DSA-87 verify 96085 cycles 96019 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite Current: b56df8e Previous: bb07ee8 Ratio
ML-DSA-44 keypair 95415 cycles 93753 cycles 1.02
ML-DSA-44 sign 331798 cycles 333304 cycles 1.00
ML-DSA-44 verify 99519 cycles 99738 cycles 1.00
ML-DSA-65 keypair 161336 cycles 159678 cycles 1.01
ML-DSA-65 sign 539375 cycles 544024 cycles 0.99
ML-DSA-65 verify 162895 cycles 160787 cycles 1.01
ML-DSA-87 keypair 268110 cycles 267177 cycles 1.00
ML-DSA-87 sign 705391 cycles 705890 cycles 1.00
ML-DSA-87 verify 268591 cycles 270246 cycles 0.99

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Benchmark suite Current: b56df8e Previous: db65535 Ratio
ML-DSA-44 keypair 112362 cycles 113139 cycles 0.99
ML-DSA-44 sign 356672 cycles 355421 cycles 1.00
ML-DSA-44 verify 117719 cycles 117817 cycles 1.00
ML-DSA-65 keypair 194918 cycles 196421 cycles 0.99
ML-DSA-65 sign 586360 cycles 588818 cycles 1.00
ML-DSA-65 verify 194819 cycles 194511 cycles 1.00
ML-DSA-87 keypair 319997 cycles 322254 cycles 0.99
ML-DSA-87 sign 751619 cycles 752975 cycles 1.00
ML-DSA-87 verify 319902 cycles 320113 cycles 1.00

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

Benchmark suite Current: b56df8e Previous: db65535 Ratio
ML-DSA-44 keypair 68388 cycles 68974 cycles 0.99
ML-DSA-44 sign 189267 cycles 187318 cycles 1.01
ML-DSA-44 verify 69139 cycles 69050 cycles 1.00
ML-DSA-65 keypair 118689 cycles 119428 cycles 0.99
ML-DSA-65 sign 301252 cycles 300617 cycles 1.00
ML-DSA-65 verify 115747 cycles 115643 cycles 1.00
ML-DSA-87 keypair 201697 cycles 203571 cycles 0.99
ML-DSA-87 sign 393413 cycles 394649 cycles 1.00
ML-DSA-87 verify 194483 cycles 195659 cycles 0.99

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

Benchmark suite Current: b56df8e Previous: db65535 Ratio
ML-DSA-44 keypair 55881 cycles 56817 cycles 0.98
ML-DSA-44 sign 180718 cycles 182410 cycles 0.99
ML-DSA-44 verify 61359 cycles 61615 cycles 1.00
ML-DSA-65 keypair 97413 cycles 98729 cycles 0.99
ML-DSA-65 sign 296886 cycles 298290 cycles 1.00
ML-DSA-65 verify 101441 cycles 100286 cycles 1.01
ML-DSA-87 keypair 150700 cycles 152586 cycles 0.99
ML-DSA-87 sign 354653 cycles 355720 cycles 1.00
ML-DSA-87 verify 154075 cycles 153499 cycles 1.00

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite Current: b56df8e Previous: db65535 Ratio
ML-DSA-44 keypair 134214 cycles 134983 cycles 0.99
ML-DSA-44 sign 524720 cycles 524482 cycles 1.00
ML-DSA-44 verify 147384 cycles 147385 cycles 1.00
ML-DSA-65 keypair 226870 cycles 228309 cycles 0.99
ML-DSA-65 sign 854441 cycles 864340 cycles 0.99
ML-DSA-65 verify 236415 cycles 236413 cycles 1.00
ML-DSA-87 keypair 368665 cycles 370688 cycles 0.99
ML-DSA-87 sign 1068488 cycles 1079564 cycles 0.99
ML-DSA-87 verify 382091 cycles 383220 cycles 1.00

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

Benchmark suite Current: b56df8e Previous: db65535 Ratio
ML-DSA-44 keypair 39871 cycles 42279 cycles 0.94
ML-DSA-44 sign 136504 cycles 132300 cycles 1.03
ML-DSA-44 verify 44253 cycles 43971 cycles 1.01
ML-DSA-65 keypair 71924 cycles 76769 cycles 0.94
ML-DSA-65 sign 213770 cycles 217452 cycles 0.98
ML-DSA-65 verify 72509 cycles 73895 cycles 0.98
ML-DSA-87 keypair 108439 cycles 108025 cycles 1.00
ML-DSA-87 sign 251417 cycles 252354 cycles 1.00
ML-DSA-87 verify 109165 cycles 109188 cycles 1.00

Contributor

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: b56df8e Previous: db65535 Ratio
ML-DSA-44 sign 136504 cycles 132300 cycles 1.03
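
The ratio column is rounded to two decimals, so 136504 / 132300 (about 1.0318) prints as 1.03 even though it lies above the 1.03 threshold. A small integer-only check makes this explicit. It assumes the alert fires on a strict "ratio greater than threshold" comparison, which is an assumption about the benchmark action and not something stated on this page; `exceeds_threshold` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if current / previous > threshold_percent / 100.
 * Uses integer arithmetic to avoid floating-point rounding. */
static int exceeds_threshold(uint64_t current, uint64_t previous,
                             uint64_t threshold_percent)
{
  assert(previous > 0);
  return current * 100 > previous * threshold_percent;
}
```

With the values from the table, `exceeds_threshold(136504, 132300, 103)` is true because 13650400 > 13626900.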

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Benchmark suite Current: b56df8e Previous: db65535 Ratio
ML-DSA-44 keypair 211753 cycles 212555 cycles 1.00
ML-DSA-44 sign 758883 cycles 759099 cycles 1.00
ML-DSA-44 verify 229118 cycles 228906 cycles 1.00
ML-DSA-65 keypair 377189 cycles 380502 cycles 0.99
ML-DSA-65 sign 1247155 cycles 1251648 cycles 1.00
ML-DSA-65 verify 371410 cycles 372262 cycles 1.00
ML-DSA-87 keypair 603210 cycles 604945 cycles 1.00
ML-DSA-87 sign 1585138 cycles 1590686 cycles 1.00
ML-DSA-87 verify 618819 cycles 616948 cycles 1.00

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite Current: b56df8e Previous: db65535 Ratio
ML-DSA-44 keypair 156948 cycles 157614 cycles 1.00
ML-DSA-44 sign 548292 cycles 551534 cycles 0.99
ML-DSA-44 verify 169377 cycles 169123 cycles 1.00
ML-DSA-65 keypair 266042 cycles 267907 cycles 0.99
ML-DSA-65 sign 891894 cycles 904333 cycles 0.99
ML-DSA-65 verify 274396 cycles 275011 cycles 1.00
ML-DSA-87 keypair 447024 cycles 448619 cycles 1.00
ML-DSA-87 sign 1153646 cycles 1157905 cycles 1.00
ML-DSA-87 verify 459676 cycles 458683 cycles 1.00

@oqs-bot left a comment

Graviton4

| Benchmark suite | Current: b56df8e | Previous: db65535 | Ratio |
| --- | --- | --- | --- |
| ML-DSA-44 keypair | 67619 cycles | 68090 cycles | 0.99 |
| ML-DSA-44 sign | 202698 cycles | 202380 cycles | 1.00 |
| ML-DSA-44 verify | 70891 cycles | 70623 cycles | 1.00 |
| ML-DSA-65 keypair | 119598 cycles | 121010 cycles | 0.99 |
| ML-DSA-65 sign | 330515 cycles | 332267 cycles | 0.99 |
| ML-DSA-65 verify | 117848 cycles | 117974 cycles | 1.00 |
| ML-DSA-87 keypair | 196903 cycles | 198259 cycles | 0.99 |
| ML-DSA-87 sign | 427461 cycles | 428218 cycles | 1.00 |
| ML-DSA-87 verify | 194811 cycles | 194635 cycles | 1.00 |

@oqs-bot left a comment

Graviton3

| Benchmark suite | Current: b56df8e | Previous: db65535 | Ratio |
| --- | --- | --- | --- |
| ML-DSA-44 keypair | 71568 cycles | 72253 cycles | 0.99 |
| ML-DSA-44 sign | 212744 cycles | 212376 cycles | 1.00 |
| ML-DSA-44 verify | 75553 cycles | 75747 cycles | 1.00 |
| ML-DSA-65 keypair | 126328 cycles | 127630 cycles | 0.99 |
| ML-DSA-65 sign | 349346 cycles | 350882 cycles | 1.00 |
| ML-DSA-65 verify | 125556 cycles | 125712 cycles | 1.00 |
| ML-DSA-87 keypair | 205745 cycles | 208495 cycles | 0.99 |
| ML-DSA-87 sign | 444140 cycles | 450030 cycles | 0.99 |
| ML-DSA-87 verify | 205734 cycles | 205745 cycles | 1.00 |

@oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

| Benchmark suite | Current: b56df8e | Previous: db65535 | Ratio |
| --- | --- | --- | --- |
| ML-DSA-44 keypair | 120051 cycles | 120340 cycles | 1.00 |
| ML-DSA-44 sign | 444378 cycles | 447581 cycles | 0.99 |
| ML-DSA-44 verify | 130075 cycles | 130373 cycles | 1.00 |
| ML-DSA-65 keypair | 203529 cycles | 204354 cycles | 1.00 |
| ML-DSA-65 sign | 719394 cycles | 728319 cycles | 0.99 |
| ML-DSA-65 verify | 209932 cycles | 209199 cycles | 1.00 |
| ML-DSA-87 keypair | 338921 cycles | 338993 cycles | 1.00 |
| ML-DSA-87 sign | 918581 cycles | 921541 cycles | 1.00 |
| ML-DSA-87 verify | 346483 cycles | 348601 cycles | 0.99 |

@oqs-bot left a comment

Graviton4 (no-opt)

| Benchmark suite | Current: b56df8e | Previous: db65535 | Ratio |
| --- | --- | --- | --- |
| ML-DSA-44 keypair | 128669 cycles | 128240 cycles | 1.00 |
| ML-DSA-44 sign | 445672 cycles | 447597 cycles | 1.00 |
| ML-DSA-44 verify | 136986 cycles | 144662 cycles | 0.95 |
| ML-DSA-65 keypair | 219848 cycles | 220500 cycles | 1.00 |
| ML-DSA-65 sign | 720286 cycles | 727093 cycles | 0.99 |
| ML-DSA-65 verify | 221049 cycles | 223077 cycles | 0.99 |
| ML-DSA-87 keypair | 365316 cycles | 365045 cycles | 1.00 |
| ML-DSA-87 sign | 919622 cycles | 925847 cycles | 0.99 |
| ML-DSA-87 verify | 370439 cycles | 372789 cycles | 0.99 |

@oqs-bot left a comment

Graviton3 (no-opt)

| Benchmark suite | Current: b56df8e | Previous: db65535 | Ratio |
| --- | --- | --- | --- |
| ML-DSA-44 keypair | 137571 cycles | 138463 cycles | 0.99 |
| ML-DSA-44 sign | 482669 cycles | 483929 cycles | 1.00 |
| ML-DSA-44 verify | 148479 cycles | 162291 cycles | 0.91 |
| ML-DSA-65 keypair | 240785 cycles | 241435 cycles | 1.00 |
| ML-DSA-65 sign | 784950 cycles | 792312 cycles | 0.99 |
| ML-DSA-65 verify | 240892 cycles | 241250 cycles | 1.00 |
| ML-DSA-87 keypair | 394576 cycles | 396566 cycles | 0.99 |
| ML-DSA-87 sign | 1006235 cycles | 1012538 cycles | 0.99 |
| ML-DSA-87 verify | 403026 cycles | 402623 cycles | 1.00 |

@oqs-bot left a comment

Graviton2

| Benchmark suite | Current: b56df8e | Previous: db65535 | Ratio |
| --- | --- | --- | --- |
| ML-DSA-44 keypair | 112662 cycles | 113410 cycles | 0.99 |
| ML-DSA-44 sign | 356702 cycles | 355818 cycles | 1.00 |
| ML-DSA-44 verify | 118075 cycles | 118279 cycles | 1.00 |
| ML-DSA-65 keypair | 195068 cycles | 196486 cycles | 0.99 |
| ML-DSA-65 sign | 587010 cycles | 588672 cycles | 1.00 |
| ML-DSA-65 verify | 195142 cycles | 194830 cycles | 1.00 |
| ML-DSA-87 keypair | 321107 cycles | 323043 cycles | 0.99 |
| ML-DSA-87 sign | 752936 cycles | 753644 cycles | 1.00 |
| ML-DSA-87 verify | 319982 cycles | 320341 cycles | 1.00 |

@oqs-bot left a comment

Graviton2 (no-opt)

| Benchmark suite | Current: b56df8e | Previous: db65535 | Ratio |
| --- | --- | --- | --- |
| ML-DSA-44 keypair | 213721 cycles | 213406 cycles | 1.00 |
| ML-DSA-44 sign | 759277 cycles | 762744 cycles | 1.00 |
| ML-DSA-44 verify | 229673 cycles | 235007 cycles | 0.98 |
| ML-DSA-65 keypair | 378733 cycles | 380391 cycles | 1.00 |
| ML-DSA-65 sign | 1246651 cycles | 1253555 cycles | 0.99 |
| ML-DSA-65 verify | 372918 cycles | 371798 cycles | 1.00 |
| ML-DSA-87 keypair | 603099 cycles | 604988 cycles | 1.00 |
| ML-DSA-87 sign | 1583611 cycles | 1596422 cycles | 0.99 |
| ML-DSA-87 verify | 618046 cycles | 619153 cycles | 1.00 |


… mode

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>