PoC: Reduce large struct allocations to <= 13/17/21 KiB for ML-DSA-44/65/87 by mkannwischer · Pull Request #1005 · pq-code-package/mldsa-native

mkannwischer · 2026-03-27T05:10:51Z

This is a proof of concept showing how we could reduce the memory that {keygen,sign,verify} allocate via MLD_ALLOC in REDUCE_RAM mode to <= 13/17/21 KiB for ML-DSA-44/65/87. As before, we additionally need a little stack memory: I measured 2.5 KiB on my machine, so the overall memory consumption should be comfortably below 16/20/24 KiB.

Warning: This is a quick-and-dirty PoC, and I do not recommend relying on it. The CBMC proofs are work in progress. This PR will definitely not be merged in one piece; it will instead land in smaller steps.

oqs-bot · 2026-03-27T05:39:33Z

CBMC Results (ML-DSA-87)

⚠️ Attention Required

Proof	Status	Current	Previous	Change
`mld_attempt_signature_generation`	❌	-	240s	-
`mld_compute_pack_z`	❌	-	7s	-
`mld_compute_t0_t1_tr_from_sk_components`	❌	-	25s	-
`pack_pk`	❌	-	5s	-
`pack_sig_c_h`	❌	-	3s	-
`pack_sig_z`	❌	-	3s	-
`pack_sk`	❌	-	3s	-
`polyveck_add`	❌	-	9s	-
`polyveck_make_hint`	❌	-	8s	-
`polyveck_pack_t0`	❌	-	4s	-
`polyveck_pointwise_poly_montgomery_s2`	❌	-	-	-
`polyveck_pointwise_poly_montgomery_t0`	❌	-	-	-
`polyveck_power2round`	❌	-	10s	-
`sign_keypair_internal`	❌	-	5s	-
`sign_pk_from_sk`	❌	-	8s	-
`sign_verify_internal`	❌	-	334s	-
`unpack_hints`	❌	-	5s	-
`unpack_pk`	❌	-	4s	-
`unpack_sig`	❌	-	3s	-
`unpack_sk`	❌	-	5s	-
`polyveck_pointwise_poly_montgomery`	⚠️	22s	6s	+267%

Full Results (179 proofs)

Proof	Status	Current	Previous	Change
`TOTAL`	✅	1796s	2703s	-33.6%
`polyvec_matrix_expand`	✅	193s	175s	+10%
`poly_pointwise_montgomery_c`	✅	151s	158s	-4%
`rej_uniform_native`	✅	141s	146s	-3%
`mld_invntt_layer`	✅	92s	95s	-3%
`polyvec_matrix_expand_serial`	✅	85s	80s	+6%
`polyvecl_pointwise_acc_montgomery_c`	✅	85s	281s	-70%
`mld_ct_memcmp`	✅	73s	74s	-1%
`polymat_permute_bitrev_to_custom`	✅	60s	45s	+33%
`mld_ntt_layer`	✅	55s	56s	-2%
`keccak_squeezeblocks_x4`	✅	41s	42s	-2%
`sign_signature_internal`	✅	41s	55s	-25%
`polyveck_pointwise_poly_montgomery`	⚠️	22s	6s	+267%
`rej_uniform`	✅	22s	22s	+0%
`fqmul`	✅	20s	19s	+5%
`poly_chknorm_c`	✅	20s	19s	+5%
`poly_uniform_eta_4x`	✅	19s	17s	+12%
`poly_uniform_4x`	✅	17s	17s	+0%
`mld_polyvecl_permute_bitrev_to_custom_native`	✅	16s	14s	+14%
`polyeta_unpack`	✅	15s	17s	-12%
`polyvec_matrix_pointwise_montgomery`	✅	15s	12s	+25%
`keccakf1600x4_permute_native`	✅	14s	13s	+8%
`mld_check_pct`	✅	14s	9s	+56%
`rej_uniform_c`	✅	14s	14s	+0%
`polyt0_unpack`	✅	13s	15s	-13%
`mld_ntt_butterfly_block`	✅	12s	13s	-8%
`polyveck_decompose`	✅	12s	60s	-80%
`keccak_absorb_once_x4`	✅	11s	11s	+0%
`poly_add`	✅	11s	12s	-8%
`polyveck_shiftl`	✅	10s	7s	+43%
`polyveck_use_hint`	✅	10s	13s	-23%
`keccak_absorb`	✅	9s	8s	+12%
`keccakf1600_permute_native`	✅	9s	8s	+12%
`polyveck_invntt_tomont`	✅	9s	8s	+12%
`polyveck_reduce`	✅	9s	9s	+0%
`mld_sample_s1_s2_serial`	✅	8s	9s	-11%
`poly_decompose_c`	✅	8s	7s	+14%
`polyveck_caddq`	✅	8s	8s	+0%
`polyvecl_ntt`	✅	8s	11s	-27%
`keccakf1600_permute`	✅	7s	9s	-22%
`poly_invntt_tomont_c`	✅	7s	7s	+0%
`polyveck_unpack_t0`	✅	7s	4s	+75%
`polyvecl_uniform_gamma1_serial`	✅	7s	5s	+40%
`polyz_unpack_c`	✅	7s	11s	-36%
`sign`	✅	7s	8s	-12%
`poly_caddq_c`	✅	6s	7s	-14%
`poly_ntt_native`	✅	6s	4s	+50%
`poly_uniform_gamma1_4x`	✅	6s	4s	+50%
`polyveck_ntt`	✅	6s	8s	-25%
`polyveck_sub`	✅	6s	7s	-14%
`polyvecl_chknorm`	✅	6s	6s	+0%
`sign_open`	✅	6s	7s	-14%
`keccakf1600x4_extract_bytes`	✅	5s	2s	+150%
`make_hint`	✅	5s	3s	+67%
`mld_sample_s1_s2`	✅	5s	7s	-29%
`poly_caddq_native`	✅	5s	4s	+25%
`poly_challenge`	✅	5s	4s	+25%
`poly_power2round`	✅	5s	5s	+0%
`poly_uniform_eta`	✅	5s	5s	+0%
`polyveck_unpack_eta`	✅	5s	3s	+67%
`reduce32`	✅	5s	4s	+25%
`rej_eta_native`	✅	5s	4s	+25%
`sign_signature`	✅	5s	3s	+67%
`keccak_squeeze`	✅	4s	6s	-33%
`keccakf1600_xor_bytes`	✅	4s	1s	+300%
`keccakf1600_xor_bytes (big endian)`	✅	4s	2s	+100%
`mld_ct_cmask_nonzero_u8`	✅	4s	3s	+33%
`mld_ct_get_optblocker_u8`	✅	4s	2s	+100%
`mld_ct_sel_int32`	✅	4s	2s	+100%
`mld_h`	✅	4s	4s	+0%
`mld_keccakf1600_extract_bytes`	✅	4s	4s	+0%
`mld_value_barrier_u32`	✅	4s	2s	+100%
`ntt_native_x86_64`	✅	4s	4s	+0%
`poly_caddq_native_aarch64`	✅	4s	2s	+100%
`poly_chknorm_native`	✅	4s	5s	-20%
`poly_decompose_native`	✅	4s	4s	+0%
`poly_make_hint`	✅	4s	2s	+100%
`poly_ntt`	✅	4s	4s	+0%
`poly_ntt_c`	✅	4s	3s	+33%
`poly_uniform`	✅	4s	4s	+0%
`poly_uniform_gamma1`	✅	4s	4s	+0%
`polyt1_unpack`	✅	4s	2s	+100%
`polyveck_chknorm`	✅	4s	5s	-20%
`polyvecl_pointwise_acc_montgomery_native`	✅	4s	2s	+100%
`polyvecl_unpack_z`	✅	4s	1s	+300%
`polyw1_pack`	✅	4s	4s	+0%
`polyz_unpack_native`	✅	4s	4s	+0%
`shake128_init`	✅	4s	3s	+33%
`shake256_finalize`	✅	4s	3s	+33%
`sign_signature_pre_hash_internal`	✅	4s	2s	+100%
`sign_verify_pre_hash_internal`	✅	4s	4s	+0%
`sign_verify_pre_hash_shake256`	✅	4s	4s	+0%
`sys_check_capability`	✅	4s	2s	+100%
`caddq`	✅	3s	1s	+200%
`keccak_finalize`	✅	3s	4s	-25%
`keccakf1600_extract_bytes (big endian)`	✅	3s	1s	+200%
`keccakf1600x4_permute`	✅	3s	3s	+0%
`mld_ct_cmask_nonzero_u32`	✅	3s	3s	+0%
`mld_ct_get_optblocker_i64`	✅	3s	3s	+0%
`mld_ct_get_optblocker_u32`	✅	3s	4s	-25%
`ntt_native_aarch64`	✅	3s	6s	-50%
`poly_caddq`	✅	3s	3s	+0%
`poly_chknorm_native_aarch64`	✅	3s	2s	+50%
`poly_decompose`	✅	3s	4s	-25%
`poly_reduce`	✅	3s	4s	-25%
`poly_use_hint_c`	✅	3s	3s	+0%
`poly_use_hint_native`	✅	3s	4s	-25%
`polyeta_pack`	✅	3s	3s	+0%
`polyt0_pack`	✅	3s	5s	-40%
`polyt1_pack`	✅	3s	3s	+0%
`polyveck_pack_eta`	✅	3s	3s	+0%
`polyvecl_pack_eta`	✅	3s	4s	-25%
`polyvecl_permute_bitrev_to_custom`	✅	3s	4s	-25%
`polyvecl_pointwise_acc_montgomery`	✅	3s	2s	+50%
`polyvecl_uniform_gamma1`	✅	3s	3s	+0%
`polyz_pack`	✅	3s	2s	+50%
`polyz_unpack`	✅	3s	3s	+0%
`rej_eta_c`	✅	3s	5s	-40%
`shake128_absorb`	✅	3s	3s	+0%
`shake256_absorb`	✅	3s	2s	+50%
`shake256_release`	✅	3s	5s	-40%
`shake256_squeeze`	✅	3s	3s	+0%
`sign_signature_pre_hash_shake256`	✅	3s	3s	+0%
`sign_verify`	✅	3s	5s	-40%
`decompose`	✅	2s	4s	-50%
`fqscale`	✅	2s	4s	-50%
`intt_native_x86_64`	✅	2s	3s	-33%
`keccak_init`	✅	2s	4s	-50%
`keccakf1600x4_xor_bytes`	✅	2s	3s	-33%
`mld_ct_abs_i32`	✅	2s	2s	+0%
`mld_ct_cmask_neg_i32`	✅	2s	3s	-33%
`mld_prepare_domain_separation_prefix`	✅	2s	7s	-71%
`mld_value_barrier_i64`	✅	2s	2s	+0%
`mld_value_barrier_u8`	✅	2s	4s	-50%
`montgomery_reduce`	✅	2s	3s	-33%
`poly_chknorm`	✅	2s	1s	+100%
`poly_invntt_tomont`	✅	2s	3s	-33%
`poly_invntt_tomont_native`	✅	2s	3s	-33%
`poly_pointwise_montgomery`	✅	2s	3s	-33%
`poly_pointwise_montgomery_native`	✅	2s	3s	-33%
`poly_shiftl`	✅	2s	3s	-33%
`poly_sub`	✅	2s	3s	-33%
`poly_use_hint`	✅	2s	2s	+0%
`polyveck_pack_w1`	✅	2s	4s	-50%
`polyvecl_unpack_eta`	✅	2s	4s	-50%
`power2round`	✅	2s	3s	-33%
`rej_eta`	✅	2s	4s	-50%
`shake128_finalize`	✅	2s	2s	+0%
`shake128_release`	✅	2s	4s	-50%
`shake128_squeeze`	✅	2s	2s	+0%
`shake128x4_absorb_once`	✅	2s	2s	+0%
`shake128x4_squeezeblocks`	✅	2s	2s	+0%
`shake256`	✅	2s	3s	-33%
`shake256_init`	✅	2s	1s	+100%
`shake256x4_absorb_once`	✅	2s	2s	+0%
`shake256x4_squeezeblocks`	✅	2s	3s	-33%
`sign_keypair`	✅	2s	2s	+0%
`sign_signature_extmu`	✅	2s	5s	-60%
`sign_verify_extmu`	✅	2s	3s	-33%
`use_hint`	✅	2s	3s	-33%
`mld_attempt_signature_generation`	❌	-	240s	-
`mld_compute_pack_z`	❌	-	7s	-
`mld_compute_t0_t1_tr_from_sk_components`	❌	-	25s	-
`pack_pk`	❌	-	5s	-
`pack_sig_c_h`	❌	-	3s	-
`pack_sig_z`	❌	-	3s	-
`pack_sk`	❌	-	3s	-
`polyveck_add`	❌	-	9s	-
`polyveck_make_hint`	❌	-	8s	-
`polyveck_pack_t0`	❌	-	4s	-
`polyveck_pointwise_poly_montgomery_s2`	❌	-	-	-
`polyveck_pointwise_poly_montgomery_t0`	❌	-	-	-
`polyveck_power2round`	❌	-	10s	-
`sign_keypair_internal`	❌	-	5s	-
`sign_pk_from_sk`	❌	-	8s	-
`sign_verify_internal`	❌	-	334s	-
`unpack_hints`	❌	-	5s	-
`unpack_pk`	❌	-	4s	-
`unpack_sig`	❌	-	3s	-
`unpack_sk`	❌	-	5s	-

oqs-bot · 2026-03-27T05:39:45Z

CBMC Results (ML-DSA-44)

⚠️ Attention Required

Proof	Status	Current	Previous	Change
`mld_attempt_signature_generation`	❌	-	234s	-
`mld_compute_pack_z`	❌	-	6s	-
`mld_compute_t0_t1_tr_from_sk_components`	❌	-	13s	-
`pack_pk`	❌	-	7s	-
`pack_sig_c_h`	❌	-	2s	-
`pack_sig_z`	❌	-	3s	-
`pack_sk`	❌	-	2s	-
`polyveck_add`	❌	-	5s	-
`polyveck_make_hint`	❌	-	2s	-
`polyveck_pack_t0`	❌	-	4s	-
`polyveck_pointwise_poly_montgomery_s2`	❌	-	-	-
`polyveck_pointwise_poly_montgomery_t0`	❌	-	-	-
`polyveck_power2round`	❌	-	14s	-
`sign_keypair_internal`	❌	-	5s	-
`sign_pk_from_sk`	❌	-	7s	-
`sign_verify_internal`	❌	-	127s	-
`unpack_hints`	❌	-	6s	-
`unpack_pk`	❌	-	2s	-
`unpack_sig`	❌	-	2s	-
`unpack_sk`	❌	-	5s	-
`polyveck_invntt_tomont`	⚠️	20s	3s	+567%

Full Results (179 proofs)

Proof	Status	Current	Previous	Change
`TOTAL`	✅	1462s	2066s	-29.2%
`polyvecl_pointwise_acc_montgomery_c`	✅	189s	233s	-19%
`poly_pointwise_montgomery_c`	✅	144s	162s	-11%
`rej_uniform_native`	✅	136s	145s	-6%
`mld_invntt_layer`	✅	85s	86s	-1%
`mld_ct_memcmp`	✅	69s	82s	-16%
`mld_ntt_layer`	✅	53s	57s	-7%
`keccak_squeezeblocks_x4`	✅	41s	42s	-2%
`fqmul`	✅	20s	21s	-5%
`polyveck_invntt_tomont`	⚠️	20s	3s	+567%
`rej_uniform`	✅	20s	22s	-9%
`poly_chknorm_c`	✅	19s	23s	-17%
`polyvec_matrix_expand`	✅	18s	28s	-36%
`polyeta_unpack`	✅	17s	18s	-6%
`sign_signature_internal`	✅	17s	33s	-48%
`polyt0_unpack`	✅	16s	17s	-6%
`rej_uniform_c`	✅	15s	18s	-17%
`poly_uniform_eta_4x`	✅	14s	18s	-22%
`polyz_unpack_c`	✅	14s	11s	+27%
`mld_ntt_butterfly_block`	✅	13s	14s	-7%
`poly_uniform_4x`	✅	13s	14s	-7%
`keccakf1600x4_permute_native`	✅	12s	12s	+0%
`poly_add`	✅	12s	13s	-8%
`polyvec_matrix_pointwise_montgomery`	✅	12s	15s	-20%
`keccak_absorb_once_x4`	✅	9s	10s	-10%
`keccakf1600_permute`	✅	8s	7s	+14%
`keccakf1600_permute_native`	✅	8s	7s	+14%
`mld_check_pct`	✅	8s	8s	+0%
`poly_invntt_tomont_c`	✅	8s	7s	+14%
`keccak_absorb`	✅	7s	9s	-22%
`mld_h`	✅	7s	6s	+17%
`polymat_permute_bitrev_to_custom`	✅	7s	16s	-56%
`polyvec_matrix_expand_serial`	✅	7s	11s	-36%
`polyveck_sub`	✅	7s	4s	+75%
`sign_open`	✅	7s	5s	+40%
`sign_signature_pre_hash_internal`	✅	7s	4s	+75%
`mld_polyvecl_permute_bitrev_to_custom_native`	✅	6s	10s	-40%
`poly_decompose_native`	✅	6s	3s	+100%
`poly_uniform`	✅	6s	5s	+20%
`polyveck_chknorm`	✅	6s	7s	-14%
`polyveck_decompose`	✅	6s	7s	-14%
`polyveck_reduce`	✅	6s	5s	+20%
`polyvecl_ntt`	✅	6s	3s	+100%
`polyz_unpack`	✅	6s	6s	+0%
`sign`	✅	6s	7s	-14%
`mld_prepare_domain_separation_prefix`	✅	5s	6s	-17%
`poly_reduce`	✅	5s	4s	+25%
`polyveck_caddq`	✅	5s	7s	-29%
`polyveck_ntt`	✅	5s	4s	+25%
`polyveck_use_hint`	✅	5s	5s	+0%
`polyvecl_chknorm`	✅	5s	3s	+67%
`decompose`	✅	4s	4s	+0%
`fqscale`	✅	4s	3s	+33%
`make_hint`	✅	4s	2s	+100%
`mld_ct_cmask_nonzero_u32`	✅	4s	3s	+33%
`mld_ct_cmask_nonzero_u8`	✅	4s	3s	+33%
`mld_ct_get_optblocker_u8`	✅	4s	2s	+100%
`mld_sample_s1_s2_serial`	✅	4s	5s	-20%
`montgomery_reduce`	✅	4s	3s	+33%
`ntt_native_aarch64`	✅	4s	3s	+33%
`poly_caddq_c`	✅	4s	6s	-33%
`poly_caddq_native`	✅	4s	4s	+0%
`poly_caddq_native_aarch64`	✅	4s	3s	+33%
`poly_challenge`	✅	4s	5s	-20%
`poly_decompose_c`	✅	4s	3s	+33%
`poly_ntt`	✅	4s	4s	+0%
`poly_ntt_native`	✅	4s	3s	+33%
`poly_power2round`	✅	4s	6s	-33%
`poly_shiftl`	✅	4s	4s	+0%
`poly_sub`	✅	4s	4s	+0%
`poly_use_hint_c`	✅	4s	5s	-20%
`polyeta_pack`	✅	4s	4s	+0%
`polyt0_pack`	✅	4s	4s	+0%
`polyveck_shiftl`	✅	4s	6s	-33%
`polyveck_unpack_eta`	✅	4s	3s	+33%
`polyveck_unpack_t0`	✅	4s	5s	-20%
`polyvecl_permute_bitrev_to_custom`	✅	4s	3s	+33%
`polyvecl_unpack_z`	✅	4s	3s	+33%
`rej_eta_c`	✅	4s	4s	+0%
`shake128_release`	✅	4s	4s	+0%
`shake128_squeeze`	✅	4s	3s	+33%
`shake128x4_absorb_once`	✅	4s	2s	+100%
`sign_keypair`	✅	4s	3s	+33%
`sign_signature_pre_hash_shake256`	✅	4s	2s	+100%
`sign_verify`	✅	4s	7s	-43%
`intt_native_x86_64`	✅	3s	5s	-40%
`keccak_init`	✅	3s	2s	+50%
`keccak_squeeze`	✅	3s	4s	-25%
`keccakf1600_xor_bytes`	✅	3s	4s	-25%
`keccakf1600x4_extract_bytes`	✅	3s	2s	+50%
`keccakf1600x4_xor_bytes`	✅	3s	3s	+0%
`mld_ct_abs_i32`	✅	3s	3s	+0%
`mld_ct_cmask_neg_i32`	✅	3s	3s	+0%
`mld_sample_s1_s2`	✅	3s	3s	+0%
`mld_value_barrier_i64`	✅	3s	2s	+50%
`poly_chknorm_native`	✅	3s	3s	+0%
`poly_decompose`	✅	3s	3s	+0%
`poly_invntt_tomont_native`	✅	3s	4s	-25%
`poly_pointwise_montgomery_native`	✅	3s	4s	-25%
`poly_uniform_eta`	✅	3s	5s	-40%
`poly_uniform_gamma1`	✅	3s	3s	+0%
`poly_uniform_gamma1_4x`	✅	3s	4s	-25%
`poly_use_hint`	✅	3s	2s	+50%
`poly_use_hint_native`	✅	3s	2s	+50%
`polyt1_pack`	✅	3s	2s	+50%
`polyt1_unpack`	✅	3s	5s	-40%
`polyveck_pack_eta`	✅	3s	3s	+0%
`polyveck_pack_w1`	✅	3s	3s	+0%
`polyveck_pointwise_poly_montgomery`	✅	3s	4s	-25%
`polyvecl_pack_eta`	✅	3s	6s	-50%
`polyvecl_uniform_gamma1_serial`	✅	3s	4s	-25%
`polyw1_pack`	✅	3s	3s	+0%
`polyz_pack`	✅	3s	3s	+0%
`reduce32`	✅	3s	3s	+0%
`rej_eta_native`	✅	3s	4s	-25%
`shake128_absorb`	✅	3s	2s	+50%
`shake128_init`	✅	3s	2s	+50%
`shake256`	✅	3s	2s	+50%
`shake256_squeeze`	✅	3s	2s	+50%
`shake256x4_absorb_once`	✅	3s	2s	+50%
`shake256x4_squeezeblocks`	✅	3s	1s	+200%
`sign_signature`	✅	3s	2s	+50%
`sign_signature_extmu`	✅	3s	3s	+0%
`sign_verify_extmu`	✅	3s	5s	-40%
`sign_verify_pre_hash_internal`	✅	3s	3s	+0%
`sign_verify_pre_hash_shake256`	✅	3s	3s	+0%
`use_hint`	✅	3s	4s	-25%
`keccak_finalize`	✅	2s	2s	+0%
`keccakf1600_extract_bytes (big endian)`	✅	2s	4s	-50%
`keccakf1600_xor_bytes (big endian)`	✅	2s	2s	+0%
`keccakf1600x4_permute`	✅	2s	5s	-60%
`mld_keccakf1600_extract_bytes`	✅	2s	3s	-33%
`mld_value_barrier_u32`	✅	2s	2s	+0%
`mld_value_barrier_u8`	✅	2s	5s	-60%
`ntt_native_x86_64`	✅	2s	2s	+0%
`poly_caddq`	✅	2s	3s	-33%
`poly_invntt_tomont`	✅	2s	4s	-50%
`poly_ntt_c`	✅	2s	3s	-33%
`poly_pointwise_montgomery`	✅	2s	4s	-50%
`polyvecl_pointwise_acc_montgomery`	✅	2s	3s	-33%
`polyvecl_pointwise_acc_montgomery_native`	✅	2s	7s	-71%
`polyvecl_uniform_gamma1`	✅	2s	3s	-33%
`polyvecl_unpack_eta`	✅	2s	3s	-33%
`polyz_unpack_native`	✅	2s	3s	-33%
`power2round`	✅	2s	2s	+0%
`rej_eta`	✅	2s	2s	+0%
`shake128_finalize`	✅	2s	2s	+0%
`shake128x4_squeezeblocks`	✅	2s	2s	+0%
`shake256_finalize`	✅	2s	3s	-33%
`shake256_release`	✅	2s	2s	+0%
`sys_check_capability`	✅	2s	3s	-33%
`mld_attempt_signature_generation`	❌	-	234s	-
`mld_compute_pack_z`	❌	-	6s	-
`mld_compute_t0_t1_tr_from_sk_components`	❌	-	13s	-
`pack_pk`	❌	-	7s	-
`pack_sig_c_h`	❌	-	2s	-
`pack_sig_z`	❌	-	3s	-
`pack_sk`	❌	-	2s	-
`polyveck_add`	❌	-	5s	-
`polyveck_make_hint`	❌	-	2s	-
`polyveck_pack_t0`	❌	-	4s	-
`polyveck_pointwise_poly_montgomery_s2`	❌	-	-	-
`polyveck_pointwise_poly_montgomery_t0`	❌	-	-	-
`polyveck_power2round`	❌	-	14s	-
`sign_keypair_internal`	❌	-	5s	-
`sign_pk_from_sk`	❌	-	7s	-
`sign_verify_internal`	❌	-	127s	-
`unpack_hints`	❌	-	6s	-
`unpack_pk`	❌	-	2s	-
`unpack_sig`	❌	-	2s	-
`unpack_sk`	❌	-	5s	-
`caddq`	✅	1s	5s	-80%
`mld_ct_get_optblocker_i64`	✅	1s	3s	-67%
`mld_ct_get_optblocker_u32`	✅	1s	2s	-50%
`mld_ct_sel_int32`	✅	1s	2s	-50%
`poly_chknorm`	✅	1s	2s	-50%
`poly_chknorm_native_aarch64`	✅	1s	3s	-67%
`poly_make_hint`	✅	1s	2s	-50%
`shake256_absorb`	✅	1s	2s	-50%
`shake256_init`	✅	1s	2s	-50%

Introduce mld_s1vec, following the same pattern as mld_polymat for reduced RAM usage. In normal mode, it stores the full NTT'd polyvecl. In REDUCE_RAM mode, it stores a pointer to the packed s1 data in the secret key and unpacks + NTTs individual polynomials on demand.

This reduces signing memory in REDUCE_RAM mode:
- ML-DSA-44: 32,448 -> 28,384 (-4,064 bytes)
- ML-DSA-65: 44,768 -> 39,680 (-5,088 bytes)
- ML-DSA-87: 59,104 -> 51,968 (-7,136 bytes)

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
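The two storage strategies described in this commit message can be illustrated with a minimal sketch. Every name below (`toy_poly`, `toy_vec_full`, `toy_vec_lazy`, `toy_unpack_transform`) is hypothetical and not taken from mldsa-native. The one-byte-per-coefficient encoding and the stand-in for "unpack + NTT" are assumptions chosen only to keep the example short. Both variants are shown side by side rather than behind a build flag so that they can be compared directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_N 256 /* coefficients per polynomial */
#define TOY_L 4   /* polynomials in the vector (ML-DSA-44 uses L = 4) */

typedef struct { int32_t coeffs[TOY_N]; } toy_poly;

/* Hypothetical stand-in for "unpack + NTT" of one polynomial. One byte per
 * coefficient is a toy encoding, not the ML-DSA bit-packing, and no real
 * NTT is performed. */
static void toy_unpack_transform(toy_poly *out, const uint8_t *packed)
{
  size_t i;
  for (i = 0; i < TOY_N; i++)
  {
    out->coeffs[i] = 2 - (int32_t)packed[i];
  }
}

/* Normal mode: every polynomial is expanded once and kept. */
typedef struct { toy_poly vec[TOY_L]; } toy_vec_full;

/* Reduced-RAM mode: only a pointer into the packed key is kept. */
typedef struct { const uint8_t *packed; } toy_vec_lazy;

static void toy_vec_full_init(toy_vec_full *v, const uint8_t *packed)
{
  size_t i;
  for (i = 0; i < TOY_L; i++)
  {
    toy_unpack_transform(&v->vec[i], packed + i * TOY_N);
  }
}

static const toy_poly *toy_vec_full_get(const toy_vec_full *v, size_t i)
{
  assert(i < TOY_L);
  return &v->vec[i];
}

static void toy_vec_lazy_init(toy_vec_lazy *v, const uint8_t *packed)
{
  v->packed = packed;
}

/* Re-derives polynomial i into caller-provided scratch on every access. */
static const toy_poly *toy_vec_lazy_get(const toy_vec_lazy *v, size_t i,
                                        toy_poly *scratch)
{
  assert(i < TOY_L);
  toy_unpack_transform(scratch, v->packed + i * TOY_N);
  return scratch;
}
```

The trade-off in this sketch is recomputation for memory: the lazy variant repeats the unpack step on every access, while the full variant pays 4 * 1024 = 4096 bytes up front for L = 4.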

Same pattern as mld_s1vec: in normal mode stores the full NTT'd polyveck, in REDUCE_RAM mode stores a pointer and unpacks + NTTs on demand.

REDUCE_RAM signing memory reduction:
- ML-DSA-44: 28,384 -> 24,320 (-4,064 bytes)
- ML-DSA-65: 39,680 -> 33,568 (-6,112 bytes)
- ML-DSA-87: 51,968 -> 43,808 (-8,160 bytes)

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

Same pattern as mld_s1vec and mld_s2vec: in normal mode stores the full NTT'd polyveck, in REDUCE_RAM mode stores a pointer and unpacks + NTTs on demand.

REDUCE_RAM signing memory reduction:
- ML-DSA-44: 24,320 -> 20,256 (-4,064 bytes)
- ML-DSA-65: 33,568 -> 27,456 (-6,112 bytes)
- ML-DSA-87: 43,808 -> 35,648 (-8,160 bytes)

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

Instead of allocating a full polyveck for h in attempt_signature_generation, compute cs2, ct0, and hints one polynomial at a time using scratch polys. This eliminates the polyveck h from the yh_u union, replacing mld_pack_sig_c_h with incremental packing via mld_pack_sig_c, mld_pack_sig_h_init, and mld_pack_sig_h_poly.

Sign allocation savings (normal / REDUCE_RAM):
- ML-DSA-44: -4096 / 0 bytes
- ML-DSA-65: -6144 / -1024 bytes
- ML-DSA-87: -8192 / -1024 bytes

Note: CBMC proofs are not updated yet.

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
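Incremental hint packing of the kind described above could look roughly like the following sketch. The function names and signatures (`toy_pack_h_init`, `toy_pack_h_poly`, `toy_pack_h_bulk`, `toy_hint_row`) are hypothetical and are not the signatures of `mld_pack_sig_h_init` or `mld_pack_sig_h_poly`. The byte layout assumed here is the ML-DSA hint encoding: OMEGA index bytes followed by one cumulative-count byte per row. Hints are modelled as byte arrays rather than polynomials for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_N 256    /* coefficients per polynomial */
#define TOY_K 4      /* rows (ML-DSA-44 uses K = 4) */
#define TOY_OMEGA 80 /* maximum number of set hint bits (ML-DSA-44) */
#define TOY_HBYTES (TOY_OMEGA + TOY_K)

/* Hypothetical stand-in for computing the hint bits of row k. */
static void toy_hint_row(uint8_t hint[TOY_N], unsigned k)
{
  unsigned i;
  for (i = 0; i < TOY_N; i++)
  {
    hint[i] = (uint8_t)(((i * 7u + k * 13u) % 61u) == 0u);
  }
}

/* Zero the hint section and reset the running count. */
static void toy_pack_h_init(uint8_t h[TOY_HBYTES], unsigned *count)
{
  memset(h, 0, TOY_HBYTES);
  *count = 0;
}

/* Append the hints of row k. Returns 0 on success, or -1 if more than
 * TOY_OMEGA hint bits have been set overall. */
static int toy_pack_h_poly(uint8_t h[TOY_HBYTES], unsigned *count, unsigned k,
                           const uint8_t hint[TOY_N])
{
  unsigned i;
  assert(k < TOY_K);
  for (i = 0; i < TOY_N; i++)
  {
    if (hint[i] != 0)
    {
      if (*count == TOY_OMEGA)
      {
        return -1;
      }
      h[(*count)++] = (uint8_t)i;
    }
  }
  h[TOY_OMEGA + k] = (uint8_t)*count; /* cumulative count after row k */
  return 0;
}

/* Reference: bulk packing from a fully materialized K x N hint array. */
static int toy_pack_h_bulk(uint8_t h[TOY_HBYTES], uint8_t hints[TOY_K][TOY_N])
{
  unsigned k, i, n = 0;
  memset(h, 0, TOY_HBYTES);
  for (k = 0; k < TOY_K; k++)
  {
    for (i = 0; i < TOY_N; i++)
    {
      if (hints[k][i] != 0)
      {
        if (n == TOY_OMEGA)
        {
          return -1;
        }
        h[n++] = (uint8_t)i;
      }
    }
    h[TOY_OMEGA + k] = (uint8_t)n;
  }
  return 0;
}
```

In this sketch the incremental path only ever needs one row of hints in memory, whereas the bulk reference needs all K rows at once.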

In REDUCE_RAM mode, shrink mld_polymat from rho + row_buffer (L polys) to rho + poly_buffer (1 poly). Replace mld_polymat_get_row with mld_polymat_get_element that samples a single A[k][l] on demand. Rewrite mld_polyvec_matrix_pointwise_montgomery in REDUCE_RAM mode to use per-element access, accumulating A[k][l] * v[l] one element at a time. Normal mode is unchanged (full matrix, row-based access).

REDUCE_RAM allocation savings:
- ML-DSA-44: keypair -3072, sign -3072, verify -3072, pk_from_sk -3072
- ML-DSA-65: keypair -4096, sign -4096, verify -4096, pk_from_sk -4096
- ML-DSA-87: keypair -6144, sign -6144, verify -6144, pk_from_sk -6144

Note: CBMC proofs are not updated yet.

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
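A rough sketch of per-element access follows, under loud assumptions: all names are hypothetical, the "sampler" is a simple linear congruential generator instead of SHAKE-based rejection sampling, and plain modular multiplication replaces Montgomery arithmetic. The reported savings of 3072/4096/6144 bytes are consistent with dropping (L - 1) polynomials of 1024 bytes each for L = 4/5/7, which is what going from a row buffer to a single-polynomial buffer would give.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_N 8       /* coefficients per polynomial (256 in ML-DSA) */
#define TOY_K 3       /* matrix rows */
#define TOY_L 2       /* matrix columns */
#define TOY_Q 8380417 /* the ML-DSA modulus */

typedef struct { int32_t coeffs[TOY_N]; } toy_poly;

/* Hypothetical stand-in for sampling A[k][l] from rho: a deterministic
 * LCG stream keyed by (rho, k, l), with coefficients in [0, q). */
static void toy_sample_element(toy_poly *a, uint32_t rho, size_t k, size_t l)
{
  uint32_t x = rho ^ (uint32_t)(k * 257u + l);
  size_t i;
  assert(k < TOY_K && l < TOY_L);
  for (i = 0; i < TOY_N; i++)
  {
    x = x * 1664525u + 1013904223u;
    a->coeffs[i] = (int32_t)(x % TOY_Q);
  }
}

/* acc += a * v coefficient-wise, reduced into [0, q).
 * All inputs are assumed to lie in [0, q). */
static void toy_mul_acc(toy_poly *acc, const toy_poly *a, const toy_poly *v)
{
  size_t i;
  for (i = 0; i < TOY_N; i++)
  {
    int64_t t = (int64_t)a->coeffs[i] * v->coeffs[i] + acc->coeffs[i];
    acc->coeffs[i] = (int32_t)(t % TOY_Q);
  }
}

/* Reduced-RAM style: the only matrix storage is a single polynomial. */
static void toy_matvec_per_element(toy_poly w[TOY_K], uint32_t rho,
                                   const toy_poly v[TOY_L])
{
  toy_poly a;
  size_t k, l;
  for (k = 0; k < TOY_K; k++)
  {
    memset(&w[k], 0, sizeof(w[k]));
    for (l = 0; l < TOY_L; l++)
    {
      toy_sample_element(&a, rho, k, l);
      toy_mul_acc(&w[k], &a, &v[l]);
    }
  }
}

/* Reference: expand the whole K x L matrix first, then multiply. */
static void toy_matvec_full(toy_poly w[TOY_K], uint32_t rho,
                            const toy_poly v[TOY_L])
{
  toy_poly mat[TOY_K][TOY_L];
  size_t k, l;
  for (k = 0; k < TOY_K; k++)
  {
    for (l = 0; l < TOY_L; l++)
    {
      toy_sample_element(&mat[k][l], rho, k, l);
    }
  }
  for (k = 0; k < TOY_K; k++)
  {
    memset(&w[k], 0, sizeof(w[k]));
    for (l = 0; l < TOY_L; l++)
    {
      toy_mul_acc(&w[k], &mat[k][l], &v[l]);
    }
  }
}
```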

Introduce mld_yvec type following the same pattern as mld_s1vec/s2vec/t0vec: in normal mode it holds the full polyvecl, in REDUCE_RAM mode it stores only the seed (rhoprime) and nonce for on-demand regeneration.

Add mld_polyvec_matrix_pointwise_montgomery_yvec which computes w = invNTT(A * NTT(y)). In REDUCE_RAM mode it fuses y sampling with column-by-column matrix multiplication, avoiding storage of y entirely. In normal mode it delegates to the existing bulk path.

Also enable mld_poly_uniform_gamma1 for REDUCE_RAM builds so the per-poly y regeneration works for all parameter sets.

REDUCE_RAM sign allocation savings:
- ML-DSA-44: 17184 -> 13120 (-4064 bytes)
- ML-DSA-65: 22336 -> 17248 (-5088 bytes)
- ML-DSA-87: 28480 -> 21344 (-7136 bytes)

Normal mode is unchanged.

Note: CBMC proofs are not updated yet.

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
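The column-by-column fusion can be sketched as below. This is a toy model under stated assumptions, not the PR's code: all names are hypothetical, both samplers are simple LCG streams, y[l] is derived from `nonce + l` purely for illustration, coefficients are kept in [0, q) instead of the centered gamma1 range, and the NTT and inverse NTT steps are omitted so that multiplication is coefficient-wise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_N 8       /* coefficients per polynomial (256 in ML-DSA) */
#define TOY_K 3       /* matrix rows */
#define TOY_L 2       /* matrix columns */
#define TOY_Q 8380417 /* the ML-DSA modulus */

typedef struct { int32_t coeffs[TOY_N]; } toy_poly;

/* Fills p with a deterministic LCG stream in [0, q), starting from state x. */
static void toy_fill(toy_poly *p, uint32_t x)
{
  size_t i;
  for (i = 0; i < TOY_N; i++)
  {
    x = x * 1664525u + 1013904223u;
    p->coeffs[i] = (int32_t)(x % TOY_Q);
  }
}

/* Hypothetical stand-ins for the samplers of A[k][l] and of one y polynomial. */
static void toy_sample_a(toy_poly *a, uint32_t rho, size_t k, size_t l)
{
  assert(k < TOY_K && l < TOY_L);
  toy_fill(a, rho ^ (uint32_t)(k * 257u + l));
}

static void toy_sample_y(toy_poly *y, uint32_t seed, uint16_t nonce)
{
  toy_fill(y, seed + 0x9E3779B9u * (nonce + 1u));
}

/* acc += a * v coefficient-wise, reduced into [0, q). */
static void toy_mul_acc(toy_poly *acc, const toy_poly *a, const toy_poly *v)
{
  size_t i;
  for (i = 0; i < TOY_N; i++)
  {
    int64_t t = (int64_t)a->coeffs[i] * v->coeffs[i] + acc->coeffs[i];
    acc->coeffs[i] = (int32_t)(t % TOY_Q);
  }
}

/* Fused variant: y[l] is regenerated once per column and never stored as a
 * vector. Only one y polynomial and one matrix element are live. */
static void toy_matvec_fused(toy_poly w[TOY_K], uint32_t rho, uint32_t seed,
                             uint16_t nonce)
{
  toy_poly a, y;
  size_t k, l;
  memset(w, 0, TOY_K * sizeof(toy_poly));
  for (l = 0; l < TOY_L; l++)
  {
    toy_sample_y(&y, seed, (uint16_t)(nonce + l));
    for (k = 0; k < TOY_K; k++)
    {
      toy_sample_a(&a, rho, k, l);
      toy_mul_acc(&w[k], &a, &y);
    }
  }
}

/* Reference: materialize the whole vector y, then go row by row. */
static void toy_matvec_stored(toy_poly w[TOY_K], uint32_t rho, uint32_t seed,
                              uint16_t nonce)
{
  toy_poly y[TOY_L];
  toy_poly a;
  size_t k, l;
  for (l = 0; l < TOY_L; l++)
  {
    toy_sample_y(&y[l], seed, (uint16_t)(nonce + l));
  }
  memset(w, 0, TOY_K * sizeof(toy_poly));
  for (k = 0; k < TOY_K; k++)
  {
    for (l = 0; l < TOY_L; l++)
    {
      toy_sample_a(&a, rho, k, l);
      toy_mul_acc(&w[k], &a, &y[l]);
    }
  }
}
```

Iterating over columns in the outer loop is what lets each y[l] be generated exactly once. The price is that all K accumulators of w must be live for the whole computation.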

Replace mld_compute_t0_t1_tr_from_sk_components with per-row mld_compute_t0k_t1k. Both keygen and pk_from_sk now process one row at a time, packing t1[k] into pk and t0[k] into sk immediately. This eliminates full polyveck allocations for t0, t1, and the matrix from both code paths.

Note: CBMC proofs are not updated yet.

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
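The row-at-a-time flow can be sketched as follows. The names are hypothetical and this is not the body of `mld_compute_t0k_t1k`. The row generator is an LCG stand-in for computing t[k] = (A * s1 + s2)[k], and narrow integer arrays stand in for the packed public and secret key. The Power2Round split itself follows the standard ML-DSA definition with D = 13.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_N 8       /* coefficients per polynomial (256 in ML-DSA) */
#define TOY_K 4       /* rows */
#define TOY_Q 8380417 /* the ML-DSA modulus */
#define TOY_D 13      /* number of dropped low bits */

typedef struct { int32_t coeffs[TOY_N]; } toy_poly;

/* Splits a in [0, q) into a = a1 * 2^D + a0 with a0 in (-2^(D-1), 2^(D-1)].
 * Returns a1 and stores a0 through the pointer. */
static int32_t toy_power2round(int32_t *a0, int32_t a)
{
  int32_t a1;
  assert(a >= 0 && a < TOY_Q);
  a1 = (a + (1 << (TOY_D - 1)) - 1) >> TOY_D;
  *a0 = a - (a1 << TOY_D);
  return a1;
}

/* Hypothetical stand-in for computing row k of t. */
static void toy_compute_tk(toy_poly *tk, uint32_t seed, size_t k)
{
  uint32_t x = seed ^ (uint32_t)(k * 7919u);
  size_t i;
  for (i = 0; i < TOY_N; i++)
  {
    x = x * 1664525u + 1013904223u;
    tk->coeffs[i] = (int32_t)(x % TOY_Q);
  }
}

/* Processes one row at a time: only a single scratch polynomial is live,
 * and both halves are written to their destination immediately. */
static void toy_keygen_rows(uint16_t pk_t1[TOY_K][TOY_N],
                            int16_t sk_t0[TOY_K][TOY_N], uint32_t seed)
{
  toy_poly tk;
  size_t k, i;
  for (k = 0; k < TOY_K; k++)
  {
    toy_compute_tk(&tk, seed, k);
    for (i = 0; i < TOY_N; i++)
    {
      int32_t a0;
      int32_t a1 = toy_power2round(&a0, tk.coeffs[i]);
      pk_t1[k][i] = (uint16_t)a1; /* a1 fits in 10 bits */
      sk_t0[k][i] = (int16_t)a0;  /* a0 fits in 13 bits plus sign */
    }
  }
}
```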

To silence linting errors. Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

oqs-bot · 2026-03-27T06:06:42Z

CBMC Results (ML-DSA-65)

⚠️ Attention Required

Proof	Status	Current	Previous	Change
`mld_attempt_signature_generation`	❌	-	278s	-
`mld_compute_pack_z`	❌	-	8s	-
`mld_compute_t0_t1_tr_from_sk_components`	❌	-	27s	-
`pack_pk`	❌	-	3s	-
`pack_sig_c_h`	❌	-	2s	-
`pack_sig_z`	❌	-	3s	-
`pack_sk`	❌	-	5s	-
`polyveck_add`	❌	-	7s	-
`polyveck_make_hint`	❌	-	6s	-
`polyveck_pack_t0`	❌	-	4s	-
`polyveck_pointwise_poly_montgomery_s2`	❌	-	-	-
`polyveck_pointwise_poly_montgomery_t0`	❌	-	-	-
`polyveck_power2round`	❌	-	11s	-
`polyvecl_pointwise_acc_montgomery_c`	❌	-	190s	-
`sign_keypair_internal`	❌	-	6s	-
`sign_pk_from_sk`	❌	-	7s	-
`sign_verify_internal`	❌	-	341s	-
`unpack_hints`	❌	-	6s	-
`unpack_pk`	❌	-	3s	-
`unpack_sig`	❌	-	2s	-
`unpack_sk`	❌	-	6s	-
`polyvecl_chknorm`	⚠️	30s	13s	+131%

Full Results (179 proofs)

Proof	Status	Current	Previous	Change
`TOTAL`	✅	1604s	2488s	-35.5%
`poly_pointwise_montgomery_c`	✅	170s	160s	+6%
`rej_uniform_native`	✅	151s	145s	+4%
`polyvec_matrix_expand`	✅	125s	128s	-2%
`mld_invntt_layer`	✅	100s	96s	+4%
`mld_ct_memcmp`	✅	83s	77s	+8%
`mld_ntt_layer`	✅	61s	54s	+13%
`polyvec_matrix_expand_serial`	✅	56s	70s	-20%
`keccak_squeezeblocks_x4`	✅	46s	42s	+10%
`polyvecl_chknorm`	⚠️	30s	13s	+131%
`rej_uniform`	✅	25s	23s	+9%
`sign_signature_internal`	✅	24s	37s	-35%
`poly_chknorm_c`	✅	22s	22s	+0%
`fqmul`	✅	21s	18s	+17%
`poly_uniform_4x`	✅	18s	15s	+20%
`polymat_permute_bitrev_to_custom`	✅	18s	29s	-38%
`poly_uniform_eta_4x`	✅	16s	16s	+0%
`polyt0_unpack`	✅	16s	14s	+14%
`rej_uniform_c`	✅	16s	17s	-6%
`polyvec_matrix_pointwise_montgomery`	✅	14s	11s	+27%
`mld_ntt_butterfly_block`	✅	13s	13s	+0%
`mld_polyvecl_permute_bitrev_to_custom_native`	✅	13s	8s	+62%
`keccakf1600_permute_native`	✅	12s	9s	+33%
`keccakf1600x4_permute_native`	✅	12s	13s	-8%
`polyveck_shiftl`	✅	12s	8s	+50%
`poly_add`	✅	11s	11s	+0%
`keccak_absorb_once_x4`	✅	10s	9s	+11%
`poly_decompose_c`	✅	10s	7s	+43%
`polyveck_decompose`	✅	10s	14s	-29%
`keccakf1600_permute`	✅	9s	10s	-10%
`mld_check_pct`	✅	9s	8s	+12%
`polyveck_invntt_tomont`	✅	9s	10s	-10%
`polyveck_use_hint`	✅	9s	8s	+12%
`keccak_absorb`	✅	8s	7s	+14%
`poly_invntt_tomont_c`	✅	8s	10s	-20%
`polyveck_sub`	✅	8s	10s	-20%
`polyvecl_ntt`	✅	8s	8s	+0%
`sign`	✅	8s	8s	+0%
`sign_signature_pre_hash_internal`	✅	8s	4s	+100%
`polyt1_unpack`	✅	7s	4s	+75%
`polyveck_caddq`	✅	7s	7s	+0%
`polyveck_chknorm`	✅	7s	5s	+40%
`polyveck_ntt`	✅	7s	11s	-36%
`polyveck_reduce`	✅	7s	6s	+17%
`mld_prepare_domain_separation_prefix`	✅	6s	7s	-14%
`mld_sample_s1_s2`	✅	6s	6s	+0%
`poly_challenge`	✅	6s	5s	+20%
`polyveck_pointwise_poly_montgomery`	✅	6s	6s	+0%
`intt_native_x86_64`	✅	5s	3s	+67%
`mld_sample_s1_s2_serial`	✅	5s	6s	-17%
`poly_caddq_c`	✅	5s	4s	+25%
`poly_shiftl`	✅	5s	3s	+67%
`poly_uniform_gamma1_4x`	✅	5s	3s	+67%
`poly_use_hint_c`	✅	5s	3s	+67%
`polyt0_pack`	✅	5s	5s	+0%
`polyvecl_pack_eta`	✅	5s	2s	+150%
`polyvecl_pointwise_acc_montgomery_native`	✅	5s	3s	+67%
`polyvecl_uniform_gamma1`	✅	5s	2s	+150%
`polyvecl_uniform_gamma1_serial`	✅	5s	5s	+0%
`polyvecl_unpack_z`	✅	5s	3s	+67%
`rej_eta`	✅	5s	2s	+150%
`rej_eta_c`	✅	5s	4s	+25%
`rej_eta_native`	✅	5s	4s	+25%
`sign_signature`	✅	5s	6s	-17%
`sign_verify_pre_hash_internal`	✅	5s	6s	-17%
`use_hint`	✅	5s	4s	+25%
`keccakf1600_xor_bytes`	✅	4s	1s	+300%
`keccakf1600_xor_bytes (big endian)`	✅	4s	2s	+100%
`mld_h`	✅	4s	5s	-20%
`mld_keccakf1600_extract_bytes`	✅	4s	2s	+100%
`poly_caddq_native`	✅	4s	3s	+33%
`poly_chknorm_native`	✅	4s	3s	+33%
`poly_ntt`	✅	4s	4s	+0%
`poly_pointwise_montgomery_native`	✅	4s	4s	+0%
`poly_power2round`	✅	4s	4s	+0%
`poly_sub`	✅	4s	4s	+0%
`poly_uniform_eta`	✅	4s	6s	-33%
`poly_use_hint_native`	✅	4s	3s	+33%
`polyeta_unpack`	✅	4s	6s	-33%
`polyveck_pack_eta`	✅	4s	4s	+0%
`polyveck_pack_w1`	✅	4s	4s	+0%
`polyveck_unpack_t0`	✅	4s	3s	+33%
`shake128_init`	✅	4s	1s	+300%
`shake256_finalize`	✅	4s	2s	+100%
`sign_keypair`	✅	4s	2s	+100%
`sign_verify`	✅	4s	4s	+0%
`sign_verify_extmu`	✅	4s	3s	+33%
`fqscale`	✅	3s	3s	+0%
`keccak_finalize`	✅	3s	3s	+0%
`keccak_init`	✅	3s	1s	+200%
`keccak_squeeze`	✅	3s	4s	-25%
`make_hint`	✅	3s	2s	+50%
`mld_ct_cmask_neg_i32`	✅	3s	1s	+200%
`mld_ct_cmask_nonzero_u8`	✅	3s	3s	+0%
`mld_ct_sel_int32`	✅	3s	5s	-40%
`montgomery_reduce`	✅	3s	3s	+0%
`ntt_native_x86_64`	✅	3s	4s	-25%
`poly_caddq_native_aarch64`	✅	3s	4s	-25%
`poly_chknorm_native_aarch64`	✅	3s	4s	-25%
`poly_decompose_native`	✅	3s	5s	-40%
`poly_ntt_c`	✅	3s	3s	+0%
`poly_reduce`	✅	3s	2s	+50%
`poly_uniform`	✅	3s	2s	+50%
`poly_use_hint`	✅	3s	3s	+0%
`polyeta_pack`	✅	3s	2s	+50%
`polyt1_pack`	✅	3s	3s	+0%
`polyveck_unpack_eta`	✅	3s	5s	-40%
`polyvecl_pointwise_acc_montgomery`	✅	3s	4s	-25%
`polyw1_pack`	✅	3s	4s	-25%
`polyz_pack`	✅	3s	3s	+0%
`polyz_unpack_c`	✅	3s	2s	+50%
`power2round`	✅	3s	1s	+200%
`shake128_absorb`	✅	3s	2s	+50%
`shake128_finalize`	✅	3s	3s	+0%
`shake128x4_squeezeblocks`	✅	3s	2s	+50%
`shake256`	✅	3s	2s	+50%
`shake256_release`	✅	3s	3s	+0%
`shake256x4_absorb_once`	✅	3s	4s	-25%
`sign_open`	✅	3s	5s	-40%
`sign_signature_extmu`	✅	3s	4s	-25%
`sign_signature_pre_hash_shake256`	✅	3s	5s	-40%
`sys_check_capability`	✅	3s	1s	+200%
`caddq`	✅	2s	3s	-33%
`decompose`	✅	2s	3s	-33%
`keccakf1600_extract_bytes (big endian)`	✅	2s	3s	-33%
`keccakf1600x4_extract_bytes`	✅	2s	3s	-33%
`keccakf1600x4_permute`	✅	2s	4s	-50%
`keccakf1600x4_xor_bytes`	✅	2s	2s	+0%
`mld_ct_abs_i32`	✅	2s	2s	+0%
`mld_ct_cmask_nonzero_u32`	✅	2s	3s	-33%
`mld_ct_get_optblocker_u8`	✅	2s	2s	+0%
`mld_value_barrier_i64`	✅	2s	3s	-33%
`mld_value_barrier_u32`	✅	2s	3s	-33%
`ntt_native_aarch64`	✅	2s	3s	-33%
`poly_caddq`	✅	2s	3s	-33%
`poly_chknorm`	✅	2s	3s	-33%
`poly_decompose`	✅	2s	3s	-33%
`poly_invntt_tomont`	✅	2s	2s	+0%
`poly_invntt_tomont_native`	✅	2s	4s	-50%
`poly_make_hint`	✅	2s	3s	-33%
`poly_ntt_native`	✅	2s	2s	+0%
`poly_pointwise_montgomery`	✅	2s	4s	-50%
`poly_uniform_gamma1`	✅	2s	4s	-50%
`polyvecl_permute_bitrev_to_custom`	✅	2s	3s	-33%
`polyvecl_unpack_eta`	✅	2s	4s	-50%
`polyz_unpack`	✅	2s	2s	+0%
`polyz_unpack_native`	✅	2s	2s	+0%
`reduce32`	✅	2s	2s	+0%
`shake128x4_absorb_once`	✅	2s	3s	-33%
`shake256_squeeze`	✅	2s	4s	-50%
`shake256x4_squeezeblocks`	✅	2s	2s	+0%
`sign_verify_pre_hash_shake256`	✅	2s	5s	-60%
`mld_attempt_signature_generation`	❌	-	278s	-
`mld_compute_pack_z`	❌	-	8s	-
`mld_compute_t0_t1_tr_from_sk_components`	❌	-	27s	-
`pack_pk`	❌	-	3s	-
`pack_sig_c_h`	❌	-	2s	-
`pack_sig_z`	❌	-	3s	-
`pack_sk`	❌	-	5s	-
`polyveck_add`	❌	-	7s	-
`polyveck_make_hint`	❌	-	6s	-
`polyveck_pack_t0`	❌	-	4s	-
`polyveck_pointwise_poly_montgomery_s2`	❌	-	-	-
`polyveck_pointwise_poly_montgomery_t0`	❌	-	-	-
`polyveck_power2round`	❌	-	11s	-
`polyvecl_pointwise_acc_montgomery_c`	❌	-	190s	-
`sign_keypair_internal`	❌	-	6s	-
`sign_pk_from_sk`	❌	-	7s	-
`sign_verify_internal`	❌	-	341s	-
`unpack_hints`	❌	-	6s	-
`unpack_pk`	❌	-	3s	-
`unpack_sig`	❌	-	2s	-
`unpack_sk`	❌	-	6s	-
`mld_ct_get_optblocker_i64`	✅	1s	2s	-50%
`mld_ct_get_optblocker_u32`	✅	1s	1s	+0%
`mld_value_barrier_u8`	✅	1s	3s	-67%
`shake128_release`	✅	1s	1s	+0%
`shake128_squeeze`	✅	1s	4s	-75%
`shake256_absorb`	✅	1s	6s	-83%
`shake256_init`	✅	1s	3s	-67%

oqs-bot

Intel Xeon 4th gen (c7i)

Benchmark suite	Current: `b56df8e`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 keypair`	`34222` cycles	`34508` cycles	`0.99`
`ML-DSA-44 sign`	`120003` cycles	`119762` cycles	`1.00`
`ML-DSA-44 verify`	`38274` cycles	`38106` cycles	`1.00`
`ML-DSA-65 keypair`	`58946` cycles	`61327` cycles	`0.96`
`ML-DSA-65 sign`	`198396` cycles	`202109` cycles	`0.98`
`ML-DSA-65 verify`	`63064` cycles	`62771` cycles	`1.00`
`ML-DSA-87 keypair`	`92076` cycles	`94593` cycles	`0.97`
`ML-DSA-87 sign`	`242172` cycles	`240827` cycles	`1.01`
`ML-DSA-87 verify`	`96085` cycles	`96019` cycles	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite	Current: `b56df8e`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 keypair`	`95415` cycles	`93753` cycles	`1.02`
`ML-DSA-44 sign`	`331798` cycles	`333304` cycles	`1.00`
`ML-DSA-44 verify`	`99519` cycles	`99738` cycles	`1.00`
`ML-DSA-65 keypair`	`161336` cycles	`159678` cycles	`1.01`
`ML-DSA-65 sign`	`539375` cycles	`544024` cycles	`0.99`
`ML-DSA-65 verify`	`162895` cycles	`160787` cycles	`1.01`
`ML-DSA-87 keypair`	`268110` cycles	`267177` cycles	`1.00`
`ML-DSA-87 sign`	`705391` cycles	`705890` cycles	`1.00`
`ML-DSA-87 verify`	`268591` cycles	`270246` cycles	`0.99`

github-actions

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Benchmark suite	Current: `b56df8e`	Previous: `db65535`	Ratio
`ML-DSA-44 keypair`	`112362` cycles	`113139` cycles	`0.99`
`ML-DSA-44 sign`	`356672` cycles	`355421` cycles	`1.00`
`ML-DSA-44 verify`	`117719` cycles	`117817` cycles	`1.00`
`ML-DSA-65 keypair`	`194918` cycles	`196421` cycles	`0.99`
`ML-DSA-65 sign`	`586360` cycles	`588818` cycles	`1.00`
`ML-DSA-65 verify`	`194819` cycles	`194511` cycles	`1.00`
`ML-DSA-87 keypair`	`319997` cycles	`322254` cycles	`0.99`
`ML-DSA-87 sign`	`751619` cycles	`752975` cycles	`1.00`
`ML-DSA-87 verify`	`319902` cycles	`320113` cycles	`1.00`

oqs-bot

AMD EPYC 3rd gen (c6a)

Benchmark suite	Current: `b56df8e`	Previous: `db65535`	Ratio
`ML-DSA-44 keypair`	`68388` cycles	`68974` cycles	`0.99`
`ML-DSA-44 sign`	`189267` cycles	`187318` cycles	`1.01`
`ML-DSA-44 verify`	`69139` cycles	`69050` cycles	`1.00`
`ML-DSA-65 keypair`	`118689` cycles	`119428` cycles	`0.99`
`ML-DSA-65 sign`	`301252` cycles	`300617` cycles	`1.00`
`ML-DSA-65 verify`	`115747` cycles	`115643` cycles	`1.00`
`ML-DSA-87 keypair`	`201697` cycles	`203571` cycles	`0.99`
`ML-DSA-87 sign`	`393413` cycles	`394649` cycles	`1.00`
`ML-DSA-87 verify`	`194483` cycles	`195659` cycles	`0.99`

oqs-bot

Intel Xeon 3rd gen (c6i)

Benchmark suite	Current: `b56df8e`	Previous: `db65535`	Ratio
`ML-DSA-44 keypair`	`55881` cycles	`56817` cycles	`0.98`
`ML-DSA-44 sign`	`180718` cycles	`182410` cycles	`0.99`
`ML-DSA-44 verify`	`61359` cycles	`61615` cycles	`1.00`
`ML-DSA-65 keypair`	`97413` cycles	`98729` cycles	`0.99`
`ML-DSA-65 sign`	`296886` cycles	`298290` cycles	`1.00`
`ML-DSA-65 verify`	`101441` cycles	`100286` cycles	`1.01`
`ML-DSA-87 keypair`	`150700` cycles	`152586` cycles	`0.99`
`ML-DSA-87 sign`	`354653` cycles	`355720` cycles	`1.00`
`ML-DSA-87 verify`	`154075` cycles	`153499` cycles	`1.00`

oqs-bot

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite	Current: `b56df8e`	Previous: `db65535`	Ratio
`ML-DSA-44 keypair`	`134214` cycles	`134983` cycles	`0.99`
`ML-DSA-44 sign`	`524720` cycles	`524482` cycles	`1.00`
`ML-DSA-44 verify`	`147384` cycles	`147385` cycles	`1.00`
`ML-DSA-65 keypair`	`226870` cycles	`228309` cycles	`0.99`
`ML-DSA-65 sign`	`854441` cycles	`864340` cycles	`0.99`
`ML-DSA-65 verify`	`236415` cycles	`236413` cycles	`1.00`
`ML-DSA-87 keypair`	`368665` cycles	`370688` cycles	`0.99`
`ML-DSA-87 sign`	`1068488` cycles	`1079564` cycles	`0.99`
`ML-DSA-87 verify`	`382091` cycles	`383220` cycles	`1.00`

oqs-bot

AMD EPYC 4th gen (c7a)

Benchmark suite	Current: `b56df8e`	Previous: `db65535`	Ratio
`ML-DSA-44 keypair`	`39871` cycles	`42279` cycles	`0.94`
`ML-DSA-44 sign`	`136504` cycles	`132300` cycles	`1.03`
`ML-DSA-44 verify`	`44253` cycles	`43971` cycles	`1.01`
`ML-DSA-65 keypair`	`71924` cycles	`76769` cycles	`0.94`
`ML-DSA-65 sign`	`213770` cycles	`217452` cycles	`0.98`
`ML-DSA-65 verify`	`72509` cycles	`73895` cycles	`0.98`
`ML-DSA-87 keypair`	`108439` cycles	`108025` cycles	`1.00`
`ML-DSA-87 sign`	`251417` cycles	`252354` cycles	`1.00`
`ML-DSA-87 verify`	`109165` cycles	`109188` cycles	`1.00`

oqs-bot

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite	Current: `b56df8e`	Previous: `db65535`	Ratio
`ML-DSA-44 sign`	`136504` cycles	`132300` cycles	`1.03`

github-actions

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Benchmark suite	Current: `b56df8e`	Previous: `db65535`	Ratio
`ML-DSA-44 keypair`	`211753` cycles	`212555` cycles	`1.00`
`ML-DSA-44 sign`	`758883` cycles	`759099` cycles	`1.00`
`ML-DSA-44 verify`	`229118` cycles	`228906` cycles	`1.00`
`ML-DSA-65 keypair`	`377189` cycles	`380502` cycles	`0.99`
`ML-DSA-65 sign`	`1247155` cycles	`1251648` cycles	`1.00`
`ML-DSA-65 verify`	`371410` cycles	`372262` cycles	`1.00`
`ML-DSA-87 keypair`	`603210` cycles	`604945` cycles	`1.00`
`ML-DSA-87 sign`	`1585138` cycles	`1590686` cycles	`1.00`
`ML-DSA-87 verify`	`618819` cycles	`616948` cycles	`1.00`

oqs-bot

Intel Xeon 3rd gen (c6i) (no-opt)


Benchmark suite	Current: `b56df8e`	Previous: `db65535`	Ratio
`ML-DSA-44 keypair`	`156948` cycles	`157614` cycles	`1.00`
`ML-DSA-44 sign`	`548292` cycles	`551534` cycles	`0.99`
`ML-DSA-44 verify`	`169377` cycles	`169123` cycles	`1.00`
`ML-DSA-65 keypair`	`266042` cycles	`267907` cycles	`0.99`
`ML-DSA-65 sign`	`891894` cycles	`904333` cycles	`0.99`
`ML-DSA-65 verify`	`274396` cycles	`275011` cycles	`1.00`
`ML-DSA-87 keypair`	`447024` cycles	`448619` cycles	`1.00`
`ML-DSA-87 sign`	`1153646` cycles	`1157905` cycles	`1.00`
`ML-DSA-87 verify`	`459676` cycles	`458683` cycles	`1.00`


oqs-bot

Graviton4


Benchmark suite	Current: `b56df8e`	Previous: `db65535`	Ratio
`ML-DSA-44 keypair`	`67619` cycles	`68090` cycles	`0.99`
`ML-DSA-44 sign`	`202698` cycles	`202380` cycles	`1.00`
`ML-DSA-44 verify`	`70891` cycles	`70623` cycles	`1.00`
`ML-DSA-65 keypair`	`119598` cycles	`121010` cycles	`0.99`
`ML-DSA-65 sign`	`330515` cycles	`332267` cycles	`0.99`
`ML-DSA-65 verify`	`117848` cycles	`117974` cycles	`1.00`
`ML-DSA-87 keypair`	`196903` cycles	`198259` cycles	`0.99`
`ML-DSA-87 sign`	`427461` cycles	`428218` cycles	`1.00`
`ML-DSA-87 verify`	`194811` cycles	`194635` cycles	`1.00`


oqs-bot

Graviton3


Benchmark suite	Current: `b56df8e`	Previous: `db65535`	Ratio
`ML-DSA-44 keypair`	`71568` cycles	`72253` cycles	`0.99`
`ML-DSA-44 sign`	`212744` cycles	`212376` cycles	`1.00`
`ML-DSA-44 verify`	`75553` cycles	`75747` cycles	`1.00`
`ML-DSA-65 keypair`	`126328` cycles	`127630` cycles	`0.99`
`ML-DSA-65 sign`	`349346` cycles	`350882` cycles	`1.00`
`ML-DSA-65 verify`	`125556` cycles	`125712` cycles	`1.00`
`ML-DSA-87 keypair`	`205745` cycles	`208495` cycles	`0.99`
`ML-DSA-87 sign`	`444140` cycles	`450030` cycles	`0.99`
`ML-DSA-87 verify`	`205734` cycles	`205745` cycles	`1.00`


oqs-bot

AMD EPYC 4th gen (c7a) (no-opt)


Benchmark suite	Current: `b56df8e`	Previous: `db65535`	Ratio
`ML-DSA-44 keypair`	`120051` cycles	`120340` cycles	`1.00`
`ML-DSA-44 sign`	`444378` cycles	`447581` cycles	`0.99`
`ML-DSA-44 verify`	`130075` cycles	`130373` cycles	`1.00`
`ML-DSA-65 keypair`	`203529` cycles	`204354` cycles	`1.00`
`ML-DSA-65 sign`	`719394` cycles	`728319` cycles	`0.99`
`ML-DSA-65 verify`	`209932` cycles	`209199` cycles	`1.00`
`ML-DSA-87 keypair`	`338921` cycles	`338993` cycles	`1.00`
`ML-DSA-87 sign`	`918581` cycles	`921541` cycles	`1.00`
`ML-DSA-87 verify`	`346483` cycles	`348601` cycles	`0.99`


oqs-bot

Graviton4 (no-opt)


Benchmark suite	Current: `b56df8e`	Previous: `db65535`	Ratio
`ML-DSA-44 keypair`	`128669` cycles	`128240` cycles	`1.00`
`ML-DSA-44 sign`	`445672` cycles	`447597` cycles	`1.00`
`ML-DSA-44 verify`	`136986` cycles	`144662` cycles	`0.95`
`ML-DSA-65 keypair`	`219848` cycles	`220500` cycles	`1.00`
`ML-DSA-65 sign`	`720286` cycles	`727093` cycles	`0.99`
`ML-DSA-65 verify`	`221049` cycles	`223077` cycles	`0.99`
`ML-DSA-87 keypair`	`365316` cycles	`365045` cycles	`1.00`
`ML-DSA-87 sign`	`919622` cycles	`925847` cycles	`0.99`
`ML-DSA-87 verify`	`370439` cycles	`372789` cycles	`0.99`


oqs-bot

Graviton3 (no-opt)


Benchmark suite	Current: `b56df8e`	Previous: `db65535`	Ratio
`ML-DSA-44 keypair`	`137571` cycles	`138463` cycles	`0.99`
`ML-DSA-44 sign`	`482669` cycles	`483929` cycles	`1.00`
`ML-DSA-44 verify`	`148479` cycles	`162291` cycles	`0.91`
`ML-DSA-65 keypair`	`240785` cycles	`241435` cycles	`1.00`
`ML-DSA-65 sign`	`784950` cycles	`792312` cycles	`0.99`
`ML-DSA-65 verify`	`240892` cycles	`241250` cycles	`1.00`
`ML-DSA-87 keypair`	`394576` cycles	`396566` cycles	`0.99`
`ML-DSA-87 sign`	`1006235` cycles	`1012538` cycles	`0.99`
`ML-DSA-87 verify`	`403026` cycles	`402623` cycles	`1.00`


oqs-bot

Graviton2


Benchmark suite	Current: `b56df8e`	Previous: `db65535`	Ratio
`ML-DSA-44 keypair`	`112662` cycles	`113410` cycles	`0.99`
`ML-DSA-44 sign`	`356702` cycles	`355818` cycles	`1.00`
`ML-DSA-44 verify`	`118075` cycles	`118279` cycles	`1.00`
`ML-DSA-65 keypair`	`195068` cycles	`196486` cycles	`0.99`
`ML-DSA-65 sign`	`587010` cycles	`588672` cycles	`1.00`
`ML-DSA-65 verify`	`195142` cycles	`194830` cycles	`1.00`
`ML-DSA-87 keypair`	`321107` cycles	`323043` cycles	`0.99`
`ML-DSA-87 sign`	`752936` cycles	`753644` cycles	`1.00`
`ML-DSA-87 verify`	`319982` cycles	`320341` cycles	`1.00`


oqs-bot

Graviton2 (no-opt)


Benchmark suite	Current: `b56df8e`	Previous: `db65535`	Ratio
`ML-DSA-44 keypair`	`213721` cycles	`213406` cycles	`1.00`
`ML-DSA-44 sign`	`759277` cycles	`762744` cycles	`1.00`
`ML-DSA-44 verify`	`229673` cycles	`235007` cycles	`0.98`
`ML-DSA-65 keypair`	`378733` cycles	`380391` cycles	`1.00`
`ML-DSA-65 sign`	`1246651` cycles	`1253555` cycles	`0.99`
`ML-DSA-65 verify`	`372918` cycles	`371798` cycles	`1.00`
`ML-DSA-87 keypair`	`603099` cycles	`604988` cycles	`1.00`
`ML-DSA-87 sign`	`1583611` cycles	`1596422` cycles	`0.99`
`ML-DSA-87 verify`	`618046` cycles	`619153` cycles	`1.00`


… mode Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

mkannwischer added the DO-NOT-MERGE label Mar 27, 2026

mkannwischer added 20 commits March 27, 2026 13:47

CI: Add OpenTitan integration patch for updated alloc sizes

8007a15

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

fixup: Update REDUCE_RAM KEYPAIR_PCT allocation limits

dae2745

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

CBMC: Update proofs for s1vec/s2vec/t0vec changes

b4a11ac

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

CBMC: Extract per-poly pointwise functions for s2/t0

6f7b04f

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

fixup: Fix comments for s1vec/s2vec/t0vec

9c8b5ae

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

verify: Share buffers with non-overlapping lifetimes

a0e2d0e

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

verify: Unpack z on demand via mld_zvec

f324c39

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

keygen: Reuse t0 as accumulator in compute_t0_t1

0cab2cd

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

keygen: Share t1 and s1hat buffers in compute_t0_t1

3ec1d83

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

Remove dead code from packing and polyvec

cd13619

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

CI: Update OpenTitan alloc patch

dc395b7

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

Temporarily remove contracts from new packing functions

e82ec3a

To silence linting errors. Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

fixup: Fix sign-conversion warning in mld_yvec_get_poly

f588546

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

mkannwischer force-pushed the lowram branch from 8e960ff to f588546 Compare March 27, 2026 05:47

Guard functions not used in REDUCE_RAM mode

b4dc124

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

mkannwischer force-pushed the lowram branch 3 times, most recently from c06caac to b56df8e Compare March 28, 2026 03:26

mkannwischer added the benchmark label Mar 28, 2026

oqs-bot reviewed Mar 28, 2026

github-actions bot reviewed Mar 28, 2026

oqs-bot reviewed Mar 28, 2026

github-actions bot reviewed Mar 28, 2026

oqs-bot reviewed Mar 28, 2026

fixup: Guard unit test/bench for polyvecl_pointwise_acc in REDUCE_RAM mode

58b6aa1

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

mkannwischer force-pushed the lowram branch from b56df8e to 58b6aa1 Compare March 28, 2026 03:48
