-
Notifications
You must be signed in to change notification settings - Fork 4
Enable libcint 4-center 1-electron overlap integrals (WITH_4C1E) #82
Description
Context
For DFT inversion work, we need 4-center 1-electron overlap integrals:
∫ φ_i(r) φ_j(r) φ_k(r) φ_l(r) dr
These are distinct from the 2-electron ERIs (ij|1/r₁₂|kl) that SlaterGPU already wraps via gen_eri. Same 4-index structure, but a 1-electron integral with no Coulomb operator.
libcint provides this via cint4c1e_sph / cint4c1e_cart (available since libcint v2.8.16, implemented in src/cint4c1e.c and src/g4c1e.c), but these functions are not compiled by default. Paul has a working prototype in his personal (unversioned) code — including both a libcint cache bug fix and a SlaterGPU-style wrapper — that produces reasonable DFT inversion results.
Priority 1: Enable -DWITH_4C1E=1 in the build
The CMake flag -DWITH_4C1E=1 must be passed when building libcint for these functions to be available. Update SlaterGPU's pixi.toml and/or CMake configuration so that libcint is built with this flag. This lets us maintain a unified build rather than requiring a separate manual libcint compilation.
Future work: Integrate Paul's wrapper
Paul already has a working cint4c1e wrapper in his local code that follows the gen_eri pattern in cintwrapper.cpp. This needs to be brought into the repo. The wrapper:
- Loops over all 4 shell quartets
- Calls
cint4c1e_cartorcint4c1e_sphdepending onBT::DO_CART - Unpacks buffer with standard column-major indexing
- Full permutational symmetry over all 4 indices
Future work: Known upstream libcint bug (cache buffer sizing)
libcint's own CMakeLists.txt contains the warning: "Note there are bugs in 4c1e integral functions".
The likely bug is a cache buffer sizing issue in src/cint4c1e.c and src/g4c1e.c:
CINTg4c1e_ovlp(ing4c1e.c) needs3 * b_sizedoubles of scratch space from thecachepointer, whereb_size = db * (MAX(nmax, mmax) + 1)anddb = nmax + mmax + 1(withnmax = li + lj,mmax = lk + ll).- The cache size calculations in
CINT4c1e_drv(both the size-query path whenout == NULLat ~line 270, and the self-allocation path whencache == NULLat ~line 278) do not include this3 * b_sizeterm. CINT4c1e_loop_nopt(~line 108) has the same omission in itslencalculation.- For low angular momentum shells, there's enough slack in the
lengallocation that it works by accident. For higher angular momentum (roughlyli+lj+lk+ll >= 6),bufx/bufy/bufzcan write past the end of the cache — a heap buffer overflow.
Proposed fix (applied by Paul in his prototype, not yet validated with tests): add ovlp_scratch = 3 * db * (MAX(nmax, mmax) + 1) to the len/cache_size calculations in all 4 affected locations in cint4c1e.c. The values nmax, mmax are available from envs->li_ceil + envs->lj_ceil and envs->lk_ceil + envs->ll_ceil.
This fix has not been tested — libcint has a test (testsuite/test_cint4c1e.py) that validates cint4c1e_sph against a Laplacian-based identity using cint2e_ipip1_sph and cint2e_ipvip1_sph. That should be run to confirm correctness. We should also consider filing the fix upstream on sunqm/libcint.
References
- libcint 4c1e source:
src/cint4c1e.c,src/g4c1e.cin sunqm/libcint - libcint test:
testsuite/test_cint4c1e.py - SlaterGPU ERI wrapper (pattern to follow):
src/libcintw/cintwrapper.cpp,gen_erifunction - libcint build flag:
CMakeLists.txtoptionWITH_4C1E