Skip to content

Enable libcint 4-center 1-electron overlap integrals (WITH_4C1E) #82

@joshkamm

Description

@joshkamm

Context

For DFT inversion work, we need 4-center 1-electron overlap integrals:

∫ φ_i(r) φ_j(r) φ_k(r) φ_l(r) dr

These are distinct from the 2-electron ERIs (ij|1/r₁₂|kl) that SlaterGPU already wraps via gen_eri. Same 4-index structure, but a 1-electron integral with no Coulomb operator.

libcint provides this via cint4c1e_sph / cint4c1e_cart (available since libcint v2.8.16, implemented in src/cint4c1e.c and src/g4c1e.c), but these functions are not compiled by default. Paul has a working prototype in his personal (unversioned) code — including both a libcint cache bug fix and a SlaterGPU-style wrapper — that produces reasonable DFT inversion results.

Priority 1: Enable -DWITH_4C1E=1 in the build

The CMake flag -DWITH_4C1E=1 must be passed when building libcint for these functions to be available. Update SlaterGPU's pixi.toml and/or CMake configuration so that libcint is built with this flag. This lets us maintain a unified build rather than requiring a separate manual libcint compilation.

Future work: Integrate Paul's wrapper

Paul already has a working cint4c1e wrapper in his local code that follows the gen_eri pattern in cintwrapper.cpp. This needs to be brought into the repo. The wrapper:

  • Loops over all 4 shell quartets
  • Calls cint4c1e_cart or cint4c1e_sph depending on BT::DO_CART
  • Unpacks buffer with standard column-major indexing
  • Full permutational symmetry over all 4 indices

Future work: Known upstream libcint bug (cache buffer sizing)

libcint's own CMakeLists.txt contains the warning: "Note there are bugs in 4c1e integral functions".

The likely bug is a cache buffer sizing issue in src/cint4c1e.c and src/g4c1e.c:

  • CINTg4c1e_ovlp (in g4c1e.c) needs 3 * b_size doubles of scratch space from the cache pointer, where b_size = db * (MAX(nmax, mmax) + 1) and db = nmax + mmax + 1 (with nmax = li + lj, mmax = lk + ll).
  • The cache size calculations in CINT4c1e_drv (both the size-query path when out == NULL at ~line 270, and the self-allocation path when cache == NULL at ~line 278) do not include this 3 * b_size term.
  • CINT4c1e_loop_nopt (~line 108) has the same omission in its len calculation.
  • For low angular momentum shells, there's enough slack in the leng allocation that it works by accident. For higher angular momentum (roughly li+lj+lk+ll >= 6), bufx/bufy/bufz can write past the end of the cache — a heap buffer overflow.

Proposed fix (applied by Paul in his prototype, not yet validated with tests): add ovlp_scratch = 3 * db * (MAX(nmax, mmax) + 1) to the len/cache_size calculations in all 4 affected locations in cint4c1e.c. The values nmax, mmax are available from envs->li_ceil + envs->lj_ceil and envs->lk_ceil + envs->ll_ceil.

This fix has not been tested — libcint has a test (testsuite/test_cint4c1e.py) that validates cint4c1e_sph against a Laplacian-based identity using cint2e_ipip1_sph and cint2e_ipvip1_sph. That should be run to confirm correctness. We should also consider filing the fix upstream on sunqm/libcint.

References

  • libcint 4c1e source: src/cint4c1e.c, src/g4c1e.c in sunqm/libcint
  • libcint test: testsuite/test_cint4c1e.py
  • SlaterGPU ERI wrapper (pattern to follow): src/libcintw/cintwrapper.cpp, gen_eri function
  • libcint build flag: CMakeLists.txt option WITH_4C1E

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions