
fix the memory issue #322

Open
floatingCatty wants to merge 1 commit into deepmodeling:main from floatingCatty:fix-memory-issue-of-Pardiso-solver

Conversation

@floatingCatty
Member

@floatingCatty floatingCatty commented Mar 25, 2026

Summary by CodeRabbit

  • Refactor

    • Optimized memory usage during eigenvalue computations through improved resource reuse and cleanup efficiency.
  • Bug Fixes

    • Enhanced error handling and resource cleanup to prevent memory leaks during failed computations.

@coderabbitai
Contributor

coderabbitai bot commented Mar 25, 2026

📝 Walkthrough


The PardisoEig.solve method was refactored to reuse a single PARDISO solver instance across all k-points instead of creating one per k-point. Memory management was improved by adding explicit deallocation calls after each k-point and wrapping the computation in try/finally to ensure cleanup even on failure.
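The reuse-and-cleanup pattern described in the walkthrough can be sketched as follows. This is an illustrative mock, not the actual dptb/nn/energy.py code: `FakeSolver` stands in for pypardiso.PyPardisoSolver (whose real API does include a free_memory() method), and solve_all_kpoints is a hypothetical name.

```python
# Illustrative sketch of the reuse-and-cleanup pattern (not the actual dptb
# code). FakeSolver stands in for pypardiso.PyPardisoSolver; free_memory()
# mirrors the real pypardiso method name, everything else is a mock.

class FakeSolver:
    def __init__(self, mtype=13):
        self.mtype = mtype
        self.free_calls = 0

    def factorize(self, mat):
        self.factored = mat          # pretend to build a factorization

    def solve(self, mat, rhs):
        return rhs                   # placeholder result

    def free_memory(self, everything=False):
        self.free_calls += 1         # release PARDISO internal buffers


def solve_all_kpoints(kpoint_matrices):
    solver = FakeSolver(mtype=13)    # one solver instance for all k-points
    results = []
    try:
        for hk in kpoint_matrices:
            solver.factorize(hk)
            results.append(solver.solve(hk, hk))
            solver.free_memory()     # drop this k-point's factorization buffers
    finally:
        solver.free_memory(everything=True)  # cleanup runs even on failure
    return results, solver.free_calls
```

Freeing after each k-point keeps peak memory bounded by a single factorization instead of accumulating one per k-point, and the try/finally guarantees the final cleanup even if a solve fails mid-loop.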

Changes

Cohort / File(s) Summary
PARDISO Solver Memory Optimization
dptb/nn/energy.py
Refactored PardisoEig.solve to reuse a single PyPardisoSolver(mtype=13) instance across all k-points. Added try/finally for safe cleanup, explicit solver.free_memory() calls after each k-point's eigsh computation, and del Op to release the linear operator after factorization memory is freed.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Title check: ❓ Inconclusive
Explanation: The title 'fix the memory issue' is vague and generic, failing to specify which memory issue or which component is affected.
Resolution: Use a more specific title like 'Optimize PardisoEig.solve memory management by reusing solver instance and freeing buffers' to clearly describe the actual changes made.

✅ Passed checks (2 passed)

Description Check: ✅ Passed — Check skipped; CodeRabbit’s high-level summary is enabled.
Docstring Coverage: ✅ Passed — No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dptb/nn/energy.py`:
- Around line 197-205: The try/except should only catch ARPACK convergence
failures and must cap ncv to the matrix dimension N so we never call eigsh with
ncv >= N; update the import to include ArpackNoConvergence and change the except
to except ArpackNoConvergence: (so other validation errors propagate), and
before calling eigsh compute N = hk.shape[0] (or use M.shape[0]) and set ncv =
min(ncv, N-1) (and similarly for the retry ncv) so ncv satisfies k < ncv < N
when calling eigsh with hk, M, Op, self.neig, sigma=0.0, mode=self.mode,
which="LM".
- Around line 200-205: The eigsh calls in energy.py always allocate eigenvectors
because they unpack vals, vecs regardless of whether callers requested
eigenvectors; modify both eigsh invocations (the calls with A=hk, M=M,
k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=...) to pass
return_eigenvectors=return_eigenvectors, and change the unpacking so you only
unpack vecs when return_eigenvectors is True (e.g., if return_eigenvectors:
vals, vecs = eigsh(... ) else: vals = eigsh(... )); ensure this change is
applied to both the initial call and the retry call that increases ncv and keep
references to hk, M, Op, and self.neig intact.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 923e9ac3-4d71-44c4-b9f0-380b9bb2f5c8

📥 Commits

Reviewing files that changed from the base of the PR and between f00da65 and 7f12a1b.

📒 Files selected for processing (1)
  • dptb/nn/energy.py

Comment on lines +197 to +205
try:
    # Use larger NCV to help convergence, especially for clustered eigenvalues
    ncv = max(2*self.neig + 1, 20)
    vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
except Exception:
    # Retry with larger NCV if ARPACK fails (e.g. error 3: No shifts could be applied)
    # This often happens when eigenvalues are clustered near the shift
    ncv = max(5*self.neig, 50)
    vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cd dptb && find . -name "energy.py" -type f | head -5

Repository: deepmodeling/DeePTB

Length of output: 77


🏁 Script executed:

cat -n dptb/nn/energy.py | head -50

Repository: deepmodeling/DeePTB

Length of output: 2261


🏁 Script executed:

cat -n dptb/nn/energy.py | sed -n '150,250p'

Repository: deepmodeling/DeePTB

Length of output: 4994


🏁 Script executed:

rg "import.*eigsh|from.*eigsh" dptb/nn/energy.py -A 2 -B 2

Repository: deepmodeling/DeePTB

Length of output: 279


🏁 Script executed:

web_search
scipy.sparse.linalg.eigsh ncv parameter constraints requirements

Repository: deepmodeling/DeePTB

Length of output: 177


🌐 Web query:

scipy.sparse.linalg.eigsh ncv parameter documentation constraints

💡 Result:

The ncv parameter in scipy.sparse.linalg.eigsh specifies the number of Lanczos vectors generated. Its constraints are: ncv must be greater than k (number of desired eigenvalues) and smaller than n (matrix dimension). It is recommended that ncv > 2k. The default value is min(n, max(2k + 1, 20)).
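The constraint above can be encoded as a small helper. This is a sketch: `capped_ncv` is a hypothetical name that combines scipy's documented default heuristic with the required upper bound k < ncv < n.

```python
def capped_ncv(k, n):
    """Number of Lanczos vectors for eigsh: scipy's default heuristic
    max(2*k + 1, 20), clamped below the matrix dimension n so that
    k < ncv < n always holds."""
    return min(n - 1, max(2 * k + 1, 20))

print(capped_ncv(5, 100))   # 20  (heuristic floor dominates)
print(capped_ncv(30, 100))  # 61  (2*k + 1 dominates)
print(capped_ncv(5, 12))    # 11  (capped by n - 1 on a small system)
```

For small systems the n - 1 cap is what prevents scipy's validation error, which is exactly the failure mode the review comment below describes.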



🌐 Web query:

scipy.sparse.linalg ArpackNoConvergence exception

💡 Result:

The scipy.sparse.linalg.ArpackNoConvergence exception is raised by ARPACK-based functions like eigsh and eigs when the iterative Arnoldi/Lanczos process fails to converge to the requested eigenvalues within the specified maximum iterations (default maxiter=10*n) and tolerance (default tol=0, i.e. machine precision). Key details from the official SciPy documentation (v1.17.0):

  • Exception class: scipy.sparse.linalg.ArpackNoConvergence(msg, eigenvalues, eigenvectors)
  • Attributes: eigenvalues (ndarray of converged eigenvalues) and eigenvectors (ndarray of converged eigenvectors) — use these partial results if available.
  • Raised explicitly when convergence criteria are not met.

Common causes:

  • Targeting non-extremal eigenvalues (e.g. which='SM' for smallest magnitude) — ARPACK excels at extreme (large-magnitude) eigenvalues.
  • Insufficient maxiter, tight tol, or small ncv (number of Lanczos vectors; default min(n, max(2k+1, 20)); recommend ncv > 2k).
  • Clustered eigenvalues or ill-conditioned matrices.

Handling strategies (from the SciPy ARPACK tutorial and eigsh docs):

  1. Increase maxiter (e.g. maxiter=5000).
  2. Relax the tolerance (e.g. tol=1e-2), at a precision cost.
  3. Increase ncv (e.g. ncv = 2*k + extra).
  4. Use shift-invert mode: set sigma near the target eigenvalues (e.g. sigma=0 for near-zero) with which='LM', which transforms interior eigenvalues into extreme ones. Requires solving linear systems internally.
  5. Provide a good initial vector v0.
  6. Catch the exception and use the partial eigenvalues/eigenvectors:

         try:
             vals, vecs = eigsh(A, k=6)
         except scipy.sparse.linalg.ArpackNoConvergence as err:
             print("Converged:", len(err.eigenvalues), "of 6")
             vals, vecs = err.eigenvalues, err.eigenvectors

  7. Alternatives: lobpcg for symmetric problems, or dense solvers for small matrices.

Example from the tutorial (fails with which='SM', succeeds with shift-invert):

    from scipy.sparse.linalg import eigsh
    evals_small, evecs_small = eigsh(X, 3, which='SM')           # may raise ArpackNoConvergence
    evals_small, evecs_small = eigsh(X, 3, sigma=0, which='LM')  # succeeds faster

Distinguish from ArpackError (other ARPACK issues like invalid parameters).



Cap ncv by N and only retry ARPACK non-convergence.

Lines 199 and 204 can set ncv above N on small systems. According to scipy.sparse.linalg.eigsh documentation, ncv must satisfy k < ncv < N (where N is the matrix dimension). Because line 201 catches all Exception types, scipy's validation error (when ncv > N) gets masked and retried with an even larger ncv instead of surfacing immediately. Only ArpackNoConvergence (the actual convergence failure) should be caught for retry.

🐛 Proposed fix
+                if not 0 < self.neig < N:
+                    raise ValueError(f"`neig` must satisfy 0 < neig < N, got neig={self.neig}, N={N}")
+
                 try:
                     # Use larger NCV to help convergence, especially for clustered eigenvalues
-                    ncv =  max(2*self.neig + 1, 20)
+                    ncv = min(N, max(2 * self.neig + 1, 20))
                     vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
-                except Exception:
+                except ArpackNoConvergence:
                     # Retry with larger NCV if ARPACK fails (e.g. error 3: No shifts could be applied)
                     # This often happens when eigenvalues are clustered near the shift
-                    ncv =  max(5*self.neig, 50)
+                    ncv = min(N, max(5 * self.neig, 50))
                     vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)

Update the import on line 17 to include ArpackNoConvergence:

from scipy.sparse.linalg import eigsh, LinearOperator, ArpackNoConvergence
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 201-201: Do not catch blind exception: Exception

(BLE001)


Comment on lines +200 to +205
    vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
except Exception:
    # Retry with larger NCV if ARPACK fails (e.g. error 3: No shifts could be applied)
    # This often happens when eigenvalues are clustered near the shift
    ncv = max(5*self.neig, 50)
    vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "energy.py" -path "*/nn/*" | head -5

Repository: deepmodeling/DeePTB

Length of output: 82


🏁 Script executed:

cat -n ./dptb/nn/energy.py | sed -n '190,225p'

Repository: deepmodeling/DeePTB

Length of output: 1994


🏁 Script executed:

cat -n ./dptb/nn/energy.py | sed -n '150,230p'

Repository: deepmodeling/DeePTB

Length of output: 4139


🏁 Script executed:

cat -n ./dptb/nn/energy.py | sed -n '1,50p'

Repository: deepmodeling/DeePTB

Length of output: 2261


🏁 Script executed:

python3 << 'EOF'
from scipy.sparse.linalg import eigsh
import inspect

# Get the signature of eigsh
sig = inspect.signature(eigsh)
print("eigsh signature:")
print(sig)
print("\nParameters:")
for param_name, param in sig.parameters.items():
    print(f"  {param_name}: {param.default if param.default != inspect.Parameter.empty else 'required'}")
EOF

Repository: deepmodeling/DeePTB

Length of output: 180


🌐 Web query:

scipy sparse linalg eigsh return_eigenvectors parameter

💡 Result:

The return_eigenvectors parameter in scipy.sparse.linalg.eigsh is a boolean option that determines whether to return eigenvectors in addition to eigenvalues. It defaults to True. When True (default), the function returns a tuple (w, v) where w is an array of k eigenvalues and v is an array of k corresponding eigenvectors (v[:, i] is the eigenvector for eigenvalue w[i]). When False, it returns only the array of k eigenvalues w. This parameter also affects eigenvalue sorting: for which='LM' or 'SA', eigenvalues are sorted by algebraic value if return_eigenvectors=True, or by absolute value if False. For other which options, sorting is by algebraic value regardless.
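A quick demonstration of that behavior, assuming scipy and numpy are installed; the 10×10 diagonal matrix is just a toy example, unrelated to the PR's Hamiltonians:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

A = diags(np.arange(1.0, 11.0))  # 10x10 symmetric (diagonal) sparse matrix

# Default (return_eigenvectors=True): returns a (values, vectors) tuple
vals, vecs = eigsh(A, k=3)

# return_eigenvectors=False: only the eigenvalue array comes back,
# so the N x k eigenvector matrix is never materialized for the caller
vals_only = eigsh(A, k=3, return_eigenvectors=False)

print(vals.shape)       # (3,)
print(vecs.shape)       # (10, 3)
print(vals_only.shape)  # (3,)
```

This is the allocation the review comment below wants to skip when callers of PardisoEig.solve only ask for eigenvalues.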



🏁 Script executed:

cat -n ./dptb/nn/energy.py | sed -n '120,165p'

Repository: deepmodeling/DeePTB

Length of output: 2031


Pass return_eigenvectors parameter to both eigsh calls to avoid unnecessary eigenvector allocation.

Lines 200 and 205 always retrieve and unpack both eigenvalues and eigenvectors from eigsh, even when callers only request eigenvalues. Since scipy.sparse.linalg.eigsh defaults to return_eigenvectors=True, the N × neig eigenvector matrix is always allocated. This allocation is only conditionally used at line 214, defeating the purpose of the return_eigenvectors parameter.

Pass return_eigenvectors=return_eigenvectors to both eigsh calls and conditionally unpack the result to avoid the unnecessary allocation when eigenvectors aren't needed.

Proposed fix
-                    vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
+                    result = eigsh(
+                        A=hk,
+                        M=M,
+                        k=self.neig,
+                        sigma=0.0,
+                        OPinv=Op,
+                        mode=self.mode,
+                        which="LM",
+                        ncv=ncv,
+                        return_eigenvectors=return_eigenvectors,
+                    )

Apply the same change to line 205.

Then update the unpacking logic:

-                eigvals_list.append(vals + self.sigma)
-                if return_eigenvectors:
-                    eigvecs_list.append(vecs)
+                if return_eigenvectors:
+                    vals, vecs = result
+                    eigvecs_list.append(vecs)
+                else:
+                    vals = result
+                eigvals_list.append(vals + self.sigma)
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 201-201: Do not catch blind exception: Exception

(BLE001)

