
fix the memory issue #322

Open
floatingCatty wants to merge 1 commit into deepmodeling:main from floatingCatty:fix-memory-issue-of-Pardiso-solver

Conversation

@floatingCatty
Member

@floatingCatty floatingCatty commented Mar 25, 2026

Summary by CodeRabbit

  • Refactor

    • Optimized memory usage during eigenvalue computations through improved resource reuse and cleanup efficiency.
  • Bug Fixes

    • Enhanced error handling and resource cleanup to prevent memory leaks during failed computations.

@coderabbitai
Contributor

coderabbitai bot commented Mar 25, 2026

📝 Walkthrough


The PardisoEig.solve method was refactored to reuse a single PARDISO solver instance across all k-points instead of creating one per k-point. Memory management was improved by adding explicit deallocation calls after each k-point and wrapping the computation in try/finally to ensure cleanup even on failure.
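The reuse-and-cleanup pattern described in the walkthrough can be sketched as follows. This is an illustrative mock, not the actual dptb/nn/energy.py code: `FakeSolver` stands in for pypardiso.PyPardisoSolver (whose real API does include a free_memory() method), and solve_all_kpoints is a hypothetical name.

```python
# Illustrative sketch of the reuse-and-cleanup pattern (not the actual dptb
# code). FakeSolver stands in for pypardiso.PyPardisoSolver; free_memory()
# mirrors the real pypardiso method name, everything else is a mock.

class FakeSolver:
    def __init__(self, mtype=13):
        self.mtype = mtype
        self.free_calls = 0

    def factorize(self, mat):
        self.factored = mat          # pretend to build a factorization

    def solve(self, mat, rhs):
        return rhs                   # placeholder result

    def free_memory(self, everything=False):
        self.free_calls += 1         # release PARDISO internal buffers


def solve_all_kpoints(kpoint_matrices):
    solver = FakeSolver(mtype=13)    # one solver instance for all k-points
    results = []
    try:
        for hk in kpoint_matrices:
            solver.factorize(hk)
            results.append(solver.solve(hk, hk))
            solver.free_memory()     # drop this k-point's factorization buffers
    finally:
        solver.free_memory(everything=True)  # cleanup runs even on failure
    return results, solver.free_calls
```

Freeing after each k-point keeps peak memory bounded by a single factorization instead of accumulating one per k-point, and the try/finally guarantees the final cleanup even if a solve fails mid-loop.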

Changes

Cohort / File(s) Summary
PARDISO Solver Memory Optimization
dptb/nn/energy.py
Refactored PardisoEig.solve to reuse a single PyPardisoSolver(mtype=13) instance across all k-points. Added try/finally for safe cleanup, explicit solver.free_memory() calls after each k-point's eigsh computation, and del Op to release the linear operator after factorization memory is freed.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Title check: ❓ Inconclusive
Explanation: The title 'fix the memory issue' is vague and generic, failing to specify which memory issue or which component is affected.
Resolution: Use a more specific title like 'Optimize PardisoEig.solve memory management by reusing solver instance and freeing buffers' to clearly describe the actual changes made.

✅ Passed checks (2 passed)

Description Check: ✅ Passed — Check skipped; CodeRabbit’s high-level summary is enabled.
Docstring Coverage: ✅ Passed — No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dptb/nn/energy.py`:
- Around line 197-205: The try/except should only catch ARPACK convergence
failures and must cap ncv to the matrix dimension N so we never call eigsh with
ncv >= N; update the import to include ArpackNoConvergence and change the except
to except ArpackNoConvergence: (so other validation errors propagate), and
before calling eigsh compute N = hk.shape[0] (or use M.shape[0]) and set ncv =
min(ncv, N-1) (and similarly for the retry ncv) so ncv satisfies k < ncv < N
when calling eigsh with hk, M, Op, self.neig, sigma=0.0, mode=self.mode,
which="LM".
- Around line 200-205: The eigsh calls in energy.py always allocate eigenvectors
because they unpack vals, vecs regardless of whether callers requested
eigenvectors; modify both eigsh invocations (the calls with A=hk, M=M,
k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=...) to pass
return_eigenvectors=return_eigenvectors, and change the unpacking so you only
unpack vecs when return_eigenvectors is True (e.g., if return_eigenvectors:
vals, vecs = eigsh(... ) else: vals = eigsh(... )); ensure this change is
applied to both the initial call and the retry call that increases ncv and keep
references to hk, M, Op, and self.neig intact.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 923e9ac3-4d71-44c4-b9f0-380b9bb2f5c8

📥 Commits

Reviewing files that changed from the base of the PR and between f00da65 and 7f12a1b.

📒 Files selected for processing (1)
  • dptb/nn/energy.py

Comment on lines +197 to +205
try:
    # Use larger NCV to help convergence, especially for clustered eigenvalues
    ncv = max(2*self.neig + 1, 20)
    vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
except Exception:
    # Retry with larger NCV if ARPACK fails (e.g. error 3: No shifts could be applied)
    # This often happens when eigenvalues are clustered near the shift
    ncv = max(5*self.neig, 50)
    vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cd dptb && find . -name "energy.py" -type f | head -5

Repository: deepmodeling/DeePTB

Length of output: 77


🏁 Script executed:

cat -n dptb/nn/energy.py | head -50

Repository: deepmodeling/DeePTB

Length of output: 2261


🏁 Script executed:

cat -n dptb/nn/energy.py | sed -n '150,250p'

Repository: deepmodeling/DeePTB

Length of output: 4994


🏁 Script executed:

rg "import.*eigsh|from.*eigsh" dptb/nn/energy.py -A 2 -B 2

Repository: deepmodeling/DeePTB

Length of output: 279


🏁 Script executed:

web_search
scipy.sparse.linalg.eigsh ncv parameter constraints requirements

Repository: deepmodeling/DeePTB

Length of output: 177


🌐 Web query:

scipy.sparse.linalg.eigsh ncv parameter documentation constraints

💡 Result:

The ncv parameter in scipy.sparse.linalg.eigsh specifies the number of Lanczos vectors generated. Its constraints are: ncv must be greater than k (number of desired eigenvalues) and smaller than n (matrix dimension). It is recommended that ncv > 2k. The default value is min(n, max(2k + 1, 20)).
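The constraint above can be encoded as a small helper. This is a sketch: `capped_ncv` is a hypothetical name that combines scipy's documented default heuristic with the required upper bound k < ncv < n.

```python
def capped_ncv(k, n):
    """Number of Lanczos vectors for eigsh: scipy's default heuristic
    max(2*k + 1, 20), clamped below the matrix dimension n so that
    k < ncv < n always holds."""
    return min(n - 1, max(2 * k + 1, 20))

print(capped_ncv(5, 100))   # 20  (heuristic floor dominates)
print(capped_ncv(30, 100))  # 61  (2*k + 1 dominates)
print(capped_ncv(5, 12))    # 11  (capped by n - 1 on a small system)
```

For small systems the n - 1 cap is what prevents scipy's validation error, which is exactly the failure mode the review comment below describes.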



🌐 Web query:

scipy.sparse.linalg ArpackNoConvergence exception

💡 Result:

The scipy.sparse.linalg.ArpackNoConvergence exception is raised by ARPACK-based functions like eigsh and eigs when the iterative Arnoldi/Lanczos process fails to converge to the requested eigenvalues within the specified maximum iterations (default maxiter=10*n) and tolerance (default tol=0, i.e. machine precision). Key details from the official SciPy documentation (v1.17.0):

  • Exception class: scipy.sparse.linalg.ArpackNoConvergence(msg, eigenvalues, eigenvectors)
  • Attributes: eigenvalues (ndarray of converged eigenvalues) and eigenvectors (ndarray of converged eigenvectors) — use these partial results if available.
  • Raised explicitly when convergence criteria are not met.

Common causes:

  • Targeting non-extremal eigenvalues (e.g. which='SM' for smallest magnitude) — ARPACK excels at extreme (large-magnitude) eigenvalues.
  • Insufficient maxiter, tight tol, or small ncv (number of Lanczos vectors; default min(n, max(2k+1, 20)); recommend ncv > 2k).
  • Clustered eigenvalues or ill-conditioned matrices.

Handling strategies (from the SciPy ARPACK tutorial and eigsh docs):

  1. Increase maxiter (e.g. maxiter=5000).
  2. Relax the tolerance (e.g. tol=1e-2), at a precision cost.
  3. Increase ncv (e.g. ncv = 2*k + extra).
  4. Use shift-invert mode: set sigma near the target eigenvalues (e.g. sigma=0 for near-zero) with which='LM', which transforms interior eigenvalues into extreme ones. Requires solving linear systems internally.
  5. Provide a good initial vector v0.
  6. Catch the exception and use the partial eigenvalues/eigenvectors:

         try:
             vals, vecs = eigsh(A, k=6)
         except scipy.sparse.linalg.ArpackNoConvergence as err:
             print("Converged:", len(err.eigenvalues), "of 6")
             vals, vecs = err.eigenvalues, err.eigenvectors

  7. Alternatives: lobpcg for symmetric problems, or dense solvers for small matrices.

Example from the tutorial (fails with which='SM', succeeds with shift-invert):

    from scipy.sparse.linalg import eigsh
    evals_small, evecs_small = eigsh(X, 3, which='SM')           # may raise ArpackNoConvergence
    evals_small, evecs_small = eigsh(X, 3, sigma=0, which='LM')  # succeeds faster

Distinguish from ArpackError (other ARPACK issues like invalid parameters).



Cap ncv by N and only retry ARPACK non-convergence.

Lines 199 and 204 can set ncv above N on small systems. According to scipy.sparse.linalg.eigsh documentation, ncv must satisfy k < ncv < N (where N is the matrix dimension). Because line 201 catches all Exception types, scipy's validation error (when ncv > N) gets masked and retried with an even larger ncv instead of surfacing immediately. Only ArpackNoConvergence (the actual convergence failure) should be caught for retry.

🐛 Proposed fix
+                if not 0 < self.neig < N:
+                    raise ValueError(f"`neig` must satisfy 0 < neig < N, got neig={self.neig}, N={N}")
+
                 try:
                     # Use larger NCV to help convergence, especially for clustered eigenvalues
-                    ncv =  max(2*self.neig + 1, 20)
+                    ncv = min(N, max(2 * self.neig + 1, 20))
                     vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
-                except Exception:
+                except ArpackNoConvergence:
                     # Retry with larger NCV if ARPACK fails (e.g. error 3: No shifts could be applied)
                     # This often happens when eigenvalues are clustered near the shift
-                    ncv =  max(5*self.neig, 50)
+                    ncv = min(N, max(5 * self.neig, 50))
                     vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)

Update the import on line 17 to include ArpackNoConvergence:

from scipy.sparse.linalg import eigsh, LinearOperator, ArpackNoConvergence
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 201-201: Do not catch blind exception: Exception

(BLE001)


Comment on lines +200 to +205
    vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
except Exception:
    # Retry with larger NCV if ARPACK fails (e.g. error 3: No shifts could be applied)
    # This often happens when eigenvalues are clustered near the shift
    ncv = max(5*self.neig, 50)
    vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "energy.py" -path "*/nn/*" | head -5

Repository: deepmodeling/DeePTB

Length of output: 82


🏁 Script executed:

cat -n ./dptb/nn/energy.py | sed -n '190,225p'

Repository: deepmodeling/DeePTB

Length of output: 1994


🏁 Script executed:

cat -n ./dptb/nn/energy.py | sed -n '150,230p'

Repository: deepmodeling/DeePTB

Length of output: 4139


🏁 Script executed:

cat -n ./dptb/nn/energy.py | sed -n '1,50p'

Repository: deepmodeling/DeePTB

Length of output: 2261


🏁 Script executed:

python3 << 'EOF'
from scipy.sparse.linalg import eigsh
import inspect

# Get the signature of eigsh
sig = inspect.signature(eigsh)
print("eigsh signature:")
print(sig)
print("\nParameters:")
for param_name, param in sig.parameters.items():
    print(f"  {param_name}: {param.default if param.default != inspect.Parameter.empty else 'required'}")
EOF

Repository: deepmodeling/DeePTB

Length of output: 180


🌐 Web query:

scipy sparse linalg eigsh return_eigenvectors parameter

💡 Result:

The return_eigenvectors parameter in scipy.sparse.linalg.eigsh is a boolean option that determines whether to return eigenvectors in addition to eigenvalues. It defaults to True. When True (default), the function returns a tuple (w, v) where w is an array of k eigenvalues and v is an array of k corresponding eigenvectors (v[:, i] is the eigenvector for eigenvalue w[i]). When False, it returns only the array of k eigenvalues w. This parameter also affects eigenvalue sorting: for which='LM' or 'SA', eigenvalues are sorted by algebraic value if return_eigenvectors=True, or by absolute value if False. For other which options, sorting is by algebraic value regardless.
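A quick demonstration of that behavior, assuming scipy and numpy are installed; the 10×10 diagonal matrix is just a toy example, unrelated to the PR's Hamiltonians:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

A = diags(np.arange(1.0, 11.0))  # 10x10 symmetric (diagonal) sparse matrix

# Default (return_eigenvectors=True): returns a (values, vectors) tuple
vals, vecs = eigsh(A, k=3)

# return_eigenvectors=False: only the eigenvalue array comes back,
# so the N x k eigenvector matrix is never materialized for the caller
vals_only = eigsh(A, k=3, return_eigenvectors=False)

print(vals.shape)       # (3,)
print(vecs.shape)       # (10, 3)
print(vals_only.shape)  # (3,)
```

This is the allocation the review comment below wants to skip when callers of PardisoEig.solve only ask for eigenvalues.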



🏁 Script executed:

cat -n ./dptb/nn/energy.py | sed -n '120,165p'

Repository: deepmodeling/DeePTB

Length of output: 2031


Pass return_eigenvectors parameter to both eigsh calls to avoid unnecessary eigenvector allocation.

Lines 200 and 205 always retrieve and unpack both eigenvalues and eigenvectors from eigsh, even when callers only request eigenvalues. Since scipy.sparse.linalg.eigsh defaults to return_eigenvectors=True, the N × neig eigenvector matrix is always allocated. This allocation is only conditionally used at line 214, defeating the purpose of the return_eigenvectors parameter.

Pass return_eigenvectors=return_eigenvectors to both eigsh calls and conditionally unpack the result to avoid the unnecessary allocation when eigenvectors aren't needed.

Proposed fix
-                    vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
+                    result = eigsh(
+                        A=hk,
+                        M=M,
+                        k=self.neig,
+                        sigma=0.0,
+                        OPinv=Op,
+                        mode=self.mode,
+                        which="LM",
+                        ncv=ncv,
+                        return_eigenvectors=return_eigenvectors,
+                    )

Apply the same change to line 205.

Then update the unpacking logic:

-                eigvals_list.append(vals + self.sigma)
-                if return_eigenvectors:
-                    eigvecs_list.append(vecs)
+                if return_eigenvectors:
+                    vals, vecs = result
+                    eigvecs_list.append(vecs)
+                else:
+                    vals = result
+                eigvals_list.append(vals + self.sigma)
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 201-201: Do not catch blind exception: Exception

(BLE001)

