ConvergenceWarning problem in findtailthreshold.py #187
Description
When fitting the model, I encountered the following warning:
~/.conda/envs/taoenv-glmsingle/lib/python3.10/site-packages/sklearn/mixture/_base.py:275: ConvergenceWarning: Best performing initialization did not converge. Try different init parameters, or increase max_iter, tol, or check for degenerate data.
Upon investigation, I found that the issue originates in the file findtailthreshold.py, specifically at Line 55:
# fit mixture of two gaussians
gmfit = gmdist(n_components=2, tol=1e-10, reg_covar=0, n_init=numreps).fit(v2.reshape(-1, 1))
After reviewing the documentation for sklearn.mixture.GaussianMixture, I noted that the defaults are tol = 1e-3, reg_covar = 1e-6, and max_iter = 100. The call above tightens tol to 1e-10 but leaves max_iter at its default of 100. The warning indicates that the best-performing initialization did not reach the tol threshold within max_iter iterations, which suggests the chosen parameter values might need adjustment.
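For reference, here is a minimal sketch of how the convergence status can be checked directly instead of relying on the warning text. The synthetic data, the seed, and n_init=3 (standing in for numreps) are my assumptions, not values from findtailthreshold.py; converged_ and n_iter_ are the fitted attributes of GaussianMixture.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import GaussianMixture

# Synthetic 1-D data (an assumption, not real data): a main lobe plus a wider tail.
rng = np.random.default_rng(0)
v2 = np.concatenate([rng.normal(0.0, 1.0, 1500), rng.normal(4.0, 2.0, 500)])

# Same settings as the call in findtailthreshold.py; n_init=3 is a placeholder for numreps.
gm = GaussianMixture(n_components=2, tol=1e-10, reg_covar=0, n_init=3, random_state=0)

# Record warnings instead of printing them, so the result can be inspected in code.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    gm.fit(v2.reshape(-1, 1))

hit_warning = any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarning raised:", hit_warning)
print("converged_:", gm.converged_)
print("n_iter_ (best init):", gm.n_iter_, "of max_iter =", gm.max_iter)
```

Whether this particular run converges depends on the data and the sklearn version, so I am not quoting an expected output here.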
i). I assume tol was set to such a small value (1e-10) to make the fit more precise, but it also means EM may need many more iterations before the stopping criterion is met.
ii). By default, reg_covar is set to 1e-6, which applies a small amount of regularization to the covariance matrix to ensure stability and prevent issues such as singular covariance matrices. Setting reg_covar = 0 disables this regularization, which could lead to convergence issues, particularly if the data is degenerate or lacks variability. Can you clarify why reg_covar was set to 0 in this case?
iii). Given that the model is not converging, I considered increasing max_iter to 200 to give the algorithm more iterations to converge, especially if the initial parameters are far from the optimal solution. I tried restoring reg_covar to its default (1e-6) and setting max_iter to 200, and the warning went away. However, I'm worried that this change might lead to overfitting.
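As far as I understand, max_iter only caps the number of EM iterations and does not add model parameters, so one sanity check is whether the fitted means, standard deviations, and weights actually change between the two settings. Below is a rough sketch of that comparison on synthetic data. The data, the seed, n_init=3, and the helper name fit_summary are all assumptions for illustration, not part of GLMsingle.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import GaussianMixture

# Synthetic 1-D data (an assumption): a main lobe plus a wider tail.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 1500), rng.normal(4.0, 2.0, 500)]).reshape(-1, 1)


def fit_summary(**kwargs):
    """Fit a 2-component mixture and return its parameters, sorted by component mean."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ConvergenceWarning)
        gm = GaussianMixture(n_components=2, n_init=3, random_state=0, **kwargs).fit(x)
    order = np.argsort(gm.means_.ravel())
    return {
        "converged": gm.converged_,
        "n_iter": gm.n_iter_,
        "means": gm.means_.ravel()[order],
        "sds": np.sqrt(gm.covariances_.ravel()[order]),
        "weights": gm.weights_[order],
        "avg_loglik": gm.score(x),  # mean per-sample log-likelihood
    }


results = {
    # settings currently used in findtailthreshold.py (max_iter left at its default of 100)
    "original": fit_summary(tol=1e-10, reg_covar=0),
    # settings I tried: default reg_covar (1e-6) and max_iter=200
    "modified": fit_summary(tol=1e-10, max_iter=200),
}

for name, res in results.items():
    print(name)
    for key, value in res.items():
        print(f"  {key}: {value}")
```

If the two summaries agree closely, the change mainly affects when EM stops rather than what it fits. This does not rule out problems on real data, where degenerate inputs could behave differently.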
I’d appreciate your feedback or any recommendations for improving the parameter settings to ensure better convergence!