Disorder builder#1410
Conversation
| n: int = Field(..., description="Number of structures in this group.") | ||
| mae_per_site: float = Field(..., description="Mean absolute error per site.") | ||
| rmse_per_site: float = Field(..., description="Root-mean-square error per site.") | ||
| max_abs_per_site: float = Field(..., description="Maximum absolute error per site.") |
There was a problem hiding this comment.
More nitpicky but you can skip the ellipses here and below
| in_sample: CEFitMetrics = Field(...) | ||
| five_fold_cv: CEFitMetrics = Field(...) |
There was a problem hiding this comment.
Same as here, for readability this is identical:
in_sample: CEFitMetrics
five_fold_cv: CEFitMetrics| standardization: Literal["none", "column_zscore"] = Field( | ||
| ..., description="Column standardization mode applied before SVD." | ||
| ) |
There was a problem hiding this comment.
Are there more column standardization methods you plan to apply, or does it make sense to change this to something like:
standardization: bool = Field(description = "True if column zscore standardization was applied before SVD.")| def build_disorder_doc( | ||
| disordered_documents: list[DisorderedTaskDoc], | ||
| ordered_task_doc: CoreTaskDoc, | ||
| *, |
There was a problem hiding this comment.
Let's remove the kwarg delimiters (*)
| for doc in disordered_documents[1:]: | ||
| if doc.ordered_task_id != first.ordered_task_id: | ||
| raise ValueError("Ordered task IDs do not match across documents.") | ||
| if doc.supercell_diag != first.supercell_diag: | ||
| raise ValueError("Supercell diagonals do not match across documents.") | ||
| if doc.prototype != first.prototype: | ||
| raise ValueError("Prototypes do not match across documents.") | ||
| if doc.prototype_params != first.prototype_params: | ||
| raise ValueError("Prototype parameters do not match across documents.") | ||
| if doc.versions != first.versions: | ||
| raise ValueError("Versions do not match across documents.") |
There was a problem hiding this comment.
Not sure how many disordered_documents go into building the final doc, but you may just want to replace these with list comprehensions to get a speedup from C
for attr, exc in {
"ordered_task_id": "Ordered task IDs do not match across documents.",
"supercell_diag": "Supercell diagonals do not match across documents.",
... # add the rest
}.items():
if any(getattr(doc,attr) != getattr(first,attr) for doc in disordered_documents[1:]):
raise ValueError(exc)| ) | ||
|
|
||
| num_bins = len(wl_block["state"].bin_indices) | ||
| while num_bins < min_bins or num_bins > max_bins: |
There was a problem hiding this comment.
Maybe set a maximum number of refinements here, unless there's a guarantee the while won't hang indefinitely?
| num_bins = len(wl_block["state"].bin_indices) | ||
|
|
||
| # --- WL convergence loop --- | ||
| while wl_block["state"].mod_factor > wl_convergence_threshold: |
There was a problem hiding this comment.
Same as above, let's impose a maximum number of recursions for the while
| ) | ||
| new_entropy = float(self._entropy_d.get(int(new_bin_id), 0.0)) | ||
|
|
||
| assert self.mcusher is not None, "MCUsher is not initialized" |
There was a problem hiding this comment.
Linting will eventually complain about this assert, let's change to raise ValueError or a more specific DisorderedBuilderError
| from ase.spacegroup import crystal | ||
|
|
||
|
|
||
| class PrototypeStructure(str, Enum): |
There was a problem hiding this comment.
Just use StrEnum? Our base is py3.11
| "smol", | ||
| "ase", | ||
| "scikit-learn", |
There was a problem hiding this comment.
Can you move these to a separate dependency group like disorder?
|
Thanks! Skimmed through mostly looking for structural stuff, will take a deeper look later Maybe a major question for you: Do you see a benefit to moving some of the Wang-Landau code to |
This is for the disordered materials builder