Disorder builder #1410

Draft
ColinBundschu wants to merge 6 commits into materialsproject:new-builders from ColinBundschu:new-builders

Conversation

@ColinBundschu
Contributor

This is for the disordered materials builder

@ColinBundschu changed the title from "WIP disorder builder first pass" to "Disorder builder" on Apr 3, 2026
Comment on lines +73 to +76
n: int = Field(..., description="Number of structures in this group.")
mae_per_site: float = Field(..., description="Mean absolute error per site.")
rmse_per_site: float = Field(..., description="Root-mean-square error per site.")
max_abs_per_site: float = Field(..., description="Maximum absolute error per site.")
Collaborator

More of a nitpick, but you can skip the ellipses (the ... passed as the first argument to Field) here and below.

Comment on lines +82 to +83
in_sample: CEFitMetrics = Field(...)
five_fold_cv: CEFitMetrics = Field(...)
Collaborator

Same here. For readability, this is identical:

in_sample: CEFitMetrics
five_fold_cv: CEFitMetrics

Comment on lines +110 to +112
standardization: Literal["none", "column_zscore"] = Field(
    ..., description="Column standardization mode applied before SVD."
)
Collaborator

Are there more column standardization methods you plan to apply, or does it make sense to change this to something like:

standardization: bool = Field(description="True if column zscore standardization was applied before SVD.")

def build_disorder_doc(
    disordered_documents: list[DisorderedTaskDoc],
    ordered_task_doc: CoreTaskDoc,
    *,
Collaborator

@esoteric-ephemera, Apr 3, 2026

Let's remove the kwarg delimiters (*)

Comment on lines +93 to +103
for doc in disordered_documents[1:]:
    if doc.ordered_task_id != first.ordered_task_id:
        raise ValueError("Ordered task IDs do not match across documents.")
    if doc.supercell_diag != first.supercell_diag:
        raise ValueError("Supercell diagonals do not match across documents.")
    if doc.prototype != first.prototype:
        raise ValueError("Prototypes do not match across documents.")
    if doc.prototype_params != first.prototype_params:
        raise ValueError("Prototype parameters do not match across documents.")
    if doc.versions != first.versions:
        raise ValueError("Versions do not match across documents.")
Collaborator

Not sure how many disordered_documents go into building the final doc, but you may just want to replace these with list comprehensions to get a speedup from C

for attr, exc in {
    "ordered_task_id": "Ordered task IDs do not match across documents.",
    "supercell_diag": "Supercell diagonals do not match across documents.",
    ... # add the rest
}.items():
    if any(getattr(doc, attr) != getattr(first, attr) for doc in disordered_documents[1:]):
        raise ValueError(exc)
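A fuller, runnable sketch of the table-driven check suggested above. The attribute names and messages are taken from the diff under review; `FakeDisorderedDoc` and `check_consistency` are hypothetical stand-ins introduced only for illustration, not names from the PR.

```python
from dataclasses import dataclass


# Hypothetical stand-in for DisorderedTaskDoc: only the fields compared
# in the diff under review are modeled here.
@dataclass
class FakeDisorderedDoc:
    ordered_task_id: str
    supercell_diag: tuple[int, int, int]
    prototype: str
    prototype_params: dict
    versions: dict


# Attribute name -> error message, copied from the checks in the diff.
CONSISTENCY_CHECKS = {
    "ordered_task_id": "Ordered task IDs do not match across documents.",
    "supercell_diag": "Supercell diagonals do not match across documents.",
    "prototype": "Prototypes do not match across documents.",
    "prototype_params": "Prototype parameters do not match across documents.",
    "versions": "Versions do not match across documents.",
}


def check_consistency(docs):
    """Raise ValueError if any checked attribute differs from the first doc."""
    if not docs:
        return
    first = docs[0]
    for attr, message in CONSISTENCY_CHECKS.items():
        expected = getattr(first, attr)
        # any() stops at the first mismatching document.
        if any(getattr(doc, attr) != expected for doc in docs[1:]):
            raise ValueError(message)
```

One behavioral difference from the original loop: this version iterates attribute by attribute rather than document by document, so when several attributes disagree, the error raised is the first one in the table rather than the first one found in the earliest mismatching document.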

)

num_bins = len(wl_block["state"].bin_indices)
while num_bins < min_bins or num_bins > max_bins:
Collaborator

Maybe set a maximum number of refinements here, unless there's a guarantee the while won't hang indefinitely?
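A minimal sketch of one way to bound such a loop. Everything here is an assumption made for illustration: `refine_until_in_range`, `refine_step`, and the default `max_refinements=20` are hypothetical and do not come from the PR, which refines a Wang-Landau block rather than a bare integer.

```python
def refine_until_in_range(num_bins, refine_step, min_bins, max_bins, max_refinements=20):
    """Apply refine_step until min_bins <= num_bins <= max_bins.

    Raises RuntimeError instead of looping forever if the target range
    is not reached within max_refinements refinement steps.
    """
    for _ in range(max_refinements):
        if min_bins <= num_bins <= max_bins:
            return num_bins
        num_bins = refine_step(num_bins)
    # The last refinement may have landed in range; check once more.
    if min_bins <= num_bins <= max_bins:
        return num_bins
    raise RuntimeError(
        f"Bin count {num_bins} still outside [{min_bins}, {max_bins}] "
        f"after {max_refinements} refinements."
    )
```

The same pattern (a `for` over `range(max_iterations)` with an explicit failure at the end) would apply to the convergence loop on `mod_factor` as well.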

num_bins = len(wl_block["state"].bin_indices)

# --- WL convergence loop ---
while wl_block["state"].mod_factor > wl_convergence_threshold:
Collaborator

Same as above, let's impose a maximum number of iterations for the while

)
new_entropy = float(self._entropy_d.get(int(new_bin_id), 0.0))

assert self.mcusher is not None, "MCUsher is not initialized"
Collaborator

@esoteric-ephemera, Apr 3, 2026

Linting will eventually complain about this assert; let's change it to raise ValueError or a more specific DisorderedBuilderError
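A sketch of what that replacement could look like. `DisorderedBuilderError` is the name proposed in the comment; its base class, and the `Sampler` class wrapped around it, are assumptions made only so the example runs. Beyond linting, `assert` statements are skipped entirely when Python runs with `-O`, so an explicit raise also keeps the check active in optimized mode.

```python
class DisorderedBuilderError(RuntimeError):
    """Builder-specific error (name from the review; base class is an assumption)."""


class Sampler:
    """Hypothetical stand-in for the class that owns self.mcusher."""

    def __init__(self, mcusher=None):
        self.mcusher = mcusher

    def propose(self):
        # Explicit check instead of `assert self.mcusher is not None`.
        if self.mcusher is None:
            raise DisorderedBuilderError("MCUsher is not initialized")
        return self.mcusher()
```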

from ase.spacegroup import crystal


class PrototypeStructure(str, Enum):
Collaborator

Just use StrEnum? Our base is py3.11

Comment on lines +33 to +35
"smol",
"ase",
"scikit-learn",
Collaborator

Can you move these to a separate dependency group like disorder?

@esoteric-ephemera
Collaborator

Thanks! Skimmed through mostly looking for structural stuff, will take a deeper look later

Maybe a major question for you: Do you see a benefit to moving some of the Wang-Landau code to pymatgen / ase for others to use?
