Context
We've been using CrystalFormer for generative design of layered mineral membranes (Fe-Ni-S sulfides for proton transport). Over the past few weeks we generated more than 7,000 structures using both targeted and wild sampling, and built an ad-hoc Voronoi analysis on top to filter candidates by layeredness and void size.
This worked well — 91% of our targeted generations were layered, 87% had voids > 1 Å — but the analysis code is messy, single-purpose, and not reusable. We think there's value in building proper post-generation analysis and screening tools directly into CrystalFormer, and wanted to discuss the idea before investing time in a PR.
The gap
CrystalFormer does an excellent job at generation, but the workflow after generation is entirely on the user:
CrystalFormer generates structures → ??? → user somehow evaluates them
For many use cases (ionic conductors, membranes, porous materials, MOFs, zeolites), the key question isn't just "is this a valid crystal?" but "does this crystal have the structural features I need?" — channels, voids, layered morphology, specific compositions, etc.
Proposal: three modules
We'd like to contribute three chemistry-agnostic modules. None of these are tied to our specific Fe-S use case — they work for any crystal system.
1. Structural analysis (crystalformer/analysis/)
Voronoi-based characterization of generated structures:
Void analysis (voronoi.py):
- max_void_radius: largest inscribed sphere radius (Å)
- void_fraction: free volume fraction
- layeredness_score: anisotropy of Voronoi vertices (PCA-based, 0 = isotropic, 1 = perfectly layered)
- interlayer_spacing: layer separation for layered structures (Å)
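The layeredness_score above is only specified as "PCA-based", so here is one possible reading, offered as a sketch rather than the final definition. It assumes the score is 1 minus the ratio of the smallest to the largest eigenvalue of the covariance of a 3D point cloud (for example, Voronoi vertex coordinates). The function name and the exact formula are our assumptions for discussion.

```python
import numpy as np


def layeredness_from_points(points):
    """Hypothetical PCA-based anisotropy score in [0, 1].

    points: (N, 3) array of Cartesian coordinates, e.g. Voronoi vertices.
    Returns 1 - lambda_min / lambda_max of the covariance matrix, so a
    point cloud confined to a plane scores 1 and an isotropic cloud scores 0.
    Note: a purely linear cloud also scores 1 under this definition.
    """
    pts = np.asarray(points, dtype=float)
    if pts.ndim != 2 or pts.shape[1] != 3 or pts.shape[0] < 3:
        raise ValueError("expected an (N, 3) array with N >= 3")

    centered = pts - pts.mean(axis=0)
    cov = centered.T @ centered / len(pts)

    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    eigvals = np.linalg.eigvalsh(cov)
    lam_min, lam_max = max(eigvals[0], 0.0), eigvals[-1]
    if lam_max <= 0.0:
        return 0.0  # all points coincide; no anisotropy to measure
    return float(1.0 - lam_min / lam_max)
```

With this definition the score does not distinguish sheets from rods, so a production version would probably also need the middle eigenvalue.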
Channel/percolation analysis (percolation.py):
- Build Voronoi network graph (nodes = Voronoi vertices, edges = faces between cells)
- Filter edges by bottleneck > r_probe (configurable, default 0.4 Å for H⁺)
- Check percolation through periodic boundaries in a/b/c directions via BFS
- Report: percolation_dimensionality (0D/1D/2D/3D), min_bottleneck along best channel path
- This answers the question "can an ion of radius r travel through this structure?" — useful for any ionic conductor screening
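To make the percolation check concrete, the steps above could be sketched as follows. This is a minimal illustration, not the proposed implementation. It assumes the Voronoi network has already been reduced to a list of edges of the form (i, j, shift, bottleneck), where shift is the integer lattice translation applied to node j. The function name percolation_dim and this edge format are hypothetical. A channel percolates when the BFS reaches the same node with two different accumulated lattice offsets. The rank of those offset differences gives the dimensionality.

```python
from collections import deque

import numpy as np


def percolation_dim(n_nodes, edges, r_probe=0.4):
    """Return 0, 1, 2 or 3: the highest percolation dimensionality found.

    edges: iterable of (i, j, (da, db, dc), bottleneck_radius), meaning node i
    in the home cell touches node j in the cell shifted by (da, db, dc).
    """
    adjacency = [[] for _ in range(n_nodes)]
    for i, j, shift, bottleneck in edges:
        if bottleneck <= r_probe:
            continue  # probe does not fit through this window
        s = np.asarray(shift, dtype=int)
        adjacency[i].append((j, s))
        adjacency[j].append((i, -s))

    best = 0
    seen = [False] * n_nodes
    for start in range(n_nodes):
        if seen[start]:
            continue
        # image[v] = lattice offset at which v was first reached from start
        image = {start: np.zeros(3, dtype=int)}
        seen[start] = True
        loops = []
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, s in adjacency[u]:
                target = image[u] + s
                if v not in image:
                    image[v] = target
                    seen[v] = True
                    queue.append(v)
                else:
                    diff = target - image[v]
                    if diff.any():
                        loops.append(diff)  # closed path wraps the cell
        if loops:
            best = max(best, int(np.linalg.matrix_rank(np.array(loops))))
    return best
```

This sketch returns only the dimensionality. Per-axis flags such as percolates_c and the min_bottleneck along the best path would need extra bookkeeping on top of the same traversal.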
CLI:
python -m crystalformer.analysis structures.csv --r-probe 0.4 --check-percolation --output results.csv
Python API:
from crystalformer.analysis import VoronoiAnalyzer, PercolationAnalyzer
va = VoronoiAnalyzer()
result = va.analyze(structure) # pymatgen Structure
# result.max_void_radius → 1.82 Å
# result.layeredness_score → 0.91
pa = PercolationAnalyzer(r_probe=0.4)
result = pa.analyze(structure)
# result.percolation_dimensionality → 2 (layered conductor)
# result.min_bottleneck → 0.62 Å
# result.percolates_c → True
2. Compositional guided sampling
Bias atom-type logits during autoregressive decoding to steer generation toward desired compositions — without retraining or fine-tuning.
Mechanism: At each atom-type sampling step, add a bias vector to the logits before softmax. Positive bias encourages an element, negative suppresses it. Zero = unchanged (default behavior preserved).
Python API:
# "I want more structures with Li and O"
structures = sample_crystal(
    params, n_samples=1000,
    composition_bias={"Li": 2.0, "O": 1.5, "P": 1.0}
)
CLI:
python -m crystalformer.sample --checkpoint model.pkl \
    --n-samples 1000 \
    --composition-bias "Li:2.0,O:1.5,P:1.0"
This is deliberately simple — not a hard constraint, just a soft nudge. It turns "generate random structures and hope for the right composition" into "generate structures enriched in desired elements." For us this increased Fe-S hit rate from ~12% to ~60% without any model changes.
Scope: Only modifies the sampling function — no changes to model architecture, training, or weights.
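As a rough illustration of the logit-bias mechanism described in this section, the following NumPy sketch adds a per-element bias to a logit vector and then applies softmax. The helper names and the element_to_index mapping are hypothetical. CrystalFormer's actual atom-type vocabulary and sampling code may be laid out differently.

```python
import numpy as np


def apply_composition_bias(logits, element_to_index, composition_bias):
    """Return a copy of logits with a per-element additive bias.

    logits: array whose last axis runs over atom types.
    element_to_index: assumed mapping from element symbol to logit index.
    composition_bias: e.g. {"Li": 2.0, "O": 1.5}; missing elements get 0.
    """
    biased = np.array(logits, dtype=float)  # copy, leave the input untouched
    for element, bias in composition_bias.items():
        biased[..., element_to_index[element]] += bias
    return biased


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# Toy vocabulary of three elements with uniform logits.
vocab = {"Li": 0, "O": 1, "P": 2}
probs = softmax(apply_composition_bias(np.zeros(3), vocab, {"Li": 2.0}))
```

Because the bias is added before softmax, a bias of b multiplies an element's odds relative to unbiased elements by exp(b). An empty bias dictionary leaves the distribution unchanged.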
3. Screening pipeline (crystalformer/screen/)
A pluggable filter chain for batch screening of generated structures.
Built-in filters:
| Filter | Criterion | Default |
|---|---|---|
| validity | Valid structure (atoms placed, no overlaps < 0.5 Å) | ON |
| voronoi | max_void_radius > threshold | r > 1.0 Å |
| percolation | percolation_dimensionality >= threshold | dim >= 1 |
| composition | Contains specified elements | OFF |
| density | Density within range | OFF |
Custom filters (user-extensible):
from crystalformer.screen import ScreeningPipeline, Filter

class StabilityFilter(Filter):
    """Example: user plugs in their own ML potential."""
    def __call__(self, structure) -> bool:
        energy = my_ml_model.predict(structure)
        return energy < self.threshold

pipeline = ScreeningPipeline([
    "validity",
    "voronoi:r_min=0.4",
    "percolation:dim=2",
    StabilityFilter(threshold=0.05),
])
results = pipeline.run("generated.csv", output="screened.csv")
# 5000 → 4200 valid → 1890 voronoi → 340 percolating → 52 stable
CLI:
python -m crystalformer.screen generated/ \
--filters validity,voronoi,percolation \
--r-probe 0.4 \
--output screened.csv
The pipeline prints a summary table showing how many structures pass each stage — makes it easy to see where the funnel narrows.
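To show how small the core of such a pipeline could be, here is a sketch of the two pieces it needs: parsing a spec string like "voronoi:r_min=0.4" and running a chain of predicates while counting survivors at each stage. The function names are hypothetical and the records are plain dictionaries standing in for structures. The real ScreeningPipeline API is still open for discussion.

```python
def parse_filter_spec(spec):
    """Split 'name:key=value,key=value' into (name, kwargs).

    Values are parsed as floats in this sketch.
    """
    name, _, arg_text = spec.partition(":")
    kwargs = {}
    if arg_text:
        for pair in arg_text.split(","):
            key, _, value = pair.partition("=")
            kwargs[key.strip()] = float(value)
    return name.strip(), kwargs


def run_filter_chain(items, filters):
    """Apply (name, predicate) pairs in order.

    Returns (survivors, funnel), where funnel lists how many items remain
    after each stage, starting with the input count.
    """
    survivors = list(items)
    funnel = [("input", len(survivors))]
    for name, predicate in filters:
        survivors = [item for item in survivors if predicate(item)]
        funnel.append((name, len(survivors)))
    return survivors, funnel


# Toy records standing in for analyzed structures.
records = [
    {"valid": True, "max_void_radius": 1.8},
    {"valid": True, "max_void_radius": 0.3},
    {"valid": False, "max_void_radius": 2.0},
]
name, kwargs = parse_filter_spec("voronoi:r_min=0.4")
chain = [
    ("validity", lambda r: r["valid"]),
    (name, lambda r: r["max_void_radius"] > kwargs["r_min"]),
]
kept, funnel = run_filter_chain(records, chain)
```

Running cheap filters first keeps expensive ones (such as an ML potential) off most of the batch, which is why the chain is ordered rather than evaluated as a single combined predicate.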
Dependencies
Only pymatgen (already used for I/O) and scipy (for scipy.spatial.Voronoi), both declared as optional dependencies:
[project.optional-dependencies]
analysis = ["pymatgen>=2024.1.1", "scipy>=1.10"]
No heavy ML dependencies. Core CrystalFormer stays lightweight.
File structure (all new files, minimal changes to existing code)
crystalformer/
├── analysis/
│ ├── __init__.py
│ ├── voronoi.py # VoronoiAnalyzer
│ ├── percolation.py # PercolationAnalyzer
│ └── __main__.py # CLI
├── screen/
│ ├── __init__.py
│ ├── pipeline.py # ScreeningPipeline, Filter base
│ ├── filters.py # Built-in filters
│ └── __main__.py # CLI
└── src/
└── sample.py # + optional composition_bias param
Questions before we start
- Do you already have internal analysis/screening tools? We don't want to duplicate existing work. If you have something similar internally, we'd be happy to build on it instead.
- Is this in scope for the main repo? If you'd prefer this to live as a separate package (e.g., crystalformer-analysis), that works too.
- Single PR or incremental? We can do one PR with everything, or split into 2-3 (analysis → sampling → screening).
- Any naming/API conventions you'd like us to follow?
Happy to discuss any aspect of this. We've been enjoying working with CrystalFormer and would like to contribute back to the project.