A collaboratively engineered 200-sample × 48-feature dataset of refractory ceramics and composites,
built by a 4-member team using physics-informed feature engineering across 8 material property domains.
📋 Website : https://imi-project.vercel.app/
- Project Overview
- Dataset Summary
- Repository Structure
- Data Pipeline
- Team Contributions
- Feature Groups at a Glance
- Material Classes
- Target Variables
- Setup & Usage
- Output Files
This project constructs a structured, ML-ready dataset for Ultra-High Temperature Materials (UHTMs) — refractory ceramics used in hypersonic re-entry vehicles, thermal protection systems (TPS), and aerospace leading-edge components.
The dataset covers 10 material families (carbides, borides, nitrides) and their composites/doped variants across 8 physical property domains, with 3 supervised regression targets. All features are derived from 5 literature-anchored material properties using validated physical relations plus realistic Gaussian noise (np.random.seed(42)).
Key design principles:
- Each of the 4 team members owns a distinct physical domain → clean separation of concerns
- All features trace back to 5 shared anchors (Tₘ, ρ, vₑ, ΔEN, Pₛ) → internally consistent dataset
- Reproducible: fixed seed, shared base file, merge validation enforced programmatically
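The anchor-plus-noise pattern behind these principles can be sketched as follows. This is a minimal illustration, not the project's code: it uses the standard-library `random` module as a stand-in for the NumPy generator the scripts seed, the helper names `noisy` and `derive_f01_f02` are hypothetical, and reading "2% noise" as a Gaussian with standard deviation equal to 2% of the anchor is an assumption.

```python
import random


def noisy(value, rel_sigma, rng):
    """Perturb value with zero-mean Gaussian noise, std = rel_sigma * |value|."""
    return value + rng.gauss(0.0, rel_sigma * abs(value))


def derive_f01_f02(t_m, rng):
    """F01: melting-point anchor with 2% noise (assumed Gaussian).
    F02: Debye temperature from the noise-free anchor, 300 + 0.15 * T_m."""
    f01 = noisy(t_m, 0.02, rng)
    f02 = 300 + 0.15 * t_m
    return f01, f02


# Two generators seeded identically yield identical features,
# which is the property the fixed seed is meant to guarantee.
run_a = derive_f01_f02(3900, random.Random(42))
run_b = derive_f01_f02(3900, random.Random(42))
```

Passing the generator in explicitly, rather than relying on global state, keeps each derived column reproducible regardless of the order in which other code draws random numbers.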
| Property | Value |
|---|---|
| Total Samples | 200 (100 experimental + 100 synthetic) |
| Feature Columns | 48 (F01–F48) |
| Target Columns | 3 (T1–T3) |
| Metadata Columns | 5 (Sample_ID, Material_System, Source_Type, Synthesis_Method, Crystal_Structure) |
| Total Columns | 56 |
| Material Families | 10 base + composites + doped variants |
| Random Seed | 42 (all scripts) |
| Final Output | UHTM_final_200x48.xlsx + UHTM_final_200x48.csv |
IMI-Project-main/
│
├── Aadi-Dev/ # Member 1 — Thermodynamic + Electronic
│ ├── dataset.py # ★ Generates F01–F12
│ ├── Aadi.xlsx # Output: 200 × 17 (meta + 12 features)
│ └── UHTM_base_200.xlsx # Base reference copy
│
├── Krish/ # Member 2 — Mechanical + Thermal + Infrastructure
│ ├── Intro.py # Branch onboarding note
│ └── LAB EVALUATION/
│ ├── Krishh.py # ★ Generates F13–F24
│ ├── Krishh.xlsx # Output: 200 × 17
│ ├── UHTM_base_200.xlsx
│ └── Merge/
│ ├── Base.py # ★★ Generates UHTM_base_200.xlsx (run first)
│ ├── mergeAll.py # ★★ Final merge of all 4 member files
│ ├── AadiDev.xlsx # Member 1 snapshot for merge
│ ├── Krishh.xlsx # Member 2 snapshot for merge
│ ├── Niranjan.xlsx # Member 4 snapshot for merge
│ ├── Salan.xlsx # Member 3 snapshot for merge
│ └── UHTM_final_200x48.xlsx # ★★ FINAL MERGED DATASET
│
├── Niranjan/ # Member 4 — Phase + ML Descriptors + Targets
│ ├── Niranjan.py # ★ Generates F37–F48 + T1, T2, T3
│ ├── Niranjan.xlsx # Output: 200 × 20
│ └── UHTM_base_200.xlsx
│
├── Salan/ # Member 3 — Oxidation + Microstructural
│ ├── lab evaluation/
│ │ ├── Salan.py # ★ Generates F25–F36
│ │ ├── Salan_member3.xlsx # Output: 200 × 17
│ │ └── UHTM_base_200.xlsx
│ └── Backup_datasets/
│ ├── UHTM_Complete.csv
│ ├── UHTM_Complete.xlsx
│ └── completefile.py
│
├── UHTM_final_200x48.csv # ★★ ML-ready CSV (root copy)
└── UHTM_final_200x48.xlsx # ★★ Final annotated Excel (root copy)
┌─────────────────────────────────────────────────────────────────────────┐
│ │
│ STEP 1: Base.py │
│ ───────────── │
│ Krish generates UHTM_base_200.xlsx │
│ 200 rows × 10 cols (5 meta + 5 hidden anchors: Tₘ, ρ, vₑ, ΔEN, Pₛ) │
│ │ │
│ ┌───────────────┼────────────────┐ │
│ ▼ ▼ ▼ ▼ │
│ STEP 2a: Aadi 2b: Krish 2c: Salan 2d: Niranjan │
│ dataset.py Krishh.py Salan.py Niranjan.py │
│ F01–F12 F13–F24 F25–F36 F37–F48 + T1–T3 │
│ Aadi.xlsx Krishh.xlsx Salan_m3.xlsx Niranjan.xlsx │
│ │ │ │ │ │
│ └───────────────┴────────────────┴──────────────┘ │
│ │ │
│ ▼ │
│ STEP 3: mergeAll.py │
│ ──────────────────── │
│ Validates 200 rows + Sample_ID match across all 4 files │
│ Horizontally joins all feature groups │
│ │ │
│ ▼ │
│ STEP 4: UHTM_final_200x48.xlsx / .csv │
│ 200 rows × 56 cols (5 meta + 48 features + 3 targets) │
│ + Summary Stats sheet + Feature Legend sheet │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Script: Aadi-Dev/dataset.py | Output: Aadi.xlsx | Features: F01–F12
Aadi is responsible for the foundational material properties spanning thermodynamic stability and quantum electronic structure — the two groups that most directly determine a material's suitability as a UHTM candidate.
| Feature | Name | Formula / Physics | Unit |
|---|---|---|---|
| F01 | Melting Point | Literature anchor Tₘ + 2% noise | K |
| F02 | Debye Temperature | θ_D ≈ 300 + 0.15·Tₘ (Debye-Grüneisen scaling) | K |
| F03 | Cohesive Energy | E_coh = vₑ·0.82 + ΔEN·1.1 (metallic + ionic/covalent) | eV/atom |
| F04 | Formation Enthalpy | ΔHf = -(ΔEN·45 + vₑ·8) (exothermic = stable) | kJ/mol |
| F05 | Lattice Parameter a | a ≈ 3.2 + ρ^(-0.3)·0.5 (inverse power law with density) | Å |
| F06 | Grüneisen Parameter | γ ≈ 0.4 + ΔEN·0.3 + vₑ·0.05 (anharmonicity index) | — |
| Feature | Name | Formula / Physics | Unit |
|---|---|---|---|
| F07 | Band Gap | max(0, 0.5 - vₑ·0.05) — metallic carbides/nitrides ≈ 0 | eV |
| F08 | DOS at Fermi Level | N(Ef) ≈ vₑ·0.55 + 1.2 (more valence e⁻ → higher DOS) | states/eV |
| F09 | Bader Charge Transfer | Δq ≈ ΔEN·0.6 (Pauling-type charge transfer) | e⁻ |
| F10 | Fermi Velocity | v_F = 1×10⁶ · (vₑ/8.5) (free-electron model) | m/s |
| F11 | Valence Electron Density | n = ρ·vₑ·1.8×10²² (electrons per unit volume) | ×10²²/cm³ |
| F12 | Work Function | φ ≈ 3.5 + ΔEN·0.8 - vₑ·0.05 (surface escape energy) | eV |
Physical significance: F01 is the primary UHTM selection criterion. F07 distinguishes metallic from semiconducting behaviour at high T. F11 governs metallic bonding strength and shear modulus. F12 controls thermionic emission.
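As a worked example of the tabulated relations, the sketch below evaluates four of them for the HfC anchors (vₑ = 8, ΔEN = 1.3). It omits the noise term, and the function name and dictionary keys are hypothetical, so the numbers will not match the noisy values stored in Aadi.xlsx.

```python
def thermo_electronic(v_e, delta_en):
    """Noise-free F03, F04, F07 and F10 from the valence-electron count
    and electronegativity difference, using the formulas in the tables."""
    return {
        "F03_cohesive_energy_eV": v_e * 0.82 + delta_en * 1.1,
        "F04_formation_enthalpy_kJ_mol": -(delta_en * 45 + v_e * 8),
        "F07_band_gap_eV": max(0.0, 0.5 - v_e * 0.05),
        "F10_fermi_velocity_m_s": 1e6 * (v_e / 8.5),
    }


hfc = thermo_electronic(v_e=8, delta_en=1.3)   # E_coh ~ 7.99 eV/atom, dHf = -122.5 kJ/mol
tan = thermo_electronic(v_e=10, delta_en=1.4)  # band gap clamps to 0 at v_e = 10
```

The `max(0, ...)` clamp in F07 is what drives the band gap to zero for the systems with the most valence electrons.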
Scripts: Krishh.py, Base.py, mergeAll.py | Output: Krishh.xlsx | Features: F13–F24
Krish owns both mechanical integrity and thermal transport features — the two most critical property groups for structural aerospace applications. Krish also authored the base dataset generator (Base.py) and the final merge script (mergeAll.py), serving as the project's data infrastructure lead.
| Feature | Name | Formula / Physics | Unit |
|---|---|---|---|
| F13 | Young's Modulus | E ≈ 200 + Tₘ·0.04 + vₑ·15 (Gilman-Cohen stiffness relation) | GPa |
| F14 | Vickers Hardness | H_v ≈ 15 + ΔEN·8 + vₑ·1.2 (bond character + electron count) | GPa |
| F15 | Fracture Toughness K_Ic | K_Ic ≈ 2.5 + 1.5/ΔEN (ionic bonds = more brittle) | MPa√m |
| F16 | Compressive Strength | σ_c ≈ 0.6·E (empirical ratio for dense ceramics) | GPa |
| F17 | Poisson's Ratio | ν ≈ 0.18 + ΔEN·0.02 (covalent ≈ 0.18; ionic → higher) | — |
| F18 | Flexural Strength | σ_f = 300 + E·0.8 + H·12 - porosity·15 | MPa |
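The mechanical relations chain together: F16 and F18 depend on F13 and F14. A noise-free sketch for the HfC anchors is given here; the 2% porosity is an illustrative input rather than a value from the dataset, and the function name and keys are hypothetical.

```python
def mechanical_features(t_m, v_e, delta_en, porosity_pct):
    """Noise-free F13-F16 and F18 from the tabulated relations."""
    youngs = 200 + t_m * 0.04 + v_e * 15        # F13, GPa
    hardness = 15 + delta_en * 8 + v_e * 1.2    # F14, GPa
    k_ic = 2.5 + 1.5 / delta_en                 # F15, MPa*sqrt(m)
    compressive = 0.6 * youngs                  # F16, GPa
    flexural = 300 + youngs * 0.8 + hardness * 12 - porosity_pct * 15  # F18, MPa
    return {
        "F13": youngs,
        "F14": hardness,
        "F15": k_ic,
        "F16": compressive,
        "F18": flexural,
    }


# HfC anchors from the material table; porosity of 2% is assumed for illustration.
hfc_mech = mechanical_features(t_m=3900, v_e=8, delta_en=1.3, porosity_pct=2.0)
```

With these inputs the sketch gives E = 476 GPa, H_v = 35 GPa and σ_f = 1070.8 MPa, and each additional percent of porosity lowers σ_f by 15 MPa.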
| Feature | Name | Formula / Physics | Unit |
|---|---|---|---|
| F19 | Thermal Conductivity | κ = 20 + vₑ·4 - ΔEN·3 (electronic + ionic scattering) | W/m·K |
| F20 | Coeff. Thermal Expansion | CTE = 6.5 + ΔEN·0.8 - vₑ·0.3 | ×10⁻⁶/K |
| F21 | Specific Heat Capacity | Cp ≈ 180 + 800/ρ (inverse density, Dulong-Petit) | J/kg·K |
| F22 | Thermal Diffusivity | α = κ / (ρ·Cp) (exact thermodynamic definition) | m²/s |
| F23 | Max Service Temperature | T_max ≈ 0.75·Tₘ (standard refractory engineering rule) | K |
| F24 | Thermal Shock Resistance | R = σ_f·κ / (E·CTE) (Hasselman R-parameter) | W/m |
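The thermal relations can be checked with a short noise-free calculation for ZrB₂ (Tₘ = 3245 K, ρ = 6.09 g/cm³, vₑ = 6, ΔEN = 0.9). One assumption is made explicit here: to obtain α in m²/s, the density is converted from g/cm³ to kg/m³ before dividing. How Krishh.py handles this conversion is not stated, so treat the α value as illustrative.

```python
def thermal_features(t_m, rho_g_cm3, v_e, delta_en):
    """Noise-free F19-F23 from the tabulated relations."""
    kappa = 20 + v_e * 4 - delta_en * 3        # F19, W/(m*K)
    cte = 6.5 + delta_en * 0.8 - v_e * 0.3     # F20, 1e-6 / K
    cp = 180 + 800 / rho_g_cm3                 # F21, J/(kg*K)
    rho_si = rho_g_cm3 * 1000.0                # assumed g/cm^3 -> kg/m^3 conversion
    alpha = kappa / (rho_si * cp)              # F22, m^2/s
    t_max = 0.75 * t_m                         # F23, K
    return {"F19": kappa, "F20": cte, "F21": cp, "F22": alpha, "F23": t_max}


zrb2_thermal = thermal_features(t_m=3245, rho_g_cm3=6.09, v_e=6, delta_en=0.9)
```

For these anchors κ = 41.3 W/m·K, CTE = 5.42 ×10⁻⁶/K and T_max = 2433.75 K; with the assumed conversion, α comes out near 2.2×10⁻⁵ m²/s.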
Infrastructure contributions:
- Base.py defines all 10 material anchors from literature (Materials Project, JARVIS-DFT, Fahrenholtz & Hilmas 2012).
- mergeAll.py validates row counts and Sample_ID alignment before joining, preventing silent merge errors.
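The validate-then-join step described for mergeAll.py might look like the plain-Python sketch below. It is not the actual script, which works on Excel files: here each member table is a list of row dictionaries, and `merge_member_tables` and the `S001`-style IDs are hypothetical names chosen for illustration.

```python
def merge_member_tables(tables, expected_rows=200, key="Sample_ID"):
    """Check row counts and key alignment, then join rows column-wise."""
    reference_ids = [row[key] for row in tables[0]]
    for i, table in enumerate(tables):
        if len(table) != expected_rows:
            raise ValueError(
                f"table {i}: expected {expected_rows} rows, got {len(table)}"
            )
        if [row[key] for row in table] != reference_ids:
            raise ValueError(f"table {i}: {key} values do not match table 0")
    merged = []
    for rows in zip(*tables):
        combined = {}
        for row in rows:
            combined.update(row)  # shared metadata columns collapse into one
        merged.append(combined)
    return merged


# Tiny demonstration with 3 rows instead of 200.
ids = ["S001", "S002", "S003"]
part_a = [{"Sample_ID": s, "F01": 1.0} for s in ids]
part_b = [{"Sample_ID": s, "F13": 2.0} for s in ids]
merged = merge_member_tables([part_a, part_b], expected_rows=3)
```

Raising on a mismatch, instead of joining on whatever IDs happen to overlap, is what turns a silently misaligned merge into an immediate failure.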
Script: Salan/lab evaluation/Salan.py | Output: Salan_member3.xlsx | Features: F25–F36
Salan handles the chemical stability and process-microstructure features — the groups that determine how a UHTM behaves over time in oxidising environments and how its properties are influenced by the synthesis route.
| Feature | Name | Formula / Physics | Unit |
|---|---|---|---|
| F25 | Oxidation Onset Temperature | T_ox = 800 + ΔEN·150 + Tₘ·0.05 | K |
| F26 | Parabolic Rate Constant k_p | k_p = 1×10⁻¹² · exp(ΔEN·0.8) (Arrhenius) | kg²/m⁴s |
| F27 | Oxidation Activation Energy | Ea = 120 + ΔEN·30 (diffusion barrier) | kJ/mol |
| F28 | Gravimetric Parabolic Rate | k_p_grav = 1×10⁻¹⁰ · ΔEN (TGA standard unit) | g²/cm⁴s |
| F29 | Oxide Layer Stability Index | OLS = ΔEN·0.7 + vₑ·0.15 (protective scale adherence) | — |
| F30 | Oxygen Diffusivity in Oxide | D_O = 1×10⁻¹⁴ · exp(-ΔEN·1.2) | m²/s |
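A noise-free evaluation of the oxidation relations for the HfC anchors (Tₘ = 3900 K, ΔEN = 1.3) is sketched here; the function name and keys are hypothetical.

```python
import math


def oxidation_features(t_m, delta_en):
    """Noise-free F25, F26, F27 and F30 from the tabulated relations."""
    return {
        "F25_T_ox_K": 800 + delta_en * 150 + t_m * 0.05,
        "F26_k_p": 1e-12 * math.exp(delta_en * 0.8),
        "F27_Ea_kJ_mol": 120 + delta_en * 30,
        "F30_D_O_m2_s": 1e-14 * math.exp(-delta_en * 1.2),
    }


hfc_ox = oxidation_features(t_m=3900, delta_en=1.3)
```

For HfC this gives T_ox = 1190 K and Ea = 159 kJ/mol. In these relations a larger ΔEN raises both the onset temperature and k_p, while lowering the oxygen diffusivity in the oxide.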
| Feature | Name | Formula / Physics | Unit |
|---|---|---|---|
| F31 | Average Grain Size | d = 2 + (Pₛ^-0.3)·10 (Hall-Petch: smaller grains → harder) | μm |
| F32 | Relative Density | ρ_rel = min(99.9, 92 + Pₛ·0.18) | % |
| F33 | Porosity | P = max(0.1, 100 - ρ_rel) (complement of relative density) | % |
| F34 | Crystallite Size (XRD) | D_xrd = grain_size_μm · 1000 · 0.08 (Scherrer ~8% of grain) | nm |
| F35 | Dislocation Density | ρ_disl = 1×10¹²/d^1.5 (inverse power law with grain size) | ×10¹²/m² |
| F36 | Grain Boundary Energy | γ_gb = 0.3 + ΔEN·0.25 + vₑ·0.02 | J/m² |
Physical significance: F25 is the primary chemical stability criterion. F26 determines oxide scale growth rate — smaller k_p = better protection. F31–F33 directly link sintering pressure (Pₛ anchor) to microstructure, closing the process–property loop.
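The link from sintering pressure to microstructure in F31 to F34 can be traced with a few lines. The Pₛ values used here (30 and 50) are illustrative inputs only; the source does not list the anchor values or their unit, and the function name and keys are hypothetical.

```python
def microstructure(p_s):
    """Noise-free F31-F34 as functions of the sintering-pressure anchor."""
    grain_um = 2 + (p_s ** -0.3) * 10            # F31, micrometres
    rel_density = min(99.9, 92 + p_s * 0.18)     # F32, %, capped at 99.9
    porosity = max(0.1, 100 - rel_density)       # F33, %, floored at 0.1
    crystallite_nm = grain_um * 1000 * 0.08      # F34, nm (8% of grain size)
    return {
        "F31": grain_um,
        "F32": rel_density,
        "F33": porosity,
        "F34": crystallite_nm,
    }


moderate = microstructure(30)   # density 97.4 %, porosity 2.6 %
high = microstructure(50)       # density hits the 99.9 % cap
```

Raising Pₛ shrinks the grain size and porosity together, until the 99.9% density cap and 0.1% porosity floor take over.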
Script: Niranjan/Niranjan.py | Output: Niranjan.xlsx | Features: F37–F48 + Targets: T1–T3
Niranjan handles the highest-level features: composite system descriptors, dimensionless ML merit indices designed for Pareto optimisation, and all three supervised learning targets. This is the most interdependent feature block — many features in Groups G and H re-derive quantities from Groups A–F using the same seed to ensure cross-member consistency.
| Feature | Name | Formula / Physics | Unit |
|---|---|---|---|
| F37 | Phase Stability Index | PSI = Tₘ/4000 + E_coh/12 (normalised thermodynamic stability) | — |
| F38 | Secondary Phase Vol. Fraction | 0% monolithic; 5–20% composites (index-dependent) | % |
| F39 | Interfacial Energy | γ_int = 0.5 + ΔEN·0.3 (ionic mismatch → delamination risk) | J/m² |
| F40 | CTE Mismatch Index | ΔCTE = \|CTE - 5.0\| | ×10⁻⁶/K |
| F41 | Solid Solution Distortion δ | δ = ΔEN·0.12 + (vₑ%3)·0.05 (Hume-Rothery lattice distortion) | — |
| F42 | Wettability Index | W = 1 - f_ion = vₑ·0.15/(ΔEN + vₑ·0.15) (covalent fraction) | — |
| Feature | Name | Formula / Physics | Unit |
|---|---|---|---|
| F43 | Thermal Merit Index | TMI = κ/(CTE·ρ) (Ashby figure of merit for TPS panels) | W/kg |
| F44 | Toughness-Stiffness Index | TSI = K_Ic·√E (combined crack resistance and stiffness) | GPa·√GPa |
| F45 | Oxidation Merit Score | OMS = T_ox/(k_p_norm + 1) (Bayesian optimisation reward signal) | — |
| F46 | Bond Ionicity Fraction | f_ion = ΔEN/(ΔEN + vₑ·0.15) (Pauling ionicity) | — |
| F47 | Structural Stability Index | SSI = (Tₘ/4000)·(E_coh/10)·(1 - P/100) | — |
| F48 | Creep Resistance Parameter | CR = (Tₘ/3000)·(E/400)·(1/d)^0.3 (grain size + stiffness) | — |
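F42 and F46 are complementary by construction (W = 1 - f_ion), which makes a convenient consistency check. The sketch below evaluates both, along with F37, for the HfC anchors without noise; the function names are hypothetical, and the cohesive energy of 7.99 eV/atom is the noise-free F03 value for HfC.

```python
def bond_ionicity(v_e, delta_en):
    """F46: f_ion = dEN / (dEN + 0.15 * v_e)."""
    return delta_en / (delta_en + v_e * 0.15)


def wettability(v_e, delta_en):
    """F42: covalent fraction, 0.15 * v_e / (dEN + 0.15 * v_e)."""
    return v_e * 0.15 / (delta_en + v_e * 0.15)


def phase_stability_index(t_m, e_coh):
    """F37: PSI = T_m / 4000 + E_coh / 12."""
    return t_m / 4000 + e_coh / 12


f_ion = bond_ionicity(v_e=8, delta_en=1.3)           # ~0.52
w_index = wettability(v_e=8, delta_en=1.3)           # ~0.48
psi = phase_stability_index(t_m=3900, e_coh=7.99)    # ~1.64
```

Because the two fractions share a denominator, `f_ion + w_index` equals 1 up to floating-point rounding. In the real files the two columns may drift apart slightly if noise is applied to each independently.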
All three targets are defined and computed by Niranjan (Member 4).
| Target | Name | Type | Description |
|---|---|---|---|
| T1 | Flexural Strength | Continuous (MPa) | Multi-factor structural target. Driven by Young's Modulus, Vickers Hardness, and Porosity. Primary design metric for load-bearing structures. |
| T2 | Oxidation Resistance Score | Continuous (0–10) | Composite weighted score: onset temperature (×3) + rate constant (×2) + stability index (×2) + CTE match (×1). |
| T3 | Thermal Shock Cycles | Integer (cycles) | Predicted cycles-to-failure under rapid thermal cycling. Driven by K_Ic, CTE, and composite mismatch index F40. |
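The T2 description gives the weights (3, 2, 2, 1) but not how the four ingredients are normalised. One way such a score could be assembled is sketched below, under the loud assumption that each ingredient has already been mapped to a sub-score in [0, 1], with 1 being best; the weighted mean is then rescaled to 0–10. The names `T2_WEIGHTS` and `oxidation_resistance_score` are hypothetical, and Niranjan.py may compute T2 differently.

```python
# Weights taken from the T2 description; the [0, 1] sub-score convention is assumed.
T2_WEIGHTS = {
    "onset_temperature": 3,
    "rate_constant": 2,
    "stability_index": 2,
    "cte_match": 1,
}


def oxidation_resistance_score(sub_scores):
    """Weighted mean of sub-scores in [0, 1], rescaled to the 0-10 range."""
    for name in T2_WEIGHTS:
        if not 0.0 <= sub_scores[name] <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    total_weight = sum(T2_WEIGHTS.values())
    weighted = sum(T2_WEIGHTS[name] * sub_scores[name] for name in T2_WEIGHTS)
    return 10 * weighted / total_weight


best = oxidation_resistance_score({name: 1.0 for name in T2_WEIGHTS})
onset_only = oxidation_resistance_score(
    {"onset_temperature": 1.0, "rate_constant": 0.0,
     "stability_index": 0.0, "cte_match": 0.0}
)
```

Under this convention a material that is perfect on onset temperature alone scores 3.75 out of 10, which shows how strongly that term dominates the other three.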
| Group | Features | Member | Domain | Colour (in Excel) |
|---|---|---|---|---|
| A — Thermodynamic | F01–F06 | Aadi | Phase stability, bonding energetics | Teal |
| B — Electronic | F07–F12 | Aadi | DFT / quantum properties | Blue |
| C — Mechanical | F13–F18 | Krish | Structural integrity | Purple |
| D — Thermal Transport | F19–F24 | Krish | Heat transport & diffusion | Orange |
| E — Oxidation | F25–F30 | Salan | Chemical stability | Red |
| F — Microstructural | F31–F36 | Salan | Process–structure–property | Green |
| G — Phase/Composite | F37–F42 | Niranjan | Multi-phase composite systems | Indigo |
| H — ML Descriptors | F43–F48 | Niranjan | Pareto / reward signals for inverse design | Teal-dark |
| Targets | T1–T3 | Niranjan | Supervised regression targets | Dark Red |
Base anchor values sourced from: Materials Project (mp-*), JARVIS-DFT, Fahrenholtz & Hilmas (2012), Cedillos-Barraza et al. (2016), Opeka et al. J. Eur. Ceram. Soc.
| Material | Crystal | T_m (K) | ρ (g/cm³) | v_e | ΔEN |
|---|---|---|---|---|---|
| HfC | FCC | 3900 | 12.20 | 8 | 1.3 |
| ZrC | FCC | 3420 | 6.73 | 8 | 1.3 |
| TaC | FCC | 3880 | 14.30 | 9 | 1.1 |
| HfB₂ | HEX | 3380 | 10.50 | 6 | 0.9 |
| ZrB₂ | HEX | 3245 | 6.09 | 6 | 0.9 |
| TiC | FCC | 3160 | 4.93 | 8 | 1.5 |
| NbC | FCC | 3600 | 7.79 | 9 | 1.2 |
| HfN | FCC | 3385 | 13.80 | 9 | 1.6 |
| ZrN | FCC | 2980 | 7.09 | 9 | 1.6 |
| TaN | HEX | 3090 | 16.30 | 10 | 1.4 |
Composite variants (experimental, index 50–99): HfC-SiC, ZrB₂-SiC, HfB₂-SiC, TaC-HfC, ZrC-TiC, HfC-TaC, ZrB₂-ZrC, HfB₂-MoSi₂, TiC-TiB₂, NbC-HfC
Doped variants (synthetic, index 150–199): HfC:Y, ZrC:La, TaC:W, HfB₂:Al, ZrB₂:Y, TiC:Nb, NbC:Ta, HfN:Zr, ZrN:Hf, TaN:Nb
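For quick screening, the anchor table can be held as a plain lookup. The sketch below copies the ten base-material rows into a dictionary and filters by melting point; the names `ANCHORS` and `above_melting_point` and the 3500 K threshold are illustrative choices, and this is not how Base.py necessarily stores the anchors.

```python
# (crystal, T_m in K, density in g/cm^3, v_e, delta_EN), copied from the table above.
ANCHORS = {
    "HfC":  ("FCC", 3900, 12.20, 8, 1.3),
    "ZrC":  ("FCC", 3420, 6.73, 8, 1.3),
    "TaC":  ("FCC", 3880, 14.30, 9, 1.1),
    "HfB₂": ("HEX", 3380, 10.50, 6, 0.9),
    "ZrB₂": ("HEX", 3245, 6.09, 6, 0.9),
    "TiC":  ("FCC", 3160, 4.93, 8, 1.5),
    "NbC":  ("FCC", 3600, 7.79, 9, 1.2),
    "HfN":  ("FCC", 3385, 13.80, 9, 1.6),
    "ZrN":  ("FCC", 2980, 7.09, 9, 1.6),
    "TaN":  ("HEX", 3090, 16.30, 10, 1.4),
}


def above_melting_point(threshold_k):
    """Names of base materials with T_m above the threshold, hottest first."""
    hits = [(props[1], name) for name, props in ANCHORS.items() if props[1] > threshold_k]
    return [name for _, name in sorted(hits, reverse=True)]


refractory_leaders = above_melting_point(3500)
```

With a 3500 K cut-off only the three carbides HfC, TaC and NbC remain.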
pip install pandas numpy openpyxl

# From Krish/LAB EVALUATION/Merge/
python Base.py
# Output: UHTM_base_200.xlsx
# Contains: 200 rows × 10 cols (5 metadata + 5 physical anchors)

# The cd paths below are relative to the repository root; return to the root before each step.

# Member 1 — Aadi
cd Aadi-Dev/
python dataset.py
# Output: Aadi.xlsx (200 × 17)

# Member 2 — Krish
cd "Krish/LAB EVALUATION/"
python Krishh.py
# Output: Krishh.xlsx (200 × 17)

# Member 3 — Salan
cd "Salan/lab evaluation/"
python Salan.py
# Output: Salan_member3.xlsx (200 × 17)

# Member 4 — Niranjan
cd Niranjan/
python Niranjan.py
# Output: Niranjan.xlsx (200 × 20: 12 features + 3 targets + 5 meta)

# Copy all member xlsx files into Krish/LAB EVALUATION/Merge/
# Rename if needed: AadiDev.xlsx, Krishh.xlsx, Salan.xlsx, Niranjan.xlsx
cd "Krish/LAB EVALUATION/Merge/"
python mergeAll.py
# Validates: 200 rows + Sample_ID match across all 4 files
# Output: UHTM_final_200x48.xlsx
# Sheet 1 — UHTM_Full_Dataset (200 × 56, colour-coded by group)
# Sheet 2 — Summary Stats (describe() for all F and T columns)
# Sheet 3 — Feature Legend (group → member → domain mapping)

All scripts call np.random.seed(42) at the top level. As long as the base file is generated first and each member script runs with that seed, all outputs are deterministic.
| File | Location | Description |
|---|---|---|
| UHTM_base_200.xlsx | Krish/LAB EVALUATION/Merge/ | Base dataset with material anchors. Input to all 4 member scripts. |
| Aadi.xlsx | Aadi-Dev/ | Member 1 output: F01–F12 |
| Krishh.xlsx | Krish/LAB EVALUATION/ | Member 2 output: F13–F24 |
| Salan_member3.xlsx | Salan/lab evaluation/ | Member 3 output: F25–F36 |
| Niranjan.xlsx | Niranjan/ | Member 4 output: F37–F48 + T1–T3 |
| UHTM_final_200x48.xlsx | Root + Merge/ | ★ Final merged dataset with summary and legend sheets |
| UHTM_final_200x48.csv | Root | ★ ML-ready flat CSV export |
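When preparing the CSV for model training, the 56 columns need to be split into metadata, features and targets. A minimal sketch follows, assuming that feature columns begin with their `F01`–`F48` code and target columns with `T1`–`T3`. The exact column names in UHTM_final_200x48.csv are not shown here, so the patterns and the helper name `split_columns` are hypothetical and may need adjusting.

```python
import re

# Assumed naming: "F01...", "F02_...", ..., "F48..." and "T1...", "T2...", "T3...".
FEATURE_RE = re.compile(r"^F\d{2}(?!\d)")
TARGET_RE = re.compile(r"^T[1-3](?!\d)")


def split_columns(header):
    """Partition column names into (metadata, features, targets)."""
    features = [c for c in header if FEATURE_RE.match(c)]
    targets = [c for c in header if TARGET_RE.match(c)]
    meta = [c for c in header if c not in features and c not in targets]
    return meta, features, targets


# Synthetic header mirroring the documented layout: 5 meta + 48 features + 3 targets.
meta_cols = ["Sample_ID", "Material_System", "Source_Type",
             "Synthesis_Method", "Crystal_Structure"]
demo_header = (meta_cols
               + [f"F{i:02d}" for i in range(1, 49)]
               + [f"T{i}" for i in range(1, 4)])
meta, features, targets = split_columns(demo_header)
```

On the real file, the header could be read with `csv.reader` or `pandas.read_csv` and passed to the same function; checking that the three lists have lengths 5, 48 and 3 is a cheap guard against a truncated or mislabelled export.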
| Member | Features | Scripts |
|---|---|---|
| Aadi | F01–F12 (Thermodynamic + Electronic) | Aadi-Dev/dataset.py |
| Krish | F13–F24 (Mechanical + Thermal) + Base + Merge | Krishh.py, Base.py, mergeAll.py |
| Salan | F25–F36 (Oxidation + Microstructural) | Salan/lab evaluation/Salan.py |
| Niranjan | F37–F48 (Phase + ML Descriptors) + T1–T3 | Niranjan/Niranjan.py |
IMI Project · 2026 · Ultra-High Temperature Materials Dataset