IMI Research Project · 2026

UHTM Dataset
Ultra-High Temperature Materials

A collaboratively engineered 200-sample × 48-feature dataset of refractory ceramics & composites, with physics-informed feature engineering across four specialist domains.

200 Samples
48 Features
3 Targets
8 Groups
4 Members

Project Overview

A structured, multi-member dataset engineering project targeting Ultra-High Temperature Materials (UHTMs) used in re-entry thermal protection systems and hypersonic aerospace applications.

🔥

Domain: UHTMs

Carbides (HfC, ZrC, TaC), borides (HfB₂, ZrB₂), nitrides (HfN, ZrN, TaN) and their composites: materials with melting points of roughly 3000 K and above (2980–3900 K across the ten base systems in this dataset).

🧪

Physics-Informed Features

All 48 features are derived from 5 material anchors (Tₘ, ρ, vₑ, EN, Pₛ) using literature-validated physical relations with realistic noise injection.

🤝

Team-Split Architecture

Each member independently generates their feature groups from the shared base file, and a final merge step produces the full 200×56 matrix (5 metadata columns + 48 features + 3 targets).

🤖

ML-Ready Targets

Three regression targets — Flexural Strength, Oxidation Resistance Score, and Thermal Shock Cycles — enable supervised learning and Bayesian optimisation.

Data Pipeline

A reproducible 4-step workflow from base generation to the final merged dataset.

01

Base Generation — Base.py

Krish generates UHTM_base_200.xlsx: 200 material entries (100 experimental + 100 synthetic) with 5 hidden physical anchors (Tₘ, ρ, vₑ, EN, sintering pressure) and metadata columns (Sample_ID, Material_System, Crystal_Structure, Synthesis_Method).

02

Individual Feature Engineering — 4 Members in Parallel

Each member loads the base file and independently computes their 12 assigned features (Niranjan also computes the three targets) from physics-derived formulas plus Gaussian noise, seeded with 42 for reproducibility. Each member writes a separate .xlsx file.
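A minimal sketch of this pattern, using one relation that is standard physics (thermal diffusivity α = k / (ρ·c_p), which corresponds to F22). The column names, the 2 % noise level, and the conductivity and heat-capacity values are assumptions made for illustration; they are not taken from the project scripts or the dataset.

```python
import numpy as np
import pandas as pd

# Stand-in for the base file. The real scripts would read UHTM_base_200.xlsx
# (e.g. with pd.read_excel); the column names here are hypothetical.
base = pd.DataFrame({
    "Sample_ID": ["S001", "S002", "S003"],
    "rho_g_cm3": [12.20, 6.73, 14.30],   # density anchors for HfC, ZrC, TaC
    "k_W_mK": [20.0, 25.0, 22.0],        # illustrative conductivities, not dataset values
    "cp_J_kgK": [200.0, 370.0, 190.0],   # illustrative heat capacities, not dataset values
})


def add_thermal_diffusivity(df, rel_sigma=0.02):
    """Return Sample_ID plus alpha = k / (rho * cp) with multiplicative Gaussian noise."""
    rho_kg_m3 = df["rho_g_cm3"].to_numpy() * 1000.0          # g/cm^3 -> kg/m^3
    alpha = df["k_W_mK"].to_numpy() / (rho_kg_m3 * df["cp_J_kgK"].to_numpy())
    noise = np.random.normal(0.0, rel_sigma, size=len(df))   # assumed 2 % relative scatter
    return pd.DataFrame({
        "Sample_ID": df["Sample_ID"].to_numpy(),
        "F22_Thermal_Diffusivity_m2_s": alpha * (1.0 + noise),
    })


np.random.seed(42)  # the seed the project scripts are stated to use
features = add_thermal_diffusivity(base)
print(features)
```

Seeding immediately before the noise draw means that re-running the script reproduces the same feature values, which is what allows members to regenerate their files independently.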

03

Validation & Merge — mergeAll.py

Asserts that all 4 member files have 200 rows and matching Sample_IDs, then horizontally joins all feature groups into a single 200×56 DataFrame (5 meta + 48 features + 3 targets).
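The validate-then-join step could look roughly like the sketch below. It is not the contents of mergeAll.py: the function name merge_members, the toy three-row frames, and the shortened column names are hypothetical, and the check assumes Sample_IDs appear in the same order in every member file.

```python
import pandas as pd


def merge_members(frames, meta_cols, expected_rows=200):
    """Validate member tables, then join their non-metadata columns side by side."""
    reference_ids = frames[0]["Sample_ID"].tolist()
    for df in frames:
        assert len(df) == expected_rows, f"expected {expected_rows} rows, got {len(df)}"
        assert df["Sample_ID"].tolist() == reference_ids, "Sample_ID mismatch"

    # Metadata is taken once, from the first member file.
    parts = [frames[0][meta_cols].reset_index(drop=True)]
    for df in frames:
        feature_cols = [c for c in df.columns if c not in meta_cols]
        parts.append(df[feature_cols].reset_index(drop=True))
    return pd.concat(parts, axis=1)


# Toy stand-ins for two member files (3 rows instead of 200).
META = ["Sample_ID", "Material_System"]
member_a = pd.DataFrame({
    "Sample_ID": ["S001", "S002", "S003"],
    "Material_System": ["HfC", "ZrC", "TaC"],
    "F01": [3900.0, 3420.0, 3880.0],
})
member_b = pd.DataFrame({
    "Sample_ID": ["S001", "S002", "S003"],
    "Material_System": ["HfC", "ZrC", "TaC"],
    "F13": [450.0, 400.0, 530.0],   # illustrative values only
})

merged = merge_members([member_a, member_b], META, expected_rows=3)
print(merged.shape)
```

Comparing the ID lists in order, rather than as sets, catches a member file that has the right samples in a shuffled order, which a plain column-wise concatenation would silently misalign.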

04

Final Output — UHTM_final_200x48.xlsx

The workbook is color-coded by feature group and includes a Summary Stats sheet and a Feature Legend sheet. The same data is also exported as CSV (UHTM_final_200x48.csv) for ML pipelines.

Team Contributions

Each member owns a domain of 12 features, ensuring separation of concerns and expertise-driven feature engineering.

// Member 01
Aadi
Thermodynamic & Electronic Structure
📁 dataset.py  ·  F01–F12
Group A: Thermodynamic  ·  Group B: Electronic
F01 · Melting Point · K
F02 · Debye Temperature · K
F03 · Cohesive Energy · eV/atom
F04 · Formation Enthalpy · kJ/mol
F05 · Lattice Parameter a · Å
F06 · Grüneisen Parameter
F07 · Band Gap · eV
F08 · DOS at Fermi Level · states/eV
F09 · Bader Charge Transfer · e⁻
F10 · Fermi Velocity · m/s
F11 · Valence Electron Density · ×10²²/cm³
F12 · Work Function · eV
// Member 02
Krish
Mechanical & Thermal Transport · Base Generator
📁 Krishh.py · Base.py · mergeAll.py  ·  F13–F24
Group C: Mechanical  ·  Group D: Thermal  ·  🏗 Base Author  ·  🔗 Merge Lead
F13 · Young's Modulus · GPa
F14 · Vickers Hardness · GPa
F15 · Fracture Toughness KIc · MPa·√m
F16 · Compressive Strength · GPa
F17 · Poisson's Ratio
F18 · Flexural Strength · MPa
F19 · Thermal Conductivity · W/m·K
F20 · Coeff. Thermal Expansion · ×10⁻⁶/K
F21 · Specific Heat Capacity · J/kg·K
F22 · Thermal Diffusivity · m²/s
F23 · Max Service Temperature · K
F24 · Thermal Shock Resistance · W/m
// Member 03
Salan
Oxidation Stability & Microstructure
📁 Salan.py  ·  F25–F36
Group E: Oxidation  ·  Group F: Microstructural
F25 · Oxidation Onset Temp · K
F26 · Parabolic Rate Const kp · kg²/(m⁴·s)
F27 · Oxidation Activation Energy · kJ/mol
F28 · Gravimetric Rate kp · g²/(cm⁴·s)
F29 · Oxide Layer Stability Index
F30 · Oxygen Diffusivity in Oxide · m²/s
F31 · Avg Grain Size · μm
F32 · Relative Density · %
F33 · Porosity · %
F34 · Crystallite Size (XRD) · nm
F35 · Dislocation Density · ×10¹²/m²
F36 · Grain Boundary Energy · J/m²
// Member 04
Niranjan
Phase/Composite · ML Descriptors · Targets
📁 Niranjan.py  ·  F37–F48 + T1–T3
Group G: Phase/Composite  ·  Group H: ML Descriptors  ·  🎯 Target Variables
F37 · Phase Stability Index
F38 · Secondary Phase Vol. Fraction · %
F39 · Interfacial Energy · J/m²
F40 · CTE Mismatch Index
F41 · Solid Solution Distortion δ
F42 · Wettability Index
F43 · Thermal Merit Index · W/kg
F44 · Toughness-Stiffness Index · GPa·√GPa
F45 · Oxidation Merit Score
F46 · Bond Ionicity Fraction
F47 · Structural Stability Index
F48 · Creep Resistance Parameter
T1

Flexural Strength

Multi-factor regression target in MPa. Driven by stiffness, hardness, and porosity. Primary structural design metric.

T2

Oxidation Resistance Score

Composite score 0–10. Weighted from onset temperature, kp rate, stability index, and CTE mismatch.

T3

Thermal Shock Cycles

Predicted cycles to failure (integer). Driven by fracture toughness, CTE, and composite mismatch index.
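To make the idea of a weighted composite target concrete, here is one way a 0–10 score like T2 could be assembled from its four stated drivers. This is only a sketch under assumptions: the equal weights, the min-max scaling, the log transform of kp, and the direction of each term (higher onset temperature and stability are treated as better, higher kp and CTE mismatch as worse) are all hypothetical and are not taken from Niranjan.py.

```python
import numpy as np


def minmax(x, higher_is_better=True):
    """Scale values to [0, 1]; flip the scale when lower values are better."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    scaled = np.zeros_like(x) if span == 0 else (x - x.min()) / span
    return scaled if higher_is_better else 1.0 - scaled


def oxidation_score(onset_K, kp, stability, cte_mismatch,
                    weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted 0-10 composite score; the weights are assumed, not the project's."""
    parts = [
        minmax(onset_K, higher_is_better=True),
        minmax(np.log10(kp), higher_is_better=False),   # kp spans orders of magnitude
        minmax(stability, higher_is_better=True),
        minmax(cte_mismatch, higher_is_better=False),
    ]
    score = 10.0 * sum(w * p for w, p in zip(weights, parts))
    return np.clip(score, 0.0, 10.0)


# Two illustrative samples: the second is better on every driver.
scores = oxidation_score(
    onset_K=[1000.0, 1200.0],
    kp=[1e-8, 1e-10],
    stability=[0.2, 0.9],
    cte_mismatch=[0.5, 0.1],
)
print(scores)
```

With min-max scaling, a sample that is worst on every driver scores 0 and one that is best on every driver scores 10, so the score is relative to the samples it is computed over rather than an absolute physical quantity.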

Material Classes

Ten refractory material families anchored to literature values, spanning carbides, borides, and nitrides in both monolithic and composite forms.

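Material | Crystal | Tm (K) | ρ (g/cm³) | ve | ΔEN | Type
HfC      | FCC     | 3900   | 12.20     | 8  | 1.3 | carbide
ZrC      | FCC     | 3420   | 6.73      | 8  | 1.3 | carbide
TaC      | FCC     | 3880   | 14.30     | 9  | 1.1 | carbide
HfB₂     | HEX     | 3380   | 10.50     | 6  | 0.9 | boride
ZrB₂     | HEX     | 3245   | 6.09      | 6  | 0.9 | boride
TiC      | FCC     | 3160   | 4.93      | 8  | 1.5 | carbide
NbC      | FCC     | 3600   | 7.79      | 9  | 1.2 | carbide
HfN      | FCC     | 3385   | 13.80     | 9  | 1.6 | nitride
ZrN      | FCC     | 2980   | 7.09      | 9  | 1.6 | nitride
TaN      | HEX     | 3090   | 16.30     | 10 | 1.4 | nitride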
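For scripts that need these anchors programmatically, the table can be held as a small list of records. The sketch below transcribes the values above; the Material class, its field names, and the ASCII spellings HfB2 and ZrB2 are choices made for this example and are not the structure used in Base.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Material:
    name: str        # material system (ASCII spelling, e.g. "HfB2")
    crystal: str     # crystal structure label from the table
    tm_K: float      # melting point, K
    rho: float       # density, g/cm^3
    ve: int          # "ve" column of the table
    delta_en: float  # "ΔEN" column of the table
    kind: str        # carbide, boride or nitride


MATERIALS = [
    Material("HfC", "FCC", 3900, 12.20, 8, 1.3, "carbide"),
    Material("ZrC", "FCC", 3420, 6.73, 8, 1.3, "carbide"),
    Material("TaC", "FCC", 3880, 14.30, 9, 1.1, "carbide"),
    Material("HfB2", "HEX", 3380, 10.50, 6, 0.9, "boride"),
    Material("ZrB2", "HEX", 3245, 6.09, 6, 0.9, "boride"),
    Material("TiC", "FCC", 3160, 4.93, 8, 1.5, "carbide"),
    Material("NbC", "FCC", 3600, 7.79, 9, 1.2, "carbide"),
    Material("HfN", "FCC", 3385, 13.80, 9, 1.6, "nitride"),
    Material("ZrN", "FCC", 2980, 7.09, 9, 1.6, "nitride"),
    Material("TaN", "HEX", 3090, 16.30, 10, 1.4, "nitride"),
]


def by_type(kind):
    """Return all materials of one class, in table order."""
    return [m for m in MATERIALS if m.kind == kind]


hottest = max(MATERIALS, key=lambda m: m.tm_K)
print(hottest.name, hottest.tm_K)
print([m.name for m in by_type("boride")])
```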
🧫

Experimental (rows 1–100)

Monolithic + composite variants. Composites (zero-based index 50–99, i.e. rows 51–100) include HfC-SiC, ZrB₂-SiC, TaC-HfC and 7 more multi-phase systems.

🔬

Synthetic (rows 101–200)

Monolithic + doped variants. Doped systems (zero-based index 150–199, i.e. rows 151–200) include HfC:Y, ZrC:La, TaC:W and 7 more rare-earth doped compositions.
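The row layout described in these two cards can be summarised as a small lookup. The sketch assumes that the quoted index ranges are zero-based and that the first 50 entries of each 100-row block are the monolithic variants; the function name classify_row is hypothetical.

```python
def classify_row(index):
    """Map a zero-based row index (0-199) to (origin, variant)."""
    if not 0 <= index < 200:
        raise ValueError(f"index {index} is outside 0-199")

    origin = "experimental" if index < 100 else "synthetic"
    if index % 100 < 50:
        variant = "monolithic"
    else:
        # Second half of each block: composites (experimental) or doped (synthetic).
        variant = "composite" if origin == "experimental" else "doped"
    return origin, variant


for i in (0, 50, 100, 150):
    print(i, classify_row(i))
```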

File Structure

IMI-Project-main/
├── Aadi-Dev/                       # Member 1 — Thermodynamic + Electronic
│   ├── dataset.py               # Generates F01–F12
│   ├── Aadi.xlsx                # Output: 200 × 17 (meta + 12 features)
│   └── UHTM_base_200.xlsx       # Base reference copy
├── Krish/                          # Member 2 — Mechanical + Thermal + Infrastructure
│   ├── Intro.py                 # Branch onboarding message
│   └── LAB EVALUATION/
│       ├── Krishh.py            # Generates F13–F24
│       ├── Krishh.xlsx          # Output: 200 × 17
│       └── Merge/
│           ├── Base.py          # ★ Generates UHTM_base_200.xlsx
│           ├── mergeAll.py      # ★ Final merge of all 4 member files
│           ├── AadiDev.xlsx     # Member 1 snapshot
│           ├── Krishh.xlsx      # Member 2 snapshot
│           ├── Niranjan.xlsx    # Member 4 snapshot
│           ├── Salan.xlsx       # Member 3 snapshot
│           └── UHTM_final_200x48.xlsx   # ★ FINAL DATASET
├── Niranjan/                       # Member 4 — Phase + ML + Targets
│   ├── Niranjan.py              # Generates F37–F48 + T1–T3
│   └── Niranjan.xlsx            # Output: 200 × 20
├── Salan/                          # Member 3 — Oxidation + Microstructural
│   └── lab evaluation/
│       ├── Salan.py             # Generates F25–F36
│       └── Salan_member3.xlsx   # Output: 200 × 17
├── UHTM_final_200x48.csv           # ★ ML-ready CSV export
└── UHTM_final_200x48.xlsx          # ★ Final annotated Excel

Setup & Execution

Run the pipeline in order. All scripts use numpy.random.seed(42) for full reproducibility.

1. Install dependencies

pip install pandas numpy openpyxl

2. Generate the base file (run once)

# From Krish/LAB EVALUATION/Merge/
python Base.py
# → UHTM_base_200.xlsx (200 rows × 10 columns)

3. Each member generates their features independently

# Member 1 (Aadi)
python Aadi-Dev/dataset.py           # → Aadi.xlsx     (F01–F12)

# Member 2 (Krish)
python Krish/LAB\ EVALUATION/Krishh.py  # → Krishh.xlsx   (F13–F24)

# Member 3 (Salan)
python Salan/lab\ evaluation/Salan.py    # → Salan_member3.xlsx  (F25–F36)

# Member 4 (Niranjan)
python Niranjan/Niranjan.py            # → Niranjan.xlsx (F37–F48 + T1–T3)

4. Merge all outputs into the final dataset

# Place all member xlsx files in Merge/ directory, then:
python Krish/LAB\ EVALUATION/Merge/mergeAll.py
# → UHTM_final_200x48.xlsx  (200 × 56: 5 meta + 48 features + 3 targets)
# → Summary Stats sheet + Feature Legend sheet included
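
The same sequence can be driven from a small Python helper instead of typing each command. This is a sketch and not part of the repository: build_commands and run are hypothetical names, the script paths are copied from the commands above, and copying the member .xlsx files into Merge/ before the final step is still left to the user.

```python
import subprocess
import sys
from pathlib import Path

MERGE_DIR = Path("Krish") / "LAB EVALUATION" / "Merge"

# (script, working directory relative to the repository root), in pipeline order.
STEPS = [
    (Path("Base.py"), MERGE_DIR),                                   # step 2
    (Path("Aadi-Dev") / "dataset.py", Path(".")),                   # step 3, member 1
    (Path("Krish") / "LAB EVALUATION" / "Krishh.py", Path(".")),    # step 3, member 2
    (Path("Salan") / "lab evaluation" / "Salan.py", Path(".")),     # step 3, member 3
    (Path("Niranjan") / "Niranjan.py", Path(".")),                  # step 3, member 4
    (MERGE_DIR / "mergeAll.py", Path(".")),                         # step 4
]


def build_commands(repo_root):
    """Return (argv, cwd) pairs; list-form argv avoids shell escaping of spaces."""
    repo_root = Path(repo_root)
    return [([sys.executable, str(script)], repo_root / cwd) for script, cwd in STEPS]


def run(commands):
    """Run each command in order, stopping at the first failure."""
    for argv, cwd in commands:
        subprocess.run(argv, cwd=cwd, check=True)


# Example usage (not executed here):
#   cmds = build_commands(".")
#   run(cmds[:-1])   # base file + four member scripts
#   ...copy the member .xlsx files into Merge/...
#   run(cmds[-1:])   # final merge
```

Passing the arguments as a list means the directory names containing spaces ("LAB EVALUATION", "lab evaluation") need no backslash escaping.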