UHTM Dataset — IMI Project README

// What is this?

Project Overview

A structured, multi-member dataset engineering project targeting Ultra-High Temperature Materials (UHTMs) used in re-entry thermal protection systems and hypersonic aerospace applications.

🔥

Domain: UHTMs

Carbides (HfC, ZrC, TaC), borides (HfB₂, ZrB₂), nitrides (HfN, ZrN, TaN), and their composites: materials with melting points near or above 3000 K.

🧪

Physics-Informed Features

All 48 features are derived from 5 material anchors (Tₘ, ρ, vₑ, EN, Pₛ) using literature-validated physical relations with realistic noise injection.

🤝

Team-Split Architecture

Each member independently generates their feature groups from the shared base file, with a final merge step producing the full 200×53 matrix.

🤖

ML-Ready Targets

Three regression targets — Flexural Strength, Oxidation Resistance Score, and Thermal Shock Cycles — enable supervised learning and Bayesian optimisation.

// How it works

Data Pipeline

A reproducible 4-step workflow from base generation to the final merged dataset.

Base Generation — `Base.py`

Krish generates UHTM_base_200.xlsx: 200 material entries (100 experimental + 100 synthetic) with 5 hidden physical anchors (Tₘ, ρ, vₑ, EN, sintering pressure) and metadata columns (Sample_ID, Material_System, Crystal_Structure, Synthesis_Method).

Individual Feature Engineering — 4 Members in Parallel

Each member loads the base file and independently computes their 12 assigned features using physics-derived formulas + Gaussian noise (seed=42 for reproducibility). Outputs are individual .xlsx files.
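As a rough illustration of the "physical relation plus Gaussian noise" pattern, the sketch below computes thermal diffusivity from its definition, α = k / (ρ·c_p), and perturbs it with multiplicative noise. The column names, the input values for k and c_p, and the 3 % noise level are assumptions made for this example; the member scripts may use different relations and noise models.

```python
import numpy as np
import pandas as pd

np.random.seed(42)  # same seed the project scripts are described as using

# Toy stand-in for three base rows; the column names are hypothetical.
base = pd.DataFrame({
    "Sample_ID": ["S001", "S002", "S003"],
    "density_g_cm3": [12.20, 6.73, 14.30],  # HfC, ZrC, TaC from the material table
})

# Illustrative inputs only, not project data.
k = np.array([20.0, 25.0, 22.0])      # thermal conductivity, W/m·K
cp = np.array([200.0, 370.0, 190.0])  # specific heat capacity, J/kg·K

rho_si = base["density_g_cm3"].to_numpy() * 1000.0  # g/cm³ to kg/m³
alpha_ideal = k / (rho_si * cp)                     # m²/s, definition of diffusivity

# Multiplicative Gaussian scatter around 1.0 (assumed 3 % standard deviation).
noise = np.random.normal(loc=1.0, scale=0.03, size=len(base))

features = base[["Sample_ID"]].copy()
features["F22_Thermal_Diffusivity"] = alpha_ideal * noise
print(features)
```

Because the seed is fixed before any random draw, rerunning the script yields the same noisy values, which is what makes the individual member outputs reproducible.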

Validation & Merge — `mergeAll.py`

Asserts all 4 member files have 200 rows and matching Sample_IDs, then horizontally joins all feature groups into a single 200×53 DataFrame (5 meta + 48 features + 3 targets).
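The validate-then-join step could look roughly like the sketch below. It is not the contents of `mergeAll.py`: the function name `merge_member_frames`, the `meta_cols` argument, and the toy frames are hypothetical, and only `Sample_ID` and `Material_System` are taken from the column names listed above.

```python
import pandas as pd

def merge_member_frames(frames, meta_cols, expected_rows=200):
    """Validate member DataFrames, then join their feature columns side by side."""
    reference_ids = frames[0]["Sample_ID"].tolist()
    for i, df in enumerate(frames, start=1):
        assert len(df) == expected_rows, (
            f"member {i}: expected {expected_rows} rows, got {len(df)}"
        )
        assert df["Sample_ID"].tolist() == reference_ids, (
            f"member {i}: Sample_ID mismatch"
        )

    merged = frames[0].copy()
    for df in frames[1:]:
        # Drop repeated metadata (except the join key) so it appears only once.
        repeated = [c for c in meta_cols if c != "Sample_ID" and c in df.columns]
        merged = merged.merge(
            df.drop(columns=repeated),
            on="Sample_ID",
            how="inner",
            validate="one_to_one",
        )
    return merged

# Toy demonstration with 3 rows instead of 200.
ids = ["S1", "S2", "S3"]
systems = ["HfC", "ZrC", "TaC"]
a = pd.DataFrame({"Sample_ID": ids, "Material_System": systems, "F01": [1.0, 2.0, 3.0]})
b = pd.DataFrame({"Sample_ID": ids, "Material_System": systems, "F13": [4.0, 5.0, 6.0]})

merged = merge_member_frames(
    [a, b], meta_cols=["Sample_ID", "Material_System"], expected_rows=3
)
print(merged.columns.tolist())
```

Checking row counts and ID order before joining means a stale or re-sorted member file fails loudly instead of silently misaligning features.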

Final Output — `UHTM_final_200x48.xlsx`

Color-coded by feature group with a Summary Stats sheet and Feature Legend sheet. Also exported as CSV for ML pipelines.
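A minimal sketch of how the summary sheet, legend sheet, and CSV export might be produced with pandas is shown below. The helper names, the sheet name "Data", the column name `F01_Melting_Point_K`, and the assignment of F01 to Group A are assumptions; colour-coding is omitted, and `write_workbook` is defined but not called here because it needs openpyxl and a writable path.

```python
import io
import pandas as pd

def build_summary(data, feature_cols):
    """Per-feature descriptive statistics, one row per feature."""
    return data[feature_cols].describe().T

def write_workbook(path, data, stats, legend):
    """Write the data plus two helper sheets (requires openpyxl; no styling)."""
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        data.to_excel(writer, sheet_name="Data", index=False)
        stats.to_excel(writer, sheet_name="Summary Stats")
        legend.to_excel(writer, sheet_name="Feature Legend", index=False)

# Toy data: melting points of HfC, ZrC, TaC from the material table.
data = pd.DataFrame({
    "Sample_ID": ["S1", "S2", "S3"],
    "F01_Melting_Point_K": [3900.0, 3420.0, 3880.0],
})
feature_cols = ["F01_Melting_Point_K"]

stats = build_summary(data, feature_cols)
legend = pd.DataFrame({
    "Feature": feature_cols,
    "Group": ["A: Thermodynamic"],  # assumed group assignment
    "Unit": ["K"],
})

# CSV export, written to an in-memory buffer for this demonstration.
buffer = io.StringIO()
data.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(stats)
```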

// Contributors

Team Contributions

Each member owns a domain of 12 features, ensuring separation of concerns and expertise-driven feature engineering.

// Member 01

Aadi

Thermodynamic & Electronic Structure

📁 dataset.py · F01–F12

Group A: Thermodynamic · Group B: Electronic

ID	Feature	Unit
F01	Melting Point	K
F02	Debye Temperature	K
F03	Cohesive Energy	eV/atom
F04	Formation Enthalpy	kJ/mol
F05	Lattice Parameter a	Å
F06	Grüneisen Parameter	-
F07	Band Gap	eV
F08	DOS at Fermi Level	states/eV
F09	Bader Charge Transfer	e⁻
F10	Fermi Velocity	m/s
F11	Valence Electron Density	×10²²/cm³
F12	Work Function	eV

// Member 02

Krish

Mechanical & Thermal Transport · Base Generator

📁 Krishh.py · Base.py · mergeAll.py · F13–F24

Group C: Mechanical · Group D: Thermal · 🏗 Base Author · 🔗 Merge Lead

ID	Feature	Unit
F13	Young's Modulus	GPa
F14	Vickers Hardness	GPa
F15	Fracture Toughness K_Ic	MPa√m
F16	Compressive Strength	GPa
F17	Poisson's Ratio	-
F18	Flexural Strength	MPa
F19	Thermal Conductivity	W/m·K
F20	Coeff. Thermal Expansion	×10⁻⁶/K
F21	Specific Heat Capacity	J/kg·K
F22	Thermal Diffusivity	m²/s
F23	Max Service Temperature	K
F24	Thermal Shock Resistance	W/m

// Member 03

Salan

Oxidation Stability & Microstructure

📁 Salan.py · F25–F36

Group E: Oxidation · Group F: Microstructural

ID	Feature	Unit
F25	Oxidation Onset Temp	K
F26	Parabolic Rate Const k_p	kg²/m⁴s
F27	Oxidation Activation Energy	kJ/mol
F28	Gravimetric Rate k_p	g²/cm⁴s
F29	Oxide Layer Stability Index	-
F30	Oxygen Diffusivity in Oxide	m²/s
F31	Avg Grain Size	μm
F32	Relative Density	%
F33	Porosity	%
F34	Crystallite Size (XRD)	nm
F35	Dislocation Density	×10¹²/m²
F36	Grain Boundary Energy	J/m²

// Member 04

Niranjan

Phase/Composite · ML Descriptors · Targets

📁 Niranjan.py · F37–F48 + T1–T3

Group G: Phase/Composite · Group H: ML Descriptors · 🎯 Target Variables

ID	Feature	Unit
F37	Phase Stability Index	-
F38	Secondary Phase Vol. Fraction	%
F39	Interfacial Energy	J/m²
F40	CTE Mismatch Index	-
F41	Solid Solution Distortion δ	-
F42	Wettability Index	-
F43	Thermal Merit Index	W/kg
F44	Toughness-Stiffness Index	GPa·√GPa
F45	Oxidation Merit Score	-
F46	Bond Ionicity Fraction	-
F47	Structural Stability Index	-
F48	Creep Resistance Parameter	-

// Target Variables (Niranjan · Member 04)

Flexural Strength

Multi-factor regression target in MPa. Driven by stiffness, hardness, and porosity. Primary structural design metric.

Oxidation Resistance Score

Composite score 0–10. Weighted from onset temperature, k_p rate, stability index, and CTE mismatch.

Thermal Shock Cycles

Predicted cycles to failure (integer). Driven by fracture toughness, CTE, and composite mismatch index.
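To make the idea of a weighted 0–10 composite concrete, here is a hedged sketch of how an Oxidation Resistance Score could be assembled from the four inputs named above. The weights, the min-max normalisation, the log scaling of k_p, and the assumption that a higher stability index is better are all illustrative choices, not the formula used in `Niranjan.py`.

```python
import numpy as np

def minmax(x):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

def oxidation_score(onset_K, kp, stability, cte_mismatch,
                    weights=(0.35, 0.30, 0.20, 0.15)):
    """Weighted 0-10 score; the weights are hypothetical."""
    parts = np.vstack([
        minmax(onset_K),                # higher onset temperature is better
        1.0 - minmax(np.log10(kp)),     # faster oxide growth (higher k_p) is worse
        minmax(stability),              # assumed: higher index is better
        1.0 - minmax(cte_mismatch),     # larger mismatch is worse
    ])
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                     # normalise so the maximum is exactly 10
    return 10.0 * (w @ parts)

# Toy inputs for three samples, ordered from worst to best on every factor.
scores = oxidation_score(
    onset_K=[1000.0, 1200.0, 1400.0],
    kp=[1e-8, 1e-9, 1e-10],
    stability=[0.2, 0.5, 0.9],
    cte_mismatch=[0.8, 0.5, 0.1],
)
print(np.round(scores, 2))
```

Normalising each factor before weighting keeps quantities with very different magnitudes, such as kelvin-scale onset temperatures and rate constants spanning orders of magnitude, from dominating the score.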

// Data composition

Material Classes

Ten refractory material families anchored to literature values, spanning carbides, borides, and nitrides in both monolithic and composite forms.

Material	Crystal	T_m (K)	ρ (g/cm³)	v_e	ΔEN	Type
HfC	FCC	3900	12.20	8	1.3	carbide
ZrC	FCC	3420	6.73	8	1.3	carbide
TaC	FCC	3880	14.30	9	1.1	carbide
HfB₂	HEX	3380	10.50	6	0.9	boride
ZrB₂	HEX	3245	6.09	6	0.9	boride
TiC	FCC	3160	4.93	8	1.5	carbide
NbC	FCC	3600	7.79	9	1.2	carbide
HfN	FCC	3385	13.80	9	1.6	nitride
ZrN	FCC	2980	7.09	9	1.6	nitride
TaN	HEX	3090	16.30	10	1.4	nitride

🧫

Experimental (rows 1–100)

Monolithic and composite variants. Composites (zero-based index 50–99) include HfC-SiC, ZrB₂-SiC, TaC-HfC, and 7 more multi-phase systems.

🔬

Synthetic (rows 101–200)

Monolithic and doped variants. Doped systems (zero-based index 150–199) include HfC:Y, ZrC:La, TaC:W, and 7 more doped compositions.
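The row layout described above can be expressed as a small lookup. This sketch assumes zero-based indices and that the monolithic entries fill the first 50 positions of each half; that placement is implied by the composite and doped ranges but not stated outright. The function name `subset_label` is hypothetical.

```python
def subset_label(index):
    """Map a zero-based row index (0-199) to (source, variant)."""
    if not 0 <= index <= 199:
        raise ValueError(f"index {index} is outside 0-199")
    source = "experimental" if index < 100 else "synthetic"
    offset = index % 100
    if offset < 50:
        variant = "monolithic"  # assumed to occupy the first half of each block
    else:
        variant = "composite" if source == "experimental" else "doped"
    return source, variant

for i in (0, 50, 100, 150):
    print(i, subset_label(i))
```

A helper like this is handy when stratifying train/test splits so that experimental and synthetic rows are not mixed unintentionally.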

// Repository layout

File Structure

IMI-Project-main/
├── Aadi-Dev/                       # Member 1 — Thermodynamic + Electronic
│   ├── dataset.py               # Generates F01–F12
│   ├── Aadi.xlsx                # Output: 200 × 17 (meta + 12 features)
│   └── UHTM_base_200.xlsx       # Base reference copy
├── Krish/                          # Member 2 — Mechanical + Thermal + Infrastructure
│   ├── Intro.py                 # Branch onboarding message
│   └── LAB EVALUATION/
│       ├── Krishh.py            # Generates F13–F24
│       ├── Krishh.xlsx          # Output: 200 × 17
│       └── Merge/
│           ├── Base.py          # ★ Generates UHTM_base_200.xlsx
│           ├── mergeAll.py      # ★ Final merge of all 4 member files
│           ├── AadiDev.xlsx     # Member 1 snapshot
│           ├── Krishh.xlsx      # Member 2 snapshot
│           ├── Niranjan.xlsx    # Member 4 snapshot
│           ├── Salan.xlsx       # Member 3 snapshot
│           └── UHTM_final_200x48.xlsx   # ★ FINAL DATASET
├── Niranjan/                       # Member 4 — Phase + ML + Targets
│   ├── Niranjan.py              # Generates F37–F48 + T1–T3
│   └── Niranjan.xlsx            # Output: 200 × 20
├── Salan/                          # Member 3 — Oxidation + Microstructural
│   └── lab evaluation/
│       ├── Salan.py             # Generates F25–F36
│       └── Salan_member3.xlsx   # Output: 200 × 17
├── UHTM_final_200x48.csv           # ★ ML-ready CSV export
└── UHTM_final_200x48.xlsx          # ★ Final annotated Excel

// Getting started

Setup & Execution

Run the pipeline in order. All scripts use numpy.random.seed(42) for full reproducibility.

1. Install dependencies

pip install pandas numpy openpyxl

2. Generate the base file (run once)

# From Krish/LAB EVALUATION/Merge/
python Base.py
# → UHTM_base_200.xlsx (200 rows × 10 columns)

3. Each member generates their features independently

# Member 1 (Aadi)
python Aadi-Dev/dataset.py           # → Aadi.xlsx     (F01–F12)

# Member 2 (Krish)
python Krish/LAB\ EVALUATION/Krishh.py  # → Krishh.xlsx   (F13–F24)

# Member 3 (Salan)
python Salan/lab\ evaluation/Salan.py    # → Salan_member3.xlsx  (F25–F36)

# Member 4 (Niranjan)
python Niranjan/Niranjan.py            # → Niranjan.xlsx (F37–F48 + T1–T3)

4. Merge all outputs into the final dataset

# Place all member xlsx files in Merge/ directory, then:
python Krish/LAB\ EVALUATION/Merge/mergeAll.py
# → UHTM_final_200x48.xlsx  (200 × 53: 5 meta + 48 features + 3 targets)
# → Summary Stats sheet + Feature Legend sheet included
