A collaboratively engineered 200-sample × 48-feature dataset of refractory ceramics & composites, with physics-informed feature engineering across four specialist domains.
// What is this?
A structured, multi-member dataset engineering project targeting Ultra-High Temperature Materials (UHTMs) used in re-entry thermal protection systems and hypersonic aerospace applications.
The dataset covers carbides (HfC, ZrC, TaC), borides (HfB₂, ZrB₂), nitrides (HfN, ZrN, TaN), and their composites: materials capable of withstanding >3000 K.
All 48 features are derived from 5 material anchors (Tₘ, ρ, vₑ, EN, Pₛ) using literature-validated physical relations with realistic noise injection.
Each member independently generates their feature groups from the shared base file, with a final merge step producing the full 200×53 matrix.
Three regression targets — Flexural Strength, Oxidation Resistance Score, and Thermal Shock Cycles — enable supervised learning and Bayesian optimisation.
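As a rough illustration of the anchor-to-feature idea, the sketch below derives one made-up feature from two anchors and adds Gaussian noise. The relation `k * Tm / rho`, the constant `k`, the 5% noise level, and the function name `derive_feature` are all assumptions for illustration; the project's actual formulas are not reproduced here. A local generator is used instead of the global `numpy.random.seed` call that the scripts are described as using.

```python
import numpy as np

def derive_feature(tm_kelvin, rho_g_cm3, k=1.0e-3, rel_noise=0.05, seed=42):
    """Hypothetical feature: a scaled Tm/rho ratio with multiplicative Gaussian noise."""
    rng = np.random.default_rng(seed)
    tm = np.asarray(tm_kelvin, dtype=float)
    rho = np.asarray(rho_g_cm3, dtype=float)
    clean = k * tm / rho  # placeholder relation, not taken from the project
    noise = rng.normal(loc=1.0, scale=rel_noise, size=clean.shape)
    return clean * noise

# Anchor values for HfC, ZrC, TaC as listed in the material table
tm = [3900, 3420, 3880]
rho = [12.20, 6.73, 14.30]
feature = derive_feature(tm, rho)
```

Calling `derive_feature` twice with the same seed returns identical values, which is the property the shared seed is meant to provide.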
// How it works
A reproducible 4-step workflow from base generation to the final merged dataset.
1. Base.py: Krish generates UHTM_base_200.xlsx, containing 200 material entries (100 experimental + 100 synthetic) with 5 hidden physical anchors (Tₘ, ρ, vₑ, EN, sintering pressure) and metadata columns (Sample_ID, Material_System, Crystal_Structure, Synthesis_Method).
2. Each member loads the base file and independently computes their 12 assigned features using physics-derived formulas + Gaussian noise (seed=42 for reproducibility). Outputs are individual .xlsx files.
3. mergeAll.py: Asserts that all 4 member files have 200 rows and matching Sample_IDs, then horizontally joins all feature groups into a single 200×53 DataFrame (5 meta + 48 features + 3 targets).
4. UHTM_final_200x48.xlsx: Color-coded by feature group, with a Summary Stats sheet and a Feature Legend sheet. Also exported as CSV for ML pipelines.
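The merge step described above can be sketched roughly as follows. This is not the contents of mergeAll.py: the function name `merge_member_frames` and the `META_COLS` list are assumptions. Only the four metadata column names given in the text are used, so any further metadata column would be treated as a member column here.

```python
import pandas as pd

# Metadata column names as listed in the workflow description
META_COLS = ["Sample_ID", "Material_System", "Crystal_Structure", "Synthesis_Method"]

def merge_member_frames(frames, expected_rows=200, id_col="Sample_ID", meta_cols=META_COLS):
    """Validate row counts and IDs, then join member columns side by side."""
    reference_ids = frames[0][id_col].tolist()
    for i, df in enumerate(frames):
        assert len(df) == expected_rows, f"frame {i} has {len(df)} rows"
        assert df[id_col].tolist() == reference_ids, f"frame {i} Sample_IDs differ"
    # Keep metadata once (from the first frame), then append each member's own columns
    present_meta = [c for c in meta_cols if c in frames[0].columns]
    merged = frames[0][present_meta].reset_index(drop=True)
    for df in frames:
        own_cols = [c for c in df.columns if c not in meta_cols]
        merged = pd.concat([merged, df[own_cols].reset_index(drop=True)], axis=1)
    return merged
```

Checking IDs in order, and not only as a set, means a member file that was sorted differently fails loudly instead of being joined row by row against the wrong samples.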
// Contributors
Each member owns a domain of 12 features, ensuring separation of concerns and expertise-driven feature engineering.
// Target Variables (Niranjan · Member 04)
Flexural Strength: Multi-factor regression target in MPa, driven by stiffness, hardness, and porosity. Primary structural design metric.
Oxidation Resistance Score: Composite score from 0 to 10, weighted from onset temperature, kp rate, stability index, and CTE mismatch.
Thermal Shock Cycles: Predicted cycles to failure (integer), driven by fracture toughness, CTE, and composite mismatch index.
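One way a weighted 0–10 composite such as the oxidation resistance score could be built is sketched below. The weights, the min-max normalisation, the direction assumed for each input (higher or lower is better), and the names `minmax` and `oxidation_score` are all assumptions; the project's actual weighting is not given in this document. The input values are made up.

```python
import numpy as np

def minmax(x):
    """Scale values to [0, 1]; a constant column maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

def oxidation_score(onset_temp, kp_rate, stability_index, cte_mismatch,
                    weights=(0.35, 0.30, 0.20, 0.15)):
    """Hypothetical weighted score on a 0-10 scale."""
    parts = np.stack([
        minmax(onset_temp),          # higher onset temperature assumed better
        1.0 - minmax(kp_rate),       # lower kp rate assumed better
        minmax(stability_index),     # higher stability index assumed better
        1.0 - minmax(cte_mismatch),  # lower CTE mismatch assumed better
    ])
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                  # normalise so the maximum score is 10
    return 10.0 * (w @ parts)

# Made-up inputs for three samples, ordered from worst to best on every factor
scores = oxidation_score([1200, 1500, 1800], [3.0, 2.0, 1.0],
                         [0.2, 0.5, 0.9], [2.0, 1.0, 0.5])
```

Because every factor is normalised across the sample set, a score from this sketch is relative to the other samples and not an absolute material property.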
// Data composition
Ten refractory material families anchored to literature values, spanning carbides, borides, and nitrides in both monolithic and composite forms.
| Material | Crystal | Tₘ (K) | ρ (g/cm³) | vₑ | ΔEN | Type |
|---|---|---|---|---|---|---|
| HfC | FCC | 3900 | 12.20 | 8 | 1.3 | carbide |
| ZrC | FCC | 3420 | 6.73 | 8 | 1.3 | carbide |
| TaC | FCC | 3880 | 14.30 | 9 | 1.1 | carbide |
| HfB₂ | HEX | 3380 | 10.50 | 6 | 0.9 | boride |
| ZrB₂ | HEX | 3245 | 6.09 | 6 | 0.9 | boride |
| TiC | FCC | 3160 | 4.93 | 8 | 1.5 | carbide |
| NbC | FCC | 3600 | 7.79 | 9 | 1.2 | carbide |
| HfN | FCC | 3385 | 13.80 | 9 | 1.6 | nitride |
| ZrN | FCC | 2980 | 7.09 | 9 | 1.6 | nitride |
| TaN | HEX | 3090 | 16.30 | 10 | 1.4 | nitride |
Experimental entries (100): monolithic + composite variants. Composites (index 50–99) include HfC-SiC, ZrB₂-SiC, TaC-HfC, and 7 more multi-phase systems.
Synthetic entries (100): monolithic + doped variants. Doped systems (index 150–199) include HfC:Y, ZrC:La, TaC:W, and 7 more rare-earth doped compositions.
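The index layout can be expressed as a small lookup, sketched here under assumptions. The composite (50–99) and doped (150–199) ranges come from the text above. The experimental/synthetic boundary at index 100, the monolithic ranges 0–49 and 100–149, and the function name `variant_for_index` are inferred and may not match Base.py.

```python
def variant_for_index(i):
    """Return (source, variant) for a 0-based row index in a 200-row layout."""
    if not 0 <= i < 200:
        raise ValueError("index out of range")
    # Assumed split: first 100 rows experimental, last 100 synthetic
    source = "experimental" if i < 100 else "synthetic"
    if 50 <= i <= 99:
        variant = "composite"
    elif 150 <= i <= 199:
        variant = "doped"
    else:
        variant = "monolithic"  # assumed for the remaining rows
    return source, variant

# Count how many rows fall into each (source, variant) group
counts = {}
for i in range(200):
    key = variant_for_index(i)
    counts[key] = counts.get(key, 0) + 1
```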
// Repository layout
IMI-Project-main/
├── Aadi-Dev/                          # Member 1: Thermodynamic + Electronic
│   ├── dataset.py                     # Generates F01–F12
│   ├── Aadi.xlsx                      # Output: 200 × 17 (meta + 12 features)
│   └── UHTM_base_200.xlsx             # Base reference copy
├── Krish/                             # Member 2: Mechanical + Thermal + Infrastructure
│   ├── Intro.py                       # Branch onboarding message
│   └── LAB EVALUATION/
│       ├── Krishh.py                  # Generates F13–F24
│       ├── Krishh.xlsx                # Output: 200 × 17
│       └── Merge/
│           ├── Base.py                # ★ Generates UHTM_base_200.xlsx
│           ├── mergeAll.py            # ★ Final merge of all 4 member files
│           ├── AadiDev.xlsx           # Member 1 snapshot
│           ├── Krishh.xlsx            # Member 2 snapshot
│           ├── Niranjan.xlsx          # Member 4 snapshot
│           ├── Salan.xlsx             # Member 3 snapshot
│           └── UHTM_final_200x48.xlsx # ★ FINAL DATASET
├── Niranjan/                          # Member 4: Phase + ML + Targets
│   ├── Niranjan.py                    # Generates F37–F48 + T1–T3
│   └── Niranjan.xlsx                  # Output: 200 × 20
├── Salan/                             # Member 3: Oxidation + Microstructural
│   └── lab evaluation/
│       ├── Salan.py                   # Generates F25–F36
│       └── Salan_member3.xlsx         # Output: 200 × 17
├── UHTM_final_200x48.csv              # ★ ML-ready CSV export
└── UHTM_final_200x48.xlsx             # ★ Final annotated Excel
// Getting started
Run the pipeline in order. All scripts use numpy.random.seed(42) for full reproducibility.
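The reproducibility claim depends on the order of random draws as well as on the seed. The sketch below uses the `numpy.random.seed(42)` call named above to show both sides: reseeding reproduces the same noise, while one extra draw before the main one shifts every later value. The helper name `draw_noise` is hypothetical.

```python
import numpy as np

def draw_noise(n, seed=42):
    """Reseed the global NumPy generator, then draw n standard-normal values."""
    np.random.seed(seed)
    return np.random.normal(0.0, 1.0, size=n)

first = draw_noise(200)
second = draw_noise(200)
same = np.array_equal(first, second)   # identical after reseeding

# One extra draw before the main one changes which values each row receives
np.random.seed(42)
_ = np.random.normal()
shifted = np.random.normal(0.0, 1.0, size=200)
differs = not np.array_equal(first, shifted)
```

In practice this means that adding or reordering a noise term inside a member script changes that script's downstream values even though the seed stays at 42.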
1. Install dependencies
pip install pandas numpy openpyxl
2. Generate the base file (run once)
# From Krish/LAB EVALUATION/Merge/
python Base.py
# → UHTM_base_200.xlsx (200 rows × 10 columns)
3. Each member generates their features independently
# Member 1 (Aadi)
python Aadi-Dev/dataset.py                   # → Aadi.xlsx (F01–F12)
# Member 2 (Krish)
python Krish/LAB\ EVALUATION/Krishh.py       # → Krishh.xlsx (F13–F24)
# Member 3 (Salan)
python Salan/lab\ evaluation/Salan.py        # → Salan_member3.xlsx (F25–F36)
# Member 4 (Niranjan)
python Niranjan/Niranjan.py                  # → Niranjan.xlsx (F37–F48 + T1–T3)
4. Merge all outputs into the final dataset
# Place all member xlsx files in Merge/ directory, then:
python Krish/LAB\ EVALUATION/Merge/mergeAll.py
# → UHTM_final_200x48.xlsx (200 × 53: 5 meta + 48 features + 3 targets)
# → Summary Stats sheet + Feature Legend sheet included