Synthetic & Open Data for iUS Tumor Segmentation

MSc Project Data Repository — Last updated 2026-03-23

This repository documents the data, intermediate results, and quality control outputs for three technical routes to train intraoperative ultrasound (iUS) brain tumor segmentation models without manual iUS annotations. Route A transfers MRI tumor labels to iUS space via rigid registration. Route B synthesises realistic iUS from MRI using MMHVAE (Dorent et al., 2025). Route A+B combines both approaches via two-stage training.

All data derive from the ReMIND dataset (Juvekar et al., 2023): 114 patients, 41 GB of DICOM, publicly available from TCIA. Of these, 62 are first-surgery patients suitable for registration; 110 have sufficient anatomy for virtual sweep simulation. The entire pipeline, from raw DICOM to training-ready data, was executed between 2026-03-16 and 2026-03-18 on a single machine with an RTX 2080 Ti GPU.

Key Figures at a Glance

114 · ReMIND patients (DICOM) · TCIA
73.7% · Registration validity (84/114) · V3 after QC exclusion (30 excluded)
44,502 · Route A training slices · 43 patients
1,090 · Virtual sweeps (Route B) · 110 cases × ~10
1,878 · MMHVAE input volumes · 33.7 GB NIfTI
589 · QC images (local) · 549 mmhvae + 40 sweep
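
The headline figures can be cross-checked against each other with a few lines of arithmetic. The sketch below only re-derives numbers already quoted in this panel; the variable names are illustrative and not taken from the project code.

```python
# Figures copied from the "Key Figures at a Glance" panel.
total_patients = 114
valid_registrations = 84
excluded = 30
qc_mmhvae, qc_sweep, qc_total = 549, 40, 589
sweep_cases, sweeps_total = 110, 1090
route_a_slices, route_a_patients = 44502, 43

# Registration validity as a percentage, rounded to one decimal place.
validity_pct = round(100 * valid_registrations / total_patients, 1)

# Average number of virtual sweeps per case and slices per Route A patient.
mean_sweeps_per_case = sweeps_total / sweep_cases
mean_slices_per_patient = route_a_slices / route_a_patients

print(f"validity: {validity_pct}%")
print(f"valid + excluded == total: {valid_registrations + excluded == total_patients}")
print(f"QC images add up: {qc_mmhvae + qc_sweep == qc_total}")
print(f"mean sweeps per case: {mean_sweeps_per_case:.2f}")
print(f"mean slices per patient: {mean_slices_per_patient:.1f}")
```

The sweep count works out to slightly fewer than 10 per case on average, which matches the "~10 sweeps each" entry in the data file index.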

Table 1. Technical Route Comparison

The three routes differ in how training labels are obtained and whether the model sees real or synthetic iUS. Route A is the simplest baseline (direct label transfer with registration noise); Route B eliminates registration error by synthesising iUS from MRI; Route A+B leverages both real and synthetic data.

Aspect | Route A (Baseline) | Route B (Core Method) | Route A+B (Innovation)
Pipeline | MRI → Rigid Reg → Pseudo-label → nnU-Net | MRI → Virtual Sweep → MMHVAE → nnU-Net | A pre-train → B fine-tune
Label quality | Noisy (3–5 mm registration error) | Precise (zero registration error) | Complementary
Training images | Real iUS | Synthetic iUS | Both
Data scale | 44,502 slices / 43 patients | 1,090 sweeps / 110 cases | Combined
DSC reference | 0.58–0.62 (Faanes 2025) | 0.74 (Dorent 2025) | Target: 0.84
Status | Data ready; training blocked (GPU) | Sweeps done; MMHVAE blocked (weights) | Pending both routes

Detailed Pages:
Route A: registration iteration data (V1→V3), convergence analysis, optimizer comparison, pseudo-label summary
Route B: virtual sweep statistics, MMHVAE input QC gallery (549 images), best-effort sweep QC (40 images)
Progress: pipeline status, blockers, DSC roadmap
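
Table 1 compares the routes by DSC (Dice similarity coefficient) and attributes Route A's label noise to a 3–5 mm registration error. For readers less familiar with the metric, the following is a minimal sketch of how DSC responds to a pure translation of an otherwise perfect label. The cube shape, its 20 mm size, the 1 mm voxel spacing, and the helper names `dice` and `cube` are all assumptions chosen for illustration; none of them come from ReMIND or from the project pipeline, and the printed values are geometric intuition only, not a prediction of the DSC figures cited in the table.

```python
from itertools import product


def dice(mask_a, mask_b):
    """Dice similarity coefficient of two sets of voxel coordinates."""
    if not mask_a and not mask_b:
        return 1.0  # two empty masks are treated as a perfect match
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


def cube(size, offset=(0, 0, 0)):
    """Voxel coordinates of a size^3 cube, optionally translated by offset."""
    ox, oy, oz = offset
    return {(x + ox, y + oy, z + oz)
            for x, y, z in product(range(size), repeat=3)}


# Hypothetical 20 mm cubic "tumor" sampled at 1 mm isotropic voxels.
reference = cube(20)

# Shift the label along one axis by 0, 3 and 5 mm and measure the overlap.
for shift_mm in (0, 3, 5):
    shifted = cube(20, offset=(shift_mm, 0, 0))
    print(shift_mm, round(dice(reference, shifted), 3))
```

Under these assumptions a shift of a few millimetres already removes a substantial fraction of the overlap, and the effect grows as the structure gets smaller relative to the shift.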

Table 2. Data File Index

All quantitative results are stored as flat CSV or per-case JSON files. This index lists the primary data files referenced throughout the dashboard; each is linked to the relevant detail page where its contents are analysed.

File | Rows | Key Columns | Route
registration_results_v1.csv | 204 | patient_id, translation_magnitude_mm, mi_improvement, stop_condition, status | A
registration_results_v2.csv | 114 | + convergence, mask_coverage | A
registration_results_v3.csv | 114 | + convergence, mask_coverage | A
ab_test_20260317_075447.csv | 70 | config (7 optimizers × 10 patients), converged, stop_class | A
sweep_metadata.json × 110 | ~10 sweeps each | tumor_diameter_mm, C2_dist_to_tumor, tumor_pixels, saved_slices | B
conversion_*.csv | ~1,090 | case_id, modalities, volume_shape, seg_nonzero | B
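
Because the registration results are flat CSV files, a summary such as the share of valid registrations can be recomputed with the Python standard library alone. The sketch below tallies the `status` column listed for `registration_results_v1.csv`. The status labels (`valid`, `excluded`), the patient IDs, and the sample rows are made-up placeholders, since the actual values used in the files are not documented on this page; the helper names are likewise hypothetical.

```python
import csv
import io
from collections import Counter


def status_counts(csv_file, column="status"):
    """Count how often each value occurs in one column of a CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


def fraction(counts, label):
    """Share of rows carrying the given label (0.0 for an empty file)."""
    total = sum(counts.values())
    return counts[label] / total if total else 0.0


# Made-up stand-in rows; a real file would be opened with
# open("registration_results_v1.csv", newline="") instead.
sample = io.StringIO(
    "patient_id,translation_magnitude_mm,status\n"
    "P001,2.1,valid\n"
    "P002,14.8,excluded\n"
    "P003,3.4,valid\n"
    "P004,1.9,valid\n"
)

counts = status_counts(sample)
valid_fraction = fraction(counts, "valid")
print(dict(counts), valid_fraction)
```

The same `status_counts` helper could be pointed at another column, for example `stop_condition`, assuming the column is present in the file being read.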