Route B — Virtual Ultrasound Sweep Simulation & MMHVAE Input Preparation

Pipeline: MRI volumes → virtual US probe sweep simulation → 2D slice extraction → 3D NIfTI stacking → MMHVAE synthesis

Virtual sweeps simulate a linear ultrasound probe trajectory across the brain, extracting a 2D oblique slice from the MRI volumes at each probe position. The sweep parameters (entry point, angle, depth) are randomised to generate 10 diverse sweeps per case. The output slices are then stacked into 3D NIfTI volumes compatible with the MMHVAE input format.
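The slice extraction described above can be illustrated with a minimal sketch. It is not the sweep generator itself: the function names, the nested-list volume layout, the nearest-neighbour sampling, and the zero fill outside the volume are all assumptions made for illustration, and a real implementation would work in physical (mm) coordinates with proper interpolation.

```python
import math


def sample_oblique_slice(volume, origin, lateral, depth, width, height):
    """Nearest-neighbour sample of one 2D plane from a 3D volume.

    volume  : nested list indexed as volume[x][y][z], in voxel units
    origin  : (x, y, z) of the slice corner at the probe face (hypothetical)
    lateral : unit vector along the probe face
    depth   : unit vector pointing into the tissue
    Points that fall outside the volume are filled with 0 (assumption).
    """
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    out = []
    for j in range(height):        # steps along the depth direction
        row = []
        for i in range(width):     # steps along the probe face
            p = [origin[k] + i * lateral[k] + j * depth[k] for k in range(3)]
            x, y, z = (int(round(c)) for c in p)
            inside = 0 <= x < nx and 0 <= y < ny and 0 <= z < nz
            row.append(volume[x][y][z] if inside else 0)
        out.append(row)
    return out


def linear_sweep(volume, start, end, n_slices, lateral, depth, width, height):
    """Move the probe linearly from start to end, one slice per position."""
    slices = []
    for s in range(n_slices):
        t = s / (n_slices - 1) if n_slices > 1 else 0.0
        origin = [start[k] + t * (end[k] - start[k]) for k in range(3)]
        slices.append(
            sample_oblique_slice(volume, origin, lateral, depth, width, height)
        )
    return slices


# Toy 4x4x4 volume whose voxel value encodes its own index.
toy = [[[100 * x + 10 * y + z for z in range(4)] for y in range(4)] for x in range(4)]

# A plane tilted 30 degrees about the x axis (angle chosen arbitrarily).
tilt = math.radians(30)
tilted = sample_oblique_slice(
    toy, (0, 0, 0), (1.0, 0.0, 0.0), (0.0, math.cos(tilt), math.sin(tilt)), 4, 4
)
```

Stacking the list returned by `linear_sweep` along a third axis gives the 3D volume layout that the conversion step in §2 expects.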

Data availability: 110 of 114 ReMIND patients were processed. Four cases (ReMIND-009, ReMIND-024, ReMIND-051, ReMIND-101) lack MRI tumor segmentation annotations and were excluded from virtual sweep generation, because the tumor centroid is required to define the sweep trajectory.

§1 Virtual Sweep Generation Statistics

Table 1. Sweep generation summary (from 110 sweep_metadata.json files)

Metric	Value
Cases processed	110 (of 114 ReMIND patients)
Sweeps per case	10 (fixed)
Total sweeps	1,090
Tumor diameter — range	4.9 – 116.8 mm
Tumor diameter — median (mean)	50.3 mm (49.5 mm)
C₂ probe-to-tumor distance — range	23.5 – 143.8 mm
C₂ probe-to-tumor distance — median (mean)	60.0 mm (62.4 mm)
Slices per sweep — range	75 – 292
Slices per sweep — median (mean)	194 (190)

Fig. 1. Sweep parameter distributions

Fig. 1a (left): Tumor diameter distribution across 110 cases. Bins are 10 mm wide. The dataset spans very small tumors (4.9 mm) to large ones (116.8 mm), with a median of 50.3 mm. Cases with tumors <20 mm are more likely to produce best-effort sweeps (see §3).
Fig. 1b (right): Probe-to-tumor distance (C₂ metric) across all 1,090 sweeps. Most sweeps (843/1,090 = 77.3%) have a C₂ distance between 40 and 80 mm. Larger distances correlate with fewer tumor pixels per sweep.

Data source: Statistics extracted from sweep_metadata.json files (one per case, located in each case subdirectory under virtual_sweeps/full_run/). Each JSON contains per-sweep records with fields: sweep_idx, tumor_pixels, C2_dist_to_tumor, saved_slices, tumor_diameter_mm (case-level).
Version: v3 (full_run) — includes deep tumor coverage fix.
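As a rough illustration of how the Table 1 figures could be recomputed from the metadata, the sketch below aggregates the per-sweep records. The field names come from the description above; the top-level `"sweeps"` key, the function names, and the inclusive 40–80 mm band are assumptions, since the exact JSON layout is not shown here.

```python
import json
import statistics
from pathlib import Path


def load_sweep_records(root):
    """Collect per-sweep records and case-level tumor diameters.

    Assumes each case directory under `root` holds one sweep_metadata.json
    with a case-level "tumor_diameter_mm" and a list under "sweeps"
    (the "sweeps" key name is a guess).
    """
    sweeps, diameters = [], []
    for meta_path in sorted(Path(root).glob("*/sweep_metadata.json")):
        meta = json.loads(meta_path.read_text())
        diameters.append(meta["tumor_diameter_mm"])
        sweeps.extend(meta["sweeps"])
    return sweeps, diameters


def summarise(sweeps, diameters):
    """Compute Table 1 style summary statistics."""
    c2 = [s["C2_dist_to_tumor"] for s in sweeps]
    n_slices = [s["saved_slices"] for s in sweeps]
    in_band = sum(1 for d in c2 if 40 <= d <= 80)  # inclusive bounds assumed
    return {
        "cases": len(diameters),
        "total_sweeps": len(sweeps),
        "diameter_range_mm": (min(diameters), max(diameters)),
        "diameter_median_mm": statistics.median(diameters),
        "c2_range_mm": (min(c2), max(c2)),
        "c2_median_mm": statistics.median(c2),
        "c2_mean_mm": statistics.fmean(c2),
        "slices_median": statistics.median(n_slices),
        "c2_40_80_fraction": in_band / len(c2),
        "zero_tumor_sweeps": sum(1 for s in sweeps if s["tumor_pixels"] == 0),
    }
```

A call such as `summarise(*load_sweep_records("virtual_sweeps/full_run"))` would then return the summary as a dictionary.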

§2 2D→3D Volume Conversion (MMHVAE Input)

Table 2. Conversion summary

Item	Value
Input	1,090 sweep directories (2D NIfTI slices per sweep)
Output (MRI volumes)	1,878 3D NIfTI files in `mmhvae_input/`
Output (segmentation)	1,090 3D NIfTI files in `mmhvae_seg/`
Total size	33.7 GB
Resolution per slice	192 × 192
MRI modalities converted	T2, ceT1, FLAIR (per sweep, where available)
Naming convention	`ReMIND{NNN}s{SS}-{mod}.nii.gz` (e.g., ReMIND001s00-t2.nii.gz)
QC images generated	549 (first 5 sweeps per case, QC_MAX_SWEEPS=5)

Conversion script: sweep_to_mmhvae.py stacks the 2D NIfTI slices of each sweep into a 3D volume and renames it to the MMHVAE-compatible format. Dashes are stripped from the case name so that they cannot collide with the modality separator "-".
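The renaming rule can be sketched as follows. Both helper names are hypothetical, and the assumption that a downstream reader splits the file stem on the first "-" is for illustration only; it shows why a dash left in the case name would be ambiguous.

```python
def mmhvae_filename(case_id, sweep_idx, modality):
    """Build a name of the form ReMIND{NNN}s{SS}-{mod}.nii.gz."""
    case = case_id.replace("-", "")  # "ReMIND-001" -> "ReMIND001"
    return f"{case}s{sweep_idx:02d}-{modality}.nii.gz"


def split_subject_modality(filename):
    """Split a name into (subject, modality) on the first dash (assumed)."""
    stem = filename[: -len(".nii.gz")]
    subject, modality = stem.split("-", 1)
    return subject, modality


name = mmhvae_filename("ReMIND-001", 0, "t2")       # "ReMIND001s00-t2.nii.gz"
good = split_subject_modality(name)                  # ("ReMIND001s00", "t2")
bad = split_subject_modality("ReMIND-001s00-t2.nii.gz")  # ("ReMIND", "001s00-t2")
```

With the dash stripped, the split recovers the subject and modality cleanly; with the dash kept, the modality field absorbs part of the case name.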
QC image generation: generate_qc_image() in the same script. Each QC image shows the middle axial slice (z = D/2) of the stacked volume for the first available MRI modality (T2 preferred), with a red contour and tint overlaid wherever the tumor segmentation is non-zero at that slice. Only the first 5 sweeps per case get QC images (QC_MAX_SWEEPS=5), which is why there are 549 images for 110 cases rather than 1,090.
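The QC image count follows from capping each case at QC_MAX_SWEEPS. The sketch below shows that arithmetic together with the middle-slice index. The function names are illustrative, and integer division for z = D/2 is an assumption about how generate_qc_image() rounds.

```python
QC_MAX_SWEEPS = 5


def middle_slice_index(depth):
    """Index of the displayed slice for a volume with `depth` slices."""
    return depth // 2  # floor division assumed for odd depths


def expected_qc_images(sweeps_per_case, max_sweeps=QC_MAX_SWEEPS):
    """Number of QC images when each case contributes at most max_sweeps."""
    return sum(min(n, max_sweeps) for n in sweeps_per_case.values())


# Hypothetical sweep counts: one full case and one with only 4 sweeps.
example = {"case_a": 10, "case_b": 4}
n_images = expected_qc_images(example)  # 5 + 4 = 9
```

For 110 cases the cap allows at most 110 × 5 = 550 images, which is consistent with the 549 reported.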

MMHVAE Technical Specifications

Parameter	Value
Network architecture	MHVAE2D (purely 2D, 14M parameters)
Latent hierarchy	6 layers (coarse to fine)
Input format	3D NIfTI volume (192×192×D), reshaped to (B×D, 1, 192, 192) for 2D processing
Normalization	Min-max scaling over the [0, 99.95] percentile range; background = -1
Training regime	1000 epochs, Adamax lr=2e-3, batch_size=16 (2D slices)
GAN discriminator	Activated at epoch 790
Modalities	us, t2, cet1, flair (4-modality cross-synthesis)
Pre-trained weights	Required but not publicly available (contact author)

Source: MMHVAE repository (github.com/ReubenDo/MMHVAE) and IEEE TPAMI 2025 paper (Dorent et al.). Training from scratch requires ~5–10 days on an RTX 2080 Ti.
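The normalization row in the table above can be read as the following sketch. It is an interpretation, not the MMHVAE code: treating voxels ≤ 0 as background, computing the percentile over foreground voxels only, using a nearest-rank percentile, and scaling the foreground to [0, 1] are all assumptions.

```python
import math


def normalise_intensities(values, upper_pct=99.95, background=-1.0):
    """Percentile-clipped min-max scaling of a flat list of voxel values.

    Foreground voxels (> 0, assumed) are mapped to [0, 1] using the
    foreground minimum and the upper_pct percentile; everything else is
    set to `background`.
    """
    foreground = sorted(v for v in values if v > 0)
    if not foreground:
        return [background] * len(values)

    # Nearest-rank percentile over the sorted foreground values.
    rank = max(math.ceil(upper_pct / 100 * len(foreground)), 1)
    hi = foreground[rank - 1]
    lo = foreground[0]
    scale = (hi - lo) or 1.0  # avoid division by zero on flat images

    out = []
    for v in values:
        if v <= 0:
            out.append(background)
        else:
            out.append(min(max((v - lo) / scale, 0.0), 1.0))
    return out


normalised = normalise_intensities([0, 10, 20, 30])  # [-1.0, 0.0, 0.5, 1.0]
```

Clipping at the 99.95th percentile keeps a few very bright voxels from compressing the rest of the intensity range, and the distinct background value lets the network tell empty space apart from dark tissue.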

Fig. 2. MMHVAE Input Volume QC — 549 images (110 cases × ≤5 sweeps)

Each 192×192 JPEG shows the middle slice of a converted 3D MRI volume. Red contour/tint = tumor segmentation overlay (absent when tumor is not present at the middle slice).

Fig. 2: QC output from sweep_to_mmhvae.py. 549 images across 110 cases (max 5 sweeps per case). Title text shows case ID and z-index of displayed slice. Images with visible red overlay indicate tumor segmentation is present at the middle slice; images without red overlay indicate the tumor is located at a different z-level or the sweep has zero tumor coverage. Source: mmhvae_qc/ on remote machine, transferred via tar -czf.

§3 Best-Effort Sweep QC — Cases with Deep/Small Tumors

For cases where the tumor is deep or small, some virtual sweeps cannot intersect the tumor volume regardless of probe placement. These "best-effort" (BE) sweeps contain brain tissue only (zero tumor pixels). This gallery examines 4 such cases to visualise the difference between sweeps that hit the tumor (OK) and those that don't (BE).

Table 3. Best-effort case summary

Case	OK sweeps	BE sweeps	BE ratio
ReMIND-005	6	4	40%
ReMIND-043	2	8	80%
ReMIND-051	2	8	80%
ReMIND-102	2	8	80%
Total	12	28	70%
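The OK/BE split and the ratios in Table 3 can be reproduced with a short sketch. It assumes a sweep counts as best-effort exactly when its tumor_pixels value is zero, as described above; the function names are illustrative.

```python
def classify_sweeps(tumor_pixels_per_sweep):
    """Return (ok, be) counts; a sweep is BE when it has zero tumor pixels."""
    ok = sum(1 for n in tumor_pixels_per_sweep if n > 0)
    be = len(tumor_pixels_per_sweep) - ok
    return ok, be


def be_ratio(ok, be):
    """Fraction of sweeps that are best-effort."""
    total = ok + be
    return be / total if total else 0.0


# (OK, BE) counts copied from Table 3.
table3 = {
    "ReMIND-005": (6, 4),
    "ReMIND-043": (2, 8),
    "ReMIND-051": (2, 8),
    "ReMIND-102": (2, 8),
}

total_ok = sum(ok for ok, _ in table3.values())   # 12
total_be = sum(be for _, be in table3.values())   # 28
overall = be_ratio(total_ok, total_be)            # 28 / 40 = 0.70
```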

Fig. 3. Sweep triptychs — 4 best-effort cases (40 images)

Each image shows 3 frames (25% / 50% / 75% position) from one sweep. Green bar = OK (tumor visible as red contour). Red bar = BEST-EFFORT (no tumor, brain tissue only).

Fig. 3: Generated by sweep_qc_gallery.py. Triptych format: 3 representative frames per sweep at 25%, 50%, 75% positions along the sweep trajectory. OK sweeps show red tumor contour/tint; BE sweeps show only greyscale brain tissue. Compare OK vs BE within the same case to visualise coverage differences. Source: sweep_qc_images/.
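Frame selection for a triptych can be sketched as below. The truncation toward zero and the clamp to the last slice are assumptions; sweep_qc_gallery.py may round differently.

```python
def triptych_indices(n_slices, fractions=(0.25, 0.50, 0.75)):
    """Slice indices at the given fractional positions along a sweep."""
    if n_slices < 1:
        raise ValueError("a sweep needs at least one slice")
    return [min(int(n_slices * f), n_slices - 1) for f in fractions]


median_sweep = triptych_indices(194)  # [48, 97, 145]
shortest_sweep = triptych_indices(75)  # [18, 37, 56]
```

The two calls use the median and minimum slice counts from Table 1.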

Best-effort strategy: Rather than discarding cases with deep/small tumors, the sweep generator places the probe as close to the tumor centroid as anatomically possible. Even when no sweep intersects the tumor, the resulting brain-tissue-only sweeps contribute to MMHVAE training by teaching the model normal brain appearance. The BE ratio across all 110 cases (not just these 4) is lower; these 4 cases were selected specifically because they have the highest BE ratios.