Route A — MRI-to-iUS Registration & Pseudo-label Generation

Pipeline: MRI tumor annotation → rigid registration to iUS space → pseudo-label generation → nnU-Net training

Registration was iterated three times (V1 → V2 → V3), with each version addressing specific failure modes identified in the prior run. The final dataset includes rescue passes for initially non-converging cases.

§1 Registration Iteration Summary

Table 1. Per-version registration statistics

| Version | n (timepoints) | Patients | Converged | Rate | Trans. median | Trans. IQR (mm) | Trans. range (mm) | Rot. median | MI impr. median | Time median | Key change |
|---|---|---|---|---|---|---|---|---|---|---|---|
| V1 | 204 | All (incl. repeat surgery) | 14 | 6.9% | 14.8 mm | 11.1–18.5 | 0.5–28.2 | 5.5° | 0.0445 | 19.9 s | Baseline: 1k iter, no mask |
| V2 | 114 | First-surgery only | 64 | 56.1% | 7.0 mm | 4.5–12.4 | 0.5–38.9 | 7.3° | 0.0231 | 27.2 s | +brain mask, 3k iter, 5 mm dil. |
| V3 | 114 | First-surgery only | 84 | 73.7% | 6.2 mm | 4.1–8.8 | 0.4–32.3 | 6.2° | 0.0229 | 18.6 s | 2 mm dil. (A/B tested vs V2) |
| Final | 114 | After QC exclusion | 84 | 73.7% | — | — | — | — | — | — | 30 timepoints excluded (metric + visual QC) |
Convergence criterion: Optimizer step size fell below 1×10⁻⁵ ("step too small" stop condition in SimpleITK RegularStepGradientDescentOptimizerv4). Non-converged = max iterations reached.
V1 → V2 changes: (1) restricted to first-surgery patients (n reduced from 204 to 114); (2) added brain mask to focus MI computation; (3) increased max iterations from 1,000 to 3,000; (4) added 5 mm mask dilation.
V2 → V3 change: Reduced mask dilation from 5 mm to 2 mm. All other parameters held constant.

Fig. 1. Convergence rate progression

Fig. 1a (left): Optimizer convergence rate across registration versions. The V1 baseline (all patients, 1k iterations, no mask) achieved only 6.9%. Adding a brain mask and increasing iterations (V2) raised it to 56.1%; reducing dilation from 5 mm to 2 mm (V3) reached 77.2%. After visual QC review, 30 timepoints were excluded (metric failures plus clearly wrong pseudo-labels), yielding 84/114 valid (73.7%).

Fig. 1b (right): Translation magnitude distribution (Q1 / median / Q3) per version. Lower values indicate tighter MRI–iUS alignment. V3 shows both a lower median (6.2 mm vs 14.8 mm for V1) and a tighter IQR (4.7 mm span vs 7.4 mm).
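The per-case translation and rotation summaries plotted in Fig. 1b can be derived from the fitted rigid-transform parameters. A minimal sketch, assuming Euler3DTransform parameter order (rx, ry, rz, tx, ty, tz) and max per-axis angle as the rotation summary (the report does not state which rotation summary it uses):

```python
import numpy as np

def rigid_summary(euler_params):
    """Translation magnitude (mm) and largest per-axis |rotation| (deg)
    from Euler3DTransform parameters (rx, ry, rz, tx, ty, tz), angles in radians."""
    rx, ry, rz, tx, ty, tz = euler_params
    translation_mm = float(np.linalg.norm([tx, ty, tz]))
    rotation_deg = float(np.degrees(max(abs(rx), abs(ry), abs(rz))))
    return translation_mm, rotation_deg
```

Per-version medians and IQRs then follow from `np.percentile` over the per-case values.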

§2 Mask Dilation Comparison: V2 (5 mm) vs V3 (2 mm)

V2 and V3 used identical cases (114 timepoints from the first-surgery patients), the same optimizer, and the same iteration limit. The only changed parameter was the mask dilation radius, making this a controlled single-variable comparison.

Table 2. Paired comparison (same 114 timepoints)

| Metric | V2 (5 mm dilation) | V3 (2 mm dilation) | Δ |
|---|---|---|---|
| Convergence rate | 56.1% (64/114) | 73.7% (84/114) | +17.6 pp |
| Translation median | 7.0 mm | 6.2 mm | −0.8 mm |
| Translation IQR | 4.5–12.4 mm | 4.1–8.8 mm | Tighter (7.9 → 4.7 mm span) |
| MI improvement median | 0.0231 | 0.0229 | −0.0002 |
| Median runtime | 27.2 s | 18.6 s | −8.6 s |
Interpretation: The tighter 2 mm dilation kept the registration focused on the tumor boundary region, whereas the 5 mm dilation admitted excess surrounding tissue and left the optimizer prone to local minima (hence the higher non-convergence). For converged cases the median MI improvement was nearly identical (Δ ≈ 0.0002), suggesting comparable alignment quality; the main gain from the tighter mask is the convergence rate.
Note on p-value: A paired Wilcoxon signed-rank test on MI improvement yielded p ≈ 0.007 (reported from registration script logs; see route_a/logs/). A tiny median difference can still reach significance when the paired differences are consistently signed, but this value should be independently verified.
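The check suggested in the note can be reproduced from the per-case metrics. A sketch assuming two equal-length arrays of per-timepoint MI improvements in matching timepoint order (scipy required):

```python
from scipy.stats import wilcoxon

def compare_mi_improvement(mi_v2, mi_v3):
    """Paired Wilcoxon signed-rank test on per-timepoint MI improvement.

    mi_v2, mi_v3: equal-length sequences, same timepoint order.
    Returns (test statistic, two-sided p-value)."""
    stat, p = wilcoxon(mi_v2, mi_v3)
    return stat, p
```

Only timepoints present in both runs should be passed in; dropping non-converged cases first keeps the pairing valid.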

§3 Optimizer Configuration Comparison (A/B Test)

A separate experiment tested 7 optimizer configurations on 10 patients to evaluate whether optimizer choice alone could improve convergence.

Table 3. Optimizer A/B test results (ab_test_20260317_075447.csv)

| Config | Optimizer | Max iter | Mask | n | Converged |
|---|---|---|---|---|---|
| baseline | RSGD | 1,000 | No | 10 | 0 (0%) |
| rsgd_3k | RSGD | 3,000 | No | 10 | 0 (0%) |
| rsgd_5k | RSGD | 5,000 | No | 10 | 0 (0%) |
| lbfgs2 | L-BFGS-B | — | No | 10 | 0 (0%) |
| cgls | Conj. Grad. | — | No | 10 | 0 (0%) |
| baseline_mask | RSGD | 1,000 | Yes | 10 | 0 (0%) |
| rsgd_3k_mask | RSGD | 3,000 | Yes | 10 | 0 (0%) |
Finding: No configuration achieved convergence on the 10 test patients. This motivated the shift from optimizer tuning to registration-region tuning (mask dilation), which proved effective (§2). The test patients may have been particularly difficult cases, but the uniform 0% rate across all seven configurations suggests that optimizer choice alone was not the limiting factor; the brain mask plus tight dilation was.
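A tally like Table 3 can be rebuilt from the A/B-test CSV. The column names `config` and `converged` below are a hypothetical schema; the actual layout of ab_test_20260317_075447.csv is not documented in this section:

```python
import csv
from collections import defaultdict

def convergence_by_config(csv_path):
    """Tally converged/total registrations per optimizer config.

    Assumes columns named 'config' and 'converged' (hypothetical schema).
    Returns {config: (converged, total, rate_percent)}."""
    tally = defaultdict(lambda: [0, 0])  # config -> [converged, total]
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            tally[row["config"]][1] += 1
            if row["converged"].strip().lower() in ("1", "true", "yes"):
                tally[row["config"]][0] += 1
    return {cfg: (c, n, 100.0 * c / n) for cfg, (c, n) in tally.items()}
```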

§4 Pseudo-label Generation & Training Data

Table 4. Data generation pipeline summary

| Stage | Input | Output | Count |
|---|---|---|---|
| Patient filtering | 114 ReMIND patients | First-surgery subset | 62 patients |
| Registration V3 | 62 patients (114 timepoints) | Converged registrations | 84/114 converged (73.7%) |
| QC exclusion | 114 timepoints | Quality-filtered registrations | 84 valid (30 excluded: 8 metric + 22 visual QC) |
| Pseudo-label transfer | 84 valid registrations | MRI labels in iUS space | 84 with non-zero labels |
| 2D slicing | 84 pseudo-labeled volumes | Training-ready 2D slices | 44,502 slices |
| nnU-Net formatting | 44,502 slices | Dataset001_iUS | 43 patients, preprocessed |
Note: 62 first-surgery patients yielded 114 registration runs in V2/V3 because most patients have two iUS timepoints (pre-dura and post-dura). After visual QC review (note 9, note 12), 30 timepoints were excluded: 8 for metric failures (excessive translation/rotation, zero pseudo-label voxels) and 22 for clearly incorrect pseudo-label placement despite passing automated metrics. The remaining 84 valid registrations from 43 unique patients produced 44,502 training slices after 200 mm² area filtering.
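The 200 mm² area filter applied during 2D slicing can be sketched as follows. The slicing axis and spacing handling are assumptions; the core idea is to compare per-slice label area in mm² (voxel count times in-plane pixel area) against the threshold:

```python
import numpy as np

def filter_slices(label_volume, spacing_mm, min_area_mm2=200.0, axis=0):
    """Return slice indices whose pseudo-label area is >= min_area_mm2.

    label_volume: binary 3D array (1 = tumor pseudo-label).
    spacing_mm: per-axis voxel spacing; the two in-plane spacings
    give the physical area of one labeled pixel."""
    in_plane = [s for i, s in enumerate(spacing_mm) if i != axis]
    pixel_area = in_plane[0] * in_plane[1]
    keep = []
    for idx in range(label_volume.shape[axis]):
        sl = np.take(label_volume, idx, axis=axis)
        if sl.sum() * pixel_area >= min_area_mm2:
            keep.append(idx)
    return keep
```

Slices below the threshold (tiny label fragments) are dropped before nnU-Net formatting.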
Patient count: Our first-surgery filter identified 62 patients, versus 55 in Faanes 2025. The 7-patient difference likely reflects additional exclusions applied in Faanes 2025: missing MRI annotations (n=1, ReMIND-101) and iUS quality issues (n=5–6).
Interpolation method: Pseudo-labels use linear interpolation with 0.5 threshold binarization (instead of nearest-neighbor) to reduce staircase artifacts. This reduced average connected components from 5.2 to 2.8 per volume.
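The linear-interpolation-plus-threshold transfer and the connected-component metric cited above can be illustrated with scipy.ndimage. This is a minimal stand-in; the pipeline's actual resampler (which applies the rigid transform, not a plain zoom) is not specified in this section:

```python
import numpy as np
from scipy import ndimage

def resample_label(label, zoom_factors, threshold=0.5):
    """Resample a binary label with linear interpolation, then binarize.

    Linear interpolation (order=1) followed by a 0.5 threshold smooths the
    staircase edges that nearest-neighbor (order=0) resampling leaves behind."""
    resampled = ndimage.zoom(label.astype(np.float32), zoom_factors, order=1)
    return (resampled >= threshold).astype(np.uint8)

def count_components(binary_volume):
    """Number of connected components (the fragmentation metric cited above)."""
    _, n = ndimage.label(binary_volume)
    return n
```

Fewer components after resampling indicates a less fragmented, less staircased label, matching the reported 5.2 → 2.8 average.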

§5 Quality Control Galleries

Fig. 2. Registration QC thumbnails (V3 final — 114 images)

Each thumbnail shows the registered MRI (color overlay) aligned to iUS (grayscale background) after rigid registration.

Fig. 2: Registration QC from V3 final run. 114 images covering 61 patients (most have both pre-dura and post-dura timepoints). V1 and V2 QC images were not retained; iterative improvement is quantified in Table 1.

Fig. 3. Pseudo-label overlay triptychs (36 images across 6 patients)

Three-panel views showing (left) original iUS, (center) binary pseudo-label mask, (right) overlay. Each patient has 6 representative slices (2 per anatomical plane).

Fig. 3: Pseudo-label QC showing spatial alignment quality. Slice numbers correspond to the 2D slice index in the nnU-Net training dataset. Red overlay = transferred MRI tumor label.