SonoGuide
Deep segmentation for guiding surgical tools in 3D ultrasound
Highlights
Accurate localization of catheters in 3D ultrasound during minimally invasive surgery was hindered by speckle noise, low resolution, and complex instrument geometry.
Developed a deep learning solution (V-net) to segment surgical tools in noisy ultrasound volumes for real-time surgical navigation.
Designed a fully 3D neural network with dense skip-connections, dynamic training strategy, and test-time augmentation to boost segmentation accuracy and robustness.
Achieved a 93.6% Dice score – roughly 13 points higher than a standard U-net – and enabled precise, automated catheter detection in clinical ultrasound workflows.
Overview
Minimally invasive cardiac procedures demand real-time, high-fidelity imaging of surgical tools like catheters. However, 3D echocardiography – despite being a cost-effective modality – suffers from speckle noise and low resolution, making device localization challenging.
This project addressed that limitation with a custom deep learning solution based on a modified U-net, termed V-net, tailored to medical image segmentation under noisy conditions. The proposed V-net introduces additional dense skip-connections within both the encoder and decoder paths to combat vanishing gradients and overfitting.
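Dense skip-connections pass the concatenation of all earlier feature maps to each subsequent layer, which shortens gradient paths and encourages feature reuse. The sketch below shows only this connectivity pattern, with toy NumPy "layers" standing in for 3D convolutions; it is an illustration of the mechanism, not the project's actual network code.

```python
import numpy as np

def dense_block(x, layers):
    """Feed each layer the channel-wise concatenation of all previous outputs."""
    features = [x]                                     # start with the block input
    for layer in layers:
        out = layer(np.concatenate(features, axis=0))  # axis 0 plays the role of channels
        features.append(out)
    return np.concatenate(features, axis=0)

# Toy layers: each produces one new "channel" from everything seen so far.
layers = [lambda f: f.mean(axis=0, keepdims=True) for _ in range(3)]
x = np.ones((2, 4, 4))                                 # 2 input channels of a 4x4 map
y = dense_block(x, layers)
print(y.shape)                                         # (5, 4, 4): 2 inputs + 3 new channels
```

Because every layer sees all earlier outputs, gradients reach early layers through short paths, which is the stated motivation for adding these connections inside both the encoder and decoder.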
Data
This study used a specialized dataset consisting of 75 three-dimensional grayscale ultrasound volumes collected during minimally invasive cardiac procedures on three Yorkshire pigs. The imaging was performed at Boston Children's Hospital using a Philips iE33 ultrasound system.
- Resolution: 176×176×208 voxels per volume
- Challenges: Speckle noise and low contrast pose significant segmentation difficulties
- Augmentation: Synthetic data generated using the kinematics of flexible robots
The catheter used in this study is shown in Figure 1, while Figure 2 illustrates the challenging visibility of catheters in raw ultrasound data.


Methods
To address the challenges of catheter segmentation in noisy 3D ultrasound data, we developed V-net, a fully volumetric convolutional neural network based on the U-net architecture, enhanced for depth, gradient stability, and feature reuse.
- Architecture: The V-net employs a symmetric encoder-decoder structure with dense skip-connections, dilated convolutions, instance normalization, ELU activations, and dropout.
- Training Strategy: Cyclical learning rate schedule and variable batch size strategy based on the Fibonacci sequence.
- Hyperparameter Tuning: T-test-based selection method reducing the search space from millions to a few hundred combinations.
- Inference: Test-time augmentation (TTA) to enhance segmentation stability.
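The training strategy above can be sketched as a triangular cyclical learning rate paired with batch sizes drawn from the Fibonacci sequence. The numeric values (base_lr, max_lr, step_size, the batch-size cap) are illustrative assumptions, not the project's exact settings.

```python
def cyclical_lr(step, base_lr=1e-4, max_lr=1e-3, step_size=100):
    """Triangular cyclical learning rate: ramp up to max_lr, then back down."""
    cycle = step // (2 * step_size)
    x = abs(step / step_size - 2 * cycle - 1)   # position within the cycle, in [0, 1]
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

def fibonacci_batch_sizes(n, cap=21):
    """First n Fibonacci numbers (capped) used as a growing batch-size schedule."""
    sizes, a, b = [], 1, 2
    for _ in range(n):
        sizes.append(min(a, cap))
        a, b = b, a + b
    return sizes

print(cyclical_lr(0))              # base_lr at the start of a cycle
print(cyclical_lr(100))            # max_lr at the cycle peak
print(fibonacci_batch_sizes(6))    # growing batch sizes across training stages
```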
The full V-net architecture is shown in Figure 3, and the dense feature transfer mechanism is illustrated in Figure 4.
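Test-time augmentation at inference can be sketched as follows: the volume is flipped along each spatial axis, a prediction is computed for every flipped copy, un-flipped, and the results are averaged. Here `model` is any callable mapping a volume to a per-voxel probability map; the thresholding lambda below is a stand-in for the trained V-net, not the project's model.

```python
import numpy as np

def tta_predict(model, volume):
    """Average predictions over axis-flip augmentations of a 3D volume."""
    preds = []
    flips = [None, (0,), (1,), (2,)]   # identity plus a flip along each axis
    for axes in flips:
        augmented = np.flip(volume, axis=axes) if axes else volume
        pred = model(augmented)
        # Undo the flip so every prediction is in the original orientation.
        pred = np.flip(pred, axis=axes) if axes else pred
        preds.append(pred)
    return np.mean(preds, axis=0)

# Stand-in "model": per-voxel thresholding, which commutes with flips, so the
# TTA output matches a single forward pass on this toy volume.
volume = np.random.rand(8, 8, 8)
probs = tta_predict(lambda v: (v > 0.5).astype(float), volume)
print(probs.shape)   # (8, 8, 8)
```

Averaging over orientation-augmented copies smooths out flip-dependent prediction noise, which is how TTA improves segmentation stability.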


Results
The V-net architecture achieved high spatial precision in localizing catheters despite strong speckle noise and anatomical occlusion:
- V-net (TTA): 93.6 ± 2.4% DSC
- Standard U-net: 80.5 ± 5.8% DSC
- SegNet / FCN: <25% DSC on average
Segmentation accuracy on synthetic and real samples is shown in Figure 5 and Figure 6, and the comparison with other networks is presented in Figure 7.
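The Dice similarity coefficient (DSC) reported above measures voxel overlap between prediction and ground truth as 2|A∩B| / (|A| + |B|). The generic sketch below shows the computation with a small epsilon guarding against empty masks; it is not the project's evaluation code.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary volumes of the same shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Two toy masks that overlap on half of their voxels each.
a = np.zeros((4, 4, 4)); a[:2] = 1
b = np.zeros((4, 4, 4)); b[1:3] = 1
print(round(dice_score(a, b), 3))   # 0.5: one of the two slices in each mask overlaps
```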



The videos below demonstrate 3D volumetric segmentation in static and dynamic series:
Conclusion
This project showcases a robust, deployable solution for segmenting surgical instruments in noisy 3D ultrasound, significantly enhancing intraoperative navigation. The V-net model's innovations in architecture, training strategy, and data synthesis demonstrate both academic rigor and practical value.
Future work may include adapting the model for other low-resolution modalities, extending real-time capabilities, and integrating AR-based visualization during surgery. The methods are also applicable to oncology, neurosurgery, and vascular interventions.