EndoGeDE

Generalizable Self-supervised Monocular Depth Estimation in Diverse Endoscopic Scenes

Duration: 2024.10-Now

Related Publications: IROS’25 Oral (Shao et al., 2025), EAAI (Submitted)

Funded by: National Natural Science Foundation of China (No.82472116), Natural Science Foundation of Shanghai (No.24ZR1404100)

Brief Background

Self-supervised monocular depth estimation is a significant task for low-cost and efficient three-dimensional scene perception in endoscopy.

The variety of illumination conditions and scene features remains the primary challenge for generalizable depth estimation in endoscopic scenes.

Our Work

In this project, several topics listed below are explored:

  • Parameter-efficient finetuning of a generalizable foundation model.
  • Tackling the inconsistency and diversity of endoscopic scenes.
  • Applying the framework to ego-motion estimation and 3D reconstruction.
Challenges (left) and performance comparison (right) for depth estimation in diverse endoscopic scenes. The size of each circle denotes the number of trainable parameters in the corresponding depth estimation network.
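Self-supervised depth pipelines of this kind are typically trained with a photometric reprojection loss between the target frame and a view synthesized from a neighboring frame using the predicted depth and ego-motion. The sketch below shows the standard SSIM + L1 mixture used throughout this literature; the function names, the simplified global SSIM, and the weight α = 0.85 are illustrative assumptions, not the project's exact loss:

```python
import numpy as np

def ssim_global(x, y, c1=0.01**2, c2=0.03**2):
    """Simplified (global, non-windowed) SSIM between two images in [0, 1]."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def photometric_loss(target, warped, alpha=0.85):
    """Standard SSIM + L1 photometric reprojection loss."""
    ssim_term = (1.0 - ssim_global(target, warped)) / 2.0
    l1_term = np.abs(target - warped).mean()
    return alpha * ssim_term + (1.0 - alpha) * l1_term

img = np.random.default_rng(0).random((64, 64))
print(photometric_loss(img, img))  # identical images -> ~0 loss
```

In practice the `warped` image comes from differentiably resampling a source frame with the predicted depth and camera pose, so minimizing this loss supervises both networks without ground-truth depth.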

We propose:

  • A novel self-supervised framework for depth estimation that jointly handles the inconsistency of brightness and reflectance in endoscopic scenes
  • A novel parameter-efficient finetuning method with a block-wise mixture of dynamic low-rank experts, which adaptively selects low-rank experts for depth estimation according to diverse scene features
Left: The proposed multi-stage self-supervised framework. Right: The proposed parameter-efficient finetuning method.
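The finetuning contribution builds on LoRA-style adaptation: a frozen pretrained weight is augmented with several low-rank experts, and an input-dependent router decides how strongly each expert contributes. The sketch below illustrates that general idea for a single linear layer; class and parameter names are illustrative assumptions, and the paper's block-wise, dynamic-rank design is simplified here:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class MixtureOfLowRankExperts:
    """One frozen linear layer adapted by a gated mixture of low-rank
    (LoRA-style) experts: y = x @ (W0 + sum_k g_k(x) * A_k @ B_k)."""

    def __init__(self, d_in, d_out, n_experts=4, rank=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W0 = rng.standard_normal((d_in, d_out)) * 0.02   # frozen backbone weight
        self.A = rng.standard_normal((n_experts, d_in, rank)) * 0.02
        self.B = np.zeros((n_experts, rank, d_out))           # zero-init: starts as W0
        self.router = rng.standard_normal((d_in, n_experts)) * 0.02

    def __call__(self, x):
        gates = softmax(x @ self.router)                      # input-dependent expert weights
        delta = np.einsum("k,kir,kro->io", gates, self.A, self.B)
        return x @ (self.W0 + delta)

layer = MixtureOfLowRankExperts(d_in=8, d_out=6)
y = layer(np.ones(8))
print(y.shape)  # (6,)
```

Only the experts and the router would be trained, which is what makes the adaptation parameter-efficient: the large pretrained weight `W0` stays frozen, and with `B` zero-initialized the layer initially reproduces the foundation model exactly.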

Achievements

  • The proposed method achieves state-of-the-art (SOTA) performance for self-supervised depth estimation on the SCARED and SimCol datasets, as well as for zero-shot depth estimation on the C3VD, Hamlyn, and SERV-CT datasets.
  • The proposed method is validated in a sim-to-real test on the EndoMapper dataset.
  • The proposed method is compared with SOTA methods for 3D reconstruction on the SCARED dataset.
  • Publications: IROS’25 Oral (Shao et al., 2025), EAAI (Submitted)
Demo videos of depth estimation in diverse endoscopic scenes.
Comparison of 3D Reconstruction on Samples from SCARED dataset.
Top: Visualization of the experts' weights based on t-SNE. Bottom: Qualitative comparison of results with different SSL pipelines.

References

2025

  1. IROS 2025
    EndoMUST: Monocular Depth Estimation for Robotic Endoscopy via End-to-end Multi-step Self-supervised Training
    Liangjing Shao, Linxin Bai, Chenkang Du, and Xinrong Chen#
    In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025