SPADnet | Optics Express 2020

Zhanghao Sun, David B. Lindell, Olav Solgaard, Gordon Wetzstein

A neural sensor fusion framework for robust 3D imaging with single-photon detectors.

ABSTRACT

Single-photon light detection and ranging (LiDAR) techniques use emerging single-photon avalanche diodes (SPADs) to push 3D imaging capabilities to unprecedented ranges. However, it remains challenging to robustly estimate scene depth from the noisy and otherwise corrupted measurements recorded by a SPAD. Here, we propose a deep sensor fusion strategy that combines corrupted SPAD data and a conventional 2D image to estimate the depth of a scene. Our primary contribution is a neural network architecture, SPADnet, that uses a monocular depth estimation algorithm together with a SPAD denoising and sensor fusion strategy. This architecture, together with several techniques in network training, achieves state-of-the-art results for RGB-SPAD fusion on simulated and captured data. Moreover, SPADnet is more computationally efficient than previous RGB-SPAD fusion networks.

CITATION

Zhanghao Sun, David B. Lindell, Olav Solgaard, and Gordon Wetzstein, "SPADnet: deep RGB-SPAD sensor fusion assisted by monocular depth estimation," Optics Express, Vol. 28, Issue 10, pp. 14948–14962 (2020)

@article{Sun:2020:SPADNet,
  author  = {Z. Sun and D. B. Lindell and O. Solgaard and G. Wetzstein},
  title   = {{SPADnet}: deep {RGB-SPAD} sensor fusion assisted by monocular depth estimation},
  journal = {Optics Express},
  volume  = {28},
  number  = {10},
  pages   = {14948--14962},
  year    = {2020},
}

SPADnet architecture. We use a monocular depth estimator to convert the 2D image into a rough depth map and then apply 2D-3D up-projection to fuse it with 3D features extracted from the SPAD measurement. This monocular depth estimation and fusion strategy significantly improves model performance and robustness.
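To make this fusion step concrete, the following is a minimal PyTorch sketch of the idea, not the released SPADnet implementation: the layer sizes and the depth_to_volume and ToySPADnet names are illustrative assumptions. The sketch lifts the rough monocular depth map into a one-hot volume over discrete depth bins, concatenates it with 3D features extracted from the log-scaled SPAD histogram, and reads out depth with a differentiable soft argmax.

import torch
import torch.nn as nn
import torch.nn.functional as F

def depth_to_volume(depth, num_bins):
    # Lift a rough depth map (B, 1, H, W), normalized to [0, 1], into a
    # one-hot 3D volume (B, 1, D, H, W) over D discrete depth bins.
    b, _, h, w = depth.shape
    bin_idx = (depth.clamp(0, 1) * (num_bins - 1)).long()       # (B, 1, H, W)
    volume = torch.zeros(b, 1, num_bins, h, w, device=depth.device)
    volume.scatter_(2, bin_idx.unsqueeze(2), 1.0)               # mark one bin per pixel
    return volume

class ToySPADnet(nn.Module):
    def __init__(self, num_bins=64):
        super().__init__()
        self.num_bins = num_bins
        # small 3D-conv branch over the (log-scaled) SPAD histogram volume
        self.spad_branch = nn.Sequential(
            nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv3d(8, 8, 3, padding=1), nn.ReLU())
        # fusion head over concatenated SPAD features + lifted monocular depth
        self.fuse = nn.Sequential(
            nn.Conv3d(9, 8, 3, padding=1), nn.ReLU(),
            nn.Conv3d(8, 1, 3, padding=1))

    def forward(self, spad_hist, mono_depth):
        # spad_hist: (B, 1, D, H, W) photon-count histogram
        # mono_depth: (B, 1, H, W) rough depth from a monocular estimator
        feat = self.spad_branch(torch.log1p(spad_hist))         # log scaling tames photon counts
        mono_vol = depth_to_volume(mono_depth, self.num_bins)
        logits = self.fuse(torch.cat([feat, mono_vol], dim=1))  # (B, 1, D, H, W)
        prob = F.softmax(logits.squeeze(1), dim=1)              # per-pixel depth distribution
        bins = torch.linspace(0, 1, self.num_bins, device=prob.device)
        return (prob * bins.view(1, -1, 1, 1)).sum(dim=1)       # soft argmax -> (B, H, W)

net = ToySPADnet(num_bins=64)
hist = torch.poisson(torch.rand(1, 1, 64, 32, 32))              # synthetic photon counts
mono = torch.rand(1, 1, 32, 32)                                 # synthetic monocular depth
depth = net(hist, mono)                                         # (1, 32, 32) fused depth map

The soft-argmax readout keeps the depth estimate differentiable with respect to the per-bin distribution, which is what allows a fusion network of this kind to be trained end to end against a depth loss.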
Qualitative and quantitative results comparing various approaches at a fixed signal-to-background ratio (SBR) of 0.04 (2 signal photons vs. 50 background photons). SPADnet achieves the lowest RMSE. White boxes mark regions with an extremely weak signal return, caused by low object reflectivity or a large detection distance; SPADnet produces more stable predictions in these regions.
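For intuition about this operating point, here is a hedged sketch of how a single pixel's SPAD measurement at SBR = 2/50 = 0.04 might be simulated, using the standard model of a pulsed laser return at the true depth bin plus a uniform ambient floor, corrupted by Poisson photon noise. The simulate_spad_histogram function and its parameters (bin count, pulse width) are illustrative assumptions, not the paper's exact simulation code.

import torch

def simulate_spad_histogram(depth_bin, num_bins=1024, signal_photons=2.0,
                            background_photons=50.0, pulse_sigma=2.0):
    # One pixel's SPAD histogram: a Gaussian laser pulse centered at the
    # true depth bin plus a uniform ambient/dark-count floor, then Poisson
    # photon noise. SBR = signal_photons / background_photons.
    bins = torch.arange(num_bins, dtype=torch.float32)
    pulse = torch.exp(-0.5 * ((bins - depth_bin) / pulse_sigma) ** 2)
    pulse = pulse / pulse.sum() * signal_photons            # 2 signal photons on average
    ambient = torch.full((num_bins,), background_photons / num_bins)
    return torch.poisson(pulse + ambient)                   # noisy photon counts

hist = simulate_spad_histogram(depth_bin=300.0)             # SBR = 2 / 50 = 0.04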
Evaluation of different algorithms on captured data. The “stuff” scene has a higher SBR, and all methods perform comparably well under this condition. In the “kitchen” and “hallway” scenes, the SBR is much lower. Monocular depth estimation can also fail in regions without sufficient monocular depth cues (“stuff” scene). SPADnet fuses these two sources of information and outperforms the other methods in more complex real-world environments.

RELATED PROJECTS

You may also be interested in related projects, where we have developed non-line-of-sight imaging systems:

  • Metzler et al. 2021. Keyhole Imaging. IEEE Trans. Computational Imaging (link)
  • Lindell et al. 2020. Confocal Diffuse Tomography. Nature Communications (link)
  • Young et al. 2020. Non-line-of-sight Surface Reconstruction using the Directional Light-cone Transform. CVPR (link)
  • Lindell et al. 2019. Wave-based Non-line-of-sight Imaging using Fast f-k Migration. ACM SIGGRAPH (link)
  • Heide et al. 2019. Non-line-of-sight Imaging with Partial Occluders and Surface Normals. ACM Transactions on Graphics (presented at SIGGRAPH) (link)
  • Lindell et al. 2019. Acoustic Non-line-of-sight Imaging. CVPR (link)
  • O’Toole et al. 2018. Confocal Non-line-of-sight Imaging based on the Light-cone Transform. Nature (link)

and direct line-of-sight or transient imaging systems:

  • Bergman et al. 2020. Deep Adaptive LiDAR: End-to-end Optimization of Sampling and Depth Completion at Low Sampling Rates. ICCP (link)
  • Nishimura et al. 2020. 3D Imaging with an RGB camera and a single SPAD. ECCV (link)
  • Heide et al. 2019. Sub-picosecond photon-efficient 3D imaging using single-photon sensors. Scientific Reports (link)
  • Lindell et al. 2018. Single-Photon 3D Imaging with Deep Sensor Fusion. ACM SIGGRAPH (link)
  • O’Toole et al. 2017. Reconstructing Transient Images from Single-Photon Sensors. CVPR (link)