Deep Optics for Single-shot HDR Imaging | CVPR 2020

Christopher A. Metzler, Hayato Ikoma, Yifan (Evan) Peng, Gordon Wetzstein

End-to-end optimization of optics and image processing for HDR imaging.

ABSTRACT

High-dynamic-range (HDR) imaging is crucial for many computer graphics and vision applications. Yet, acquiring HDR images with a single shot remains a challenging problem. Whereas modern deep learning approaches are successful at hallucinating plausible HDR content from a single low-dynamic-range (LDR) image, saturated scene details often cannot be faithfully recovered. Inspired by recent deep optical imaging approaches, we interpret this problem as jointly training an optical encoder and electronic decoder where the encoder is parameterized by the point spread function (PSF) of the lens, the bottleneck is the sensor with a limited dynamic range, and the decoder is a convolutional neural network (CNN). The lens surface is then jointly optimized with the CNN in a training phase; we fabricate this optimized optical element and attach it as a hardware add-on to a conventional camera during inference. In extensive simulations and with a physical prototype, we demonstrate that this end-to-end deep optical imaging approach to single-shot HDR imaging outperforms both purely CNN-based approaches and other PSF engineering approaches.

FILES

Technical Paper (arxiv)
Presentation video (link)
Source code (github link)

CITATION

Metzler, C., Ikoma, H., Peng, Y., Wetzstein, G., Deep Optics for Single-shot High-dynamic-range Imaging, CVPR 2020

@inproceedings{Metzler:2020:DeepOpticsHDR,
author={C. Metzler and H. Ikoma and Y. Peng and G. Wetzstein},
title={Deep Optics for Single-shot High-dynamic-range Imaging},
booktitle={Proc. CVPR},
year={2020},
}

Illustration of the proposed end-to-end optimization framework. HDR images from a training set are convolved with the PSF created by a lens surface profile φ. These simulated measurements are clipped by a function f(·) to emulate sensor saturation, and noise η is added. The resulting RGB image y is processed by a convolutional neural network (CNN), and its output is compared with the ground truth HDR image using the loss function L described in the paper. In the learning stage, this loss is back-propagated into the weights and biases of the CNN and also into the height values h of the lens. During inference, a captured LDR image blurred by the optical PSF is fed directly into the pre-trained CNN to compute the reconstructed HDR image.
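
The simulated measurement step in this caption (PSF convolution, sensor clipping, additive noise) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name simulate_sensor, the Gaussian noise model, the noise level, the saturation level of 1.0, the circular boundary handling, and the choice to add noise before clipping are all assumptions made for this sketch.

```python
import numpy as np

def simulate_sensor(hdr, psf, noise_std=0.01, rng=None):
    """Hypothetical forward model: blur an HDR image with a PSF, add noise, clip.

    hdr: float array (H, W, 3), linear radiance; 1.0 is the assumed saturation level.
    psf: float array (kh, kw), non-negative, assumed to sum to 1.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    H, W, _ = hdr.shape
    kh, kw = psf.shape

    # Embed the PSF in an image-sized array and move its center to the origin
    # so that the convolution does not shift the image.
    padded = np.zeros((H, W))
    padded[:kh, :kw] = psf
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    otf = np.fft.rfft2(padded)

    # Circular convolution per color channel via the FFT.
    blurred = np.stack(
        [np.fft.irfft2(np.fft.rfft2(hdr[..., c]) * otf, s=(H, W)) for c in range(3)],
        axis=-1,
    )

    # Assumed ordering: additive Gaussian noise, then clipping to the sensor range.
    noisy = blurred + rng.normal(0.0, noise_std, size=blurred.shape)
    return np.clip(noisy, 0.0, 1.0)

# Usage: a flat scene with one highlight four times above saturation.
scene = np.full((8, 8, 3), 0.5)
scene[2, 3] = 4.0
ldr = simulate_sensor(scene, np.ones((1, 1)), noise_std=0.0)
```

In an end-to-end setting, the same operations would be written in an automatic-differentiation framework so that the loss gradient can reach the lens parameters that generate the PSF; the sketch above only covers the forward pass.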

Quantitative evaluation for the entire test set. Several single-shot HDR imaging approaches are compared using a perceptual image difference computed by HDR-VDP-2 and peak signal-to-noise ratio (PSNR) computed in the linear domain (L) and in the γ-corrected domain (γ).
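
As a rough illustration of the two PSNR variants named in this caption, the sketch below computes PSNR once on linear values and once after a power-law (γ) correction. The normalization by the reference maximum, the exponent of 0.5, and the function names are assumptions for this example; the exact settings used in the evaluation are described in the paper. HDR-VDP-2 is a separate perceptual metric and is not reproduced here.

```python
import numpy as np

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

def psnr_linear_and_gamma(ref_hdr, est_hdr, gamma=0.5):
    """Return (PSNR in the linear domain, PSNR in the gamma-corrected domain).

    Both images are divided by the reference maximum so that the peak is 1.
    The gamma exponent is a hypothetical choice for this sketch.
    """
    scale = ref_hdr.max()
    ref = np.clip(ref_hdr / scale, 0.0, None)
    est = np.clip(est_hdr / scale, 0.0, None)
    return psnr(ref, est), psnr(ref ** gamma, est ** gamma)

# Usage: an estimate that misses a dark pixel is penalized more after gamma correction.
reference = np.array([[1.0, 0.25]])
estimate = np.array([[1.0, 0.0]])
psnr_l, psnr_g = psnr_linear_and_gamma(reference, estimate)
```

The γ-corrected variant compresses bright values and expands dark ones, so errors in shadows weigh more heavily than in the linear variant.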

Optimized height profile of the diffractive optical element (DOE, left) along with profilometer measurements of the fabricated DOE. The DOE structure partially resembles that of a grating, which creates multiple peaks in the point spread function (PSF, right). Intuitively, this PSF creates three shifted and scaled copies of the input image. Although the measured PSF is slightly blurrier than the simulated PSF, likely due to imperfections in the fabrication process and approximations of our image formation model, their general shapes are comparable.
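
The intuition about shifted and scaled copies can be illustrated with a one-dimensional toy example. Here "scaled" is read as scaled in intensity, and the peak weights, peak offset, and known background level are hypothetical values chosen for the sketch; the actual system uses the learned PSF and a CNN rather than this closed-form inversion.

```python
import numpy as np

def encode_three_copies(scene, offset=5, side_weight=0.05, saturation=1.0):
    """Superpose a strong central copy and two weak shifted copies, then clip.

    Models a hypothetical three-peak PSF; shifts are circular for simplicity.
    """
    center_weight = 1.0 - 2.0 * side_weight
    weights = {0: center_weight, -offset: side_weight, offset: side_weight}
    blurred = sum(w * np.roll(scene, shift) for shift, w in weights.items())
    return np.clip(blurred, 0.0, saturation)

# A dim background with one highlight eight times above the saturation level.
background = 0.1
scene = np.full(32, background)
scene[16] = 8.0

measured = encode_three_copies(scene, offset=5, side_weight=0.05)

# The central copy of the highlight saturates ...
central = measured[16]

# ... but the weak side copy, five samples away, stays below saturation.
side = measured[16 + 5]

# With the background known, the side copy reveals the true highlight value.
# The background contributes through the central peak and the other side peak.
recovered = (side - (0.90 + 0.05) * background) / 0.05
```

In this toy case the saturated central sample reads 1.0, while the side copy reads about 0.495 and allows the highlight value of 8.0 to be recovered. Real scenes have unknown, spatially varying backgrounds, which is one reason a learned decoder is used instead of a fixed formula.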

Conventional camera sensors are limited in their ability to capture high-dynamic-range (HDR) scenes. Details in brighter parts of the image, such as the light bulb, are saturated in a low-dynamic-range (LDR) photograph (left). We propose an end-to-end (E2E) approach that jointly optimizes a diffractive optical element and a convolutional neural network (CNN) to enable single-shot HDR imaging. After the optics and the algorithm are trained jointly, the optical element is fabricated and attached to a conventional camera lens as an add-on. During inference, the proposed deep optical imaging system records a single sensor image (center left) that contains optically encoded HDR information, which helps the CNN recover an HDR image (center right). Compared to the conventional LDR image (insets, top), the HDR image computed by our system (insets, center) extends the dynamic range of the sensor significantly and more closely resembles the reference HDR photograph of this scene (insets, bottom).

Related Projects

You may also be interested in related projects, where we apply the idea of Deep Optics, i.e., end-to-end optimization of optics and image processing, to other applications, such as image classification, extended depth-of-field imaging, super-resolution imaging, or optical computing.

Wetzstein et al. 2020. AI with Optics & Photonics. Nature (review paper, link)
Martel et al. 2020. Neural Sensors. ICCP & TPAMI 2020 (link)
Dun et al. 2020. Learned Diffractive Achromat. Optica 2020 (link)
Metzler et al. 2020. Deep Optics for HDR Imaging. CVPR 2020 (link)
Chang et al. 2019. Deep Optics for Depth Estimation and Object Detection. ICCV 2019 (link)
Peng et al. 2019. Large Field-of-view Imaging with Learned DOEs. SIGGRAPH Asia 2019 (link)
Chang et al. 2018. Hybrid Optical-Electronic Convolutional Neural Networks with Optimized Diffractive Optics for Image Classification. Scientific Reports (link)
Sitzmann et al. 2018. End-to-end Optimization of Optics and Image Processing for Achromatic Extended Depth-of-field and Super-resolution Imaging. ACM SIGGRAPH 2018 (link)