Removing sensor noise from 0.1 lux RAW video in the RAW domain — and verifying how the restored signal carries through to downstream vision tasks.
● Sogang University · Department of Computer Science and Engineering
Figure: scene_giraffe, 0.1 lux IMX327 RAW. Noisy input versus our denoised output.
In low-light capture, the limited light reaching the sensor produces strong shot noise, read noise, and fixed-pattern noise. This degrades not only perceptual quality but also downstream tasks such as object detection.
Our core model is RViDeNet-ECBAM, a modified RViDeNet for RAW video denoising. It takes three consecutive noisy Bayer RAW frames and reconstructs the denoised RAW of the center frame, exploiting temporal information from neighbouring frames to suppress noise while preserving structure.
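The three-frame input described above can be sketched as follows. This is a minimal illustration, not the project's data loader: the function names `pack_bayer` and `make_input_window` are hypothetical, packing each Bayer mosaic into four half-resolution planes is an assumption (a common convention for RAW denoisers), and clamping the window at sequence boundaries is also an assumption.

```python
import numpy as np

def pack_bayer(raw):
    """Split an (H, W) Bayer mosaic into a (4, H/2, W/2) stack of colour planes.

    Assumes H and W are even. Which plane is R, G or B depends on the
    sensor's colour filter layout, which is not specified here.
    """
    return np.stack(
        [raw[0::2, 0::2], raw[0::2, 1::2], raw[1::2, 0::2], raw[1::2, 1::2]],
        axis=0,
    )

def make_input_window(frames, t):
    """Return packed frames t-1, t, t+1 as a (3, 4, H/2, W/2) array.

    Indices are clamped at the sequence boundaries (an assumption), so the
    first and last frames reuse themselves as the missing neighbour.
    """
    idx = [max(t - 1, 0), t, min(t + 1, len(frames) - 1)]
    return np.stack([pack_bayer(frames[i]) for i in idx], axis=0)

# Toy usage: three 4x4 "frames" with distinct offsets.
frames = [np.arange(16).reshape(4, 4) + 100 * k for k in range(3)]
window = make_input_window(frames, 1)  # centre frame is frames[1]
```

The network would then map such a window to a single denoised RAW frame aligned with the centre index.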
Rather than optimizing the sRGB output of an ISP, we deliberately focus on RAW-to-RAW restoration: fine-tuning is driven by a RAW reconstruction loss and a temporal consistency loss, without an sRGB loss. Our key architectural change replaces CBAM with ECBAM (channel attention + Enhanced Spatial Attention), swapping CBAM's 7×7 spatial attention for Enhanced Spatial Attention. This greatly enlarges the receptive field so the network can separate globally distributed low-light noise from real structure.
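One way to combine a RAW reconstruction term with a temporal consistency term is sketched below. The exact loss definitions and weighting are not given here, so everything in this block is an assumption: the L1 form of both terms, the frame-difference formulation of temporal consistency, the `weight` parameter, and all function names are hypothetical.

```python
import numpy as np

def raw_reconstruction_loss(pred, target):
    """Mean absolute error between denoised and clean RAW (assumed L1 form)."""
    return float(np.mean(np.abs(pred - target)))

def temporal_consistency_loss(pred_prev, pred_cur, target_prev, target_cur):
    """Penalise frame-to-frame changes in the output that the clean video lacks.

    Assumed form: L1 distance between the temporal difference of the
    predictions and the temporal difference of the clean frames.
    """
    pred_delta = pred_cur - pred_prev
    target_delta = target_cur - target_prev
    return float(np.mean(np.abs(pred_delta - target_delta)))

def total_loss(pred_prev, pred_cur, target_prev, target_cur, weight=1.0):
    """RAW-domain objective with no sRGB term; `weight` is a hypothetical knob."""
    recon = raw_reconstruction_loss(pred_cur, target_cur)
    temporal = temporal_consistency_loss(pred_prev, pred_cur, target_prev, target_cur)
    return recon + weight * temporal
```

Under this formulation a constant brightness offset is penalised only by the reconstruction term, while flicker between consecutive outputs is penalised by the temporal term.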
We benchmark against the noisy input, VBM3D, and FastDVDNet, using the same RAW→PNG visualization for every method. We evaluate with PSNR, SSIM, LPIPS, tOF and tLPIPS, and additionally probe a YOLOv11x downstream detection proxy.
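As a reference point for the simplest of these metrics, PSNR can be computed as below. This is a generic sketch of the standard definition, not the project's evaluation script; the function name and the `peak` default (images normalised to [0, 1]) are assumptions, and 8-bit PNGs would use `peak=255`.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))
```

Higher is better for PSNR and SSIM, while lower is better for LPIPS, tOF and tLPIPS.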
In the RAW domain, RViDeNet-ECBAM substantially improves over the noisy input on both the self-captured 0.1 lux set and the external ReCRVD set.
On the self-captured set (PNG domain), it reaches the best PSNR, SSIM, and tOF among all compared methods; on ReCRVD, it reaches the best LPIPS. In the YOLOv11x downstream proxy, the detected-frame ratio rises from 0.016 to 0.726, the highest of all compared methods.
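A detected-frame ratio of this kind can be computed from per-frame detector outputs as sketched below. The precise definition used in the evaluation is not stated here, so this assumes the ratio is the fraction of frames with at least one detection at or above a confidence threshold; the function name and the `conf_threshold` default are hypothetical.

```python
def detected_frame_ratio(per_frame_scores, conf_threshold=0.25):
    """Fraction of frames that contain at least one confident detection.

    per_frame_scores: one list of detection confidence scores per frame
    (an empty list means the detector found nothing in that frame).
    """
    if not per_frame_scores:
        return 0.0
    hits = sum(
        1
        for scores in per_frame_scores
        if any(s >= conf_threshold for s in scores)
    )
    return hits / len(per_frame_scores)

# Toy usage with made-up scores: two of four frames have a confident detection.
toy_scores = [[0.9], [], [0.1], [0.3, 0.05]]
toy_ratio = detected_frame_ratio(toy_scores)
```

Because the same detector and threshold are applied to every method's output, the ratio acts as a proxy for how much usable structure each denoiser recovers rather than as a detection benchmark in its own right.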