Capstone Project · 2026

Extreme Low-Light
RAW Video Denoising

Removing sensor noise from 0.1 lux RAW video in the RAW domain — and verifying how the restored signal carries through to downstream vision tasks.

Sogang University · Department of Computer Science and Engineering

Code · Paper (KCC 2026) · Video Gallery

Figure: noisy input vs. RViDeNet-ECBAM denoised output · scene_giraffe · 0.1 lux IMX327 RAW

Overview

What we built

In low-light capture, so little light reaches the sensor that shot noise, read noise, and fixed-pattern noise dominate the weak signal. This degrades not only perceptual quality but also the performance of downstream tasks such as object detection.
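To make the three noise sources concrete, here is a minimal NumPy sketch of a common Poisson-Gaussian model with an added column-wise fixed-pattern term. It is an illustration only, not a model of the IMX327 used in this project: the `simulate_low_light_raw` helper, the gain, and the noise levels are hypothetical, and real sensors have further components (for example row noise and quantization).

```python
import numpy as np

def simulate_low_light_raw(clean, gain=4.0, read_sigma=2.0, fpn_sigma=1.0,
                           fpn=None, rng=None):
    """Add shot, read and fixed-pattern noise to a clean RAW frame.

    Hypothetical helper with illustrative, uncalibrated parameters.
    clean: expected photo-electron count per pixel (float array, >= 0).
    gain:  digital gain applied after readout (output units per electron).
    Returns (noisy_frame, fpn) so the same fixed pattern can be reused.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Shot noise: photon arrivals are Poisson, so variance equals the mean signal.
    electrons = rng.poisson(clean).astype(np.float64)
    # Read noise: signal-independent Gaussian noise from the readout circuitry.
    electrons += rng.normal(0.0, read_sigma, size=clean.shape)
    # Fixed-pattern noise: a per-column offset that does not change between
    # frames, so it is drawn once and passed back in for later frames.
    if fpn is None:
        fpn = rng.normal(0.0, fpn_sigma, size=(1, clean.shape[1]))
    return gain * (electrons + fpn), fpn
```

Calling it once per frame while passing the returned `fpn` back in gives a clip whose shot and read noise change every frame but whose column pattern stays fixed, which is why temporal averaging alone cannot remove fixed-pattern noise.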

Our core model is RViDeNet-ECBAM, a modified RViDeNet for RAW video denoising. It takes three consecutive noisy Bayer RAW frames and reconstructs the denoised RAW of the center frame, exploiting temporal information from neighbouring frames to suppress noise while preserving structure.
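The input layout can be sketched as follows. This assumes an RGGB Bayer pattern packed into four half-resolution channels, a common convention for RAW denoisers, and skips the first and last frame of each clip. The helper names `pack_bayer_rggb` and `make_triplets` are hypothetical, and the project's actual data pipeline may differ.

```python
import numpy as np

def pack_bayer_rggb(raw):
    """Split an H x W RGGB mosaic into a 4 x H/2 x W/2 array (R, G1, G2, B).

    Assumes an RGGB pattern; other CFA layouts need a different slicing order.
    """
    if raw.ndim != 2 or raw.shape[0] % 2 or raw.shape[1] % 2:
        raise ValueError("expected a 2-D mosaic with even height and width")
    return np.stack([raw[0::2, 0::2],   # R
                     raw[0::2, 1::2],   # G on red rows
                     raw[1::2, 0::2],   # G on blue rows
                     raw[1::2, 1::2]])  # B

def make_triplets(frames):
    """Yield (stack, t): frames t-1, t, t+1 packed, with frame t as the target."""
    packed = [pack_bayer_rggb(f) for f in frames]
    for t in range(1, len(packed) - 1):
        # Shape: 3 frames x 4 Bayer channels x H/2 x W/2
        yield np.stack(packed[t - 1:t + 2]), t
```

Packing keeps each color plane spatially consistent, so convolutions never mix, for example, a red sample with its green neighbour as if they were the same channel.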

Rather than optimizing the sRGB output of an ISP, we deliberately focus on RAW-to-RAW restoration: fine-tuning uses a RAW reconstruction loss and a temporal consistency loss, with no sRGB loss. Our key architectural change is ECBAM, which keeps CBAM's channel attention but replaces its 7×7 spatial attention with Enhanced Spatial Attention (ESA). ESA's much larger receptive field lets the network separate low-light noise, which is spread across the whole frame, from real structure.
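A quick receptive-field count shows the size of that change. The sketch below applies the standard receptive-field recurrence to CBAM's single 7×7 spatial-attention convolution and to the downsampling branch of ESA as published for RFANet. It assumes the ESA used here follows that layer layout, ignores the final upsampling and 1×1 convolution, and counts in pixels of the attention module's input feature map.

```python
def receptive_field(layers):
    """Theoretical receptive field (in input pixels) of (kernel, stride) layers.

    Recurrence: r_out = r_in + (k - 1) * j_in and j_out = j_in * s, where j is
    the cumulative stride between adjacent output positions.
    """
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# CBAM spatial attention: one 7x7 convolution over channel-pooled maps.
cbam_spatial = [(7, 1)]

# ESA branch, assumed to follow the RFANet layout: 1x1 conv, 3x3 stride-2 conv,
# 7x7 stride-3 max-pool, then three 3x3 convs before upsampling back.
esa_branch = [(1, 1), (3, 2), (7, 3), (3, 1), (3, 1), (3, 1)]

print(receptive_field(cbam_spatial))  # 7
print(receptive_field(esa_branch))    # 51
```

Under these assumptions each ESA attention value depends on a 51×51 window rather than 7×7. These are theoretical receptive fields; the effective receptive field of a trained network is usually smaller.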

We compare against the noisy input (as a no-denoising baseline), VBM3D, and FastDVDNet, rendering every method's output through the same RAW→PNG visualization. Evaluation uses PSNR, SSIM, LPIPS, tOF, and tLPIPS, plus YOLOv11x object detection as a downstream proxy.
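For reference, PSNR as commonly defined is 10·log10(peak² / MSE). A minimal sketch, assuming RAW values normalized to [0, 1] after black-level subtraction so that the peak is 1.0; the normalization behind the reported numbers may differ, and `psnr` is a hypothetical helper name.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped arrays.

    data_range is the peak signal value (assumed 1.0 for normalized RAW).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return 10.0 * np.log10(data_range ** 2 / mse)
```

Because PSNR is logarithmic in MSE, every 10 dB is a 10× change in MSE. With the same peak and reference, the self-captured RAW gain from 45.0 to 57.1 dB corresponds to roughly 16× lower MSE (10^1.21 ≈ 16.2), and the ReCRVD gain from 21.8 to 39.3 dB to roughly 56× lower MSE (10^1.75 ≈ 56.2).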

Headline results

From noise to signal, in the dark

In the RAW domain, RViDeNet-ECBAM substantially improves over the noisy input on both the self-captured 0.1 lux set and the external ReCRVD set.

Self-captured · RAW PSNR (dB): 45.0 → 57.1
ReCRVD · RAW PSNR (dB): 21.8 → 39.3
YOLOv11x · detections / frame: 0.016 → 1.64

In the PNG domain, it achieves the best PSNR, SSIM, and tOF of all compared methods on the self-captured set, and the best LPIPS on ReCRVD. In the YOLOv11x downstream proxy, the detected-frame ratio rises from 0.016 to 0.726, the highest among all compared methods. See the full evaluation →
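The two YOLOv11x numbers measure different things. A minimal sketch, assuming detections/frame is the mean number of detections per frame and the detected-frame ratio is the fraction of frames with at least one detection; the confidence threshold and class filter are not specified here, and `detection_summary` is a hypothetical helper. A frame with several objects counts several times toward the first metric but only once toward the second, so detections/frame can exceed 1 while the ratio cannot.

```python
def detection_summary(counts_per_frame):
    """Return (detections_per_frame, detected_frame_ratio) for one clip.

    counts_per_frame: number of detections the detector kept in each frame.
    Both metric definitions are assumptions, not the project's exact code.
    """
    n = len(counts_per_frame)
    if n == 0:
        raise ValueError("need at least one frame")
    detections_per_frame = sum(counts_per_frame) / n
    # A frame counts once, however many objects were detected in it.
    detected_frame_ratio = sum(1 for c in counts_per_frame if c > 0) / n
    return detections_per_frame, detected_frame_ratio

print(detection_summary([0, 0, 3, 1, 0]))  # (0.8, 0.4)
```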

Explore

Dive deeper

01 / Method

Model & training

The RViDeNet-ECBAM architecture, the switch to ESA spatial attention, the datasets, and the 3-stage training strategy.

Read the method →

02 / Results

Evaluation

RAW & PNG metrics, per-scene tables, alpha blending, and the YOLOv11x downstream study.

See the numbers →

03 / Gallery

Video comparisons

Noisy vs. denoised, side by side, across eight self-captured low-light scenes.

Watch the clips →