Abstract

Image processing and pixel-wise dense prediction have been advanced by harnessing the capabilities of deep learning. One central issue of deep learning is the limited capacity to handle joint upsampling. We present a deep learning building block for joint upsampling, namely guided filtering layer. This layer aims at efficiently generating the high-resolution output given the corresponding low-resolution one and a high-resolution guidance map. The proposed layer is composed of a guided filter, which is reformulated as a fully differentiable block. To this end, we show that a guided filter can be expressed as a group of spatial varying linear transformation matrices. This layer could be integrated with the convolutional neural networks (CNNs) and jointly optimized through end-to-end training. To further take advantage of end-to-end training, we plug in a trainable transformation function that generates task-specific guidance maps. By integrating the CNNs and the proposed layer, we form deep guided filtering networks. The proposed networks are evaluated on five advanced image processing tasks. Experiments on MIT-Adobe FiveK Dataset demonstrate that the proposed approach runs 10-100 times faster and achieves the state-of-the-art performance. We also show that the proposed guided filtering layer helps to improve the performance of multiple pixel-wise dense prediction tasks.
Accepted by CVPR 2018

Code and Extras

You can find the code on Github, including:
  • Training/Test source code (PyTorch)
  • Pretrained models and datasets
  • Step-by-step tutorial to run our algorithm

Bibtex

@inproceedings{wu2017fast,
  title     = {Fast End-to-End Trainable Guided Filter},
  author    = {Wu, Huikai and Zheng, Shuai and Zhang, Junge and Huang, Kaiqi},
  booktitle = {CVPR},
  year = {2018}
}

DEMO

It should finish within 15s. If not, delete and run it again!

VISUAL RESULTS

Visual Results --- Mono Depth Estimation (Click to Enlarge)

Input

GroundTruth

Ours

Baseline

Visual Results --- Semantic Segmentation (Click to Enlarge)

Input

GroundTruth

Ours

Baseline

Visual Results --- Saliency Detection (Click to Enlarge)

Input

GroundTruth

Ours

Baseline