NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

CVPR 2022


Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, Ping Tan

Alibaba Group   

Abstract



Estimating accurate depth from a single image is challenging because the problem is inherently ambiguous and ill-posed. While recent works design increasingly complicated and powerful networks to directly regress the depth map, we take the path of CRF optimization. Because of their expensive computation, CRFs are usually applied within local neighborhoods rather than over the whole graph. To leverage the potential of fully-connected CRFs, we split the input into windows and perform the FC-CRF optimization within each window, which reduces the computational complexity and makes FC-CRFs feasible. To better capture the relationships between nodes in the graph, we exploit the multi-head attention mechanism to compute a multi-head potential function, which is fed to the network to output an optimized depth map. We then build a bottom-up-top-down structure in which this neural window FC-CRFs module serves as the decoder and a vision transformer serves as the encoder. Experiments show that, compared to previous methods, our method significantly improves performance across all metrics on both the KITTI and NYUv2 datasets. Furthermore, the proposed method can be directly applied to panorama images and outperforms all previous panorama methods on the Matterport3D dataset.
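To make the complexity argument concrete, here is a minimal NumPy sketch (not the authors' code) that splits an H x W feature map into non-overlapping w x w windows and counts the pairwise terms a fully-connected CRF would need with and without windowing. The helper names `window_partition` and `pairwise_edges` are hypothetical, and the sketch assumes H and W are multiples of w (no padding).

```python
import numpy as np
from typing import Optional


def window_partition(x: np.ndarray, w: int) -> np.ndarray:
    """Split an (H, W, C) feature map into non-overlapping w x w windows.

    Returns shape (num_windows, w * w, C): one row of graph nodes per window.
    Assumes H and W are divisible by w (hypothetical helper, no padding).
    """
    h, wd, c = x.shape
    assert h % w == 0 and wd % w == 0, "H and W must be multiples of the window size"
    x = x.reshape(h // w, w, wd // w, w, c)  # (nH, w, nW, w, C)
    x = x.transpose(0, 2, 1, 3, 4)           # (nH, nW, w, w, C)
    return x.reshape(-1, w * w, c)


def pairwise_edges(h: int, wd: int, w: Optional[int] = None) -> int:
    """Count ordered node pairs (i, j), including i == j, that get a pairwise term."""
    n = h * wd
    if w is None:
        return n * n                          # fully connected over the whole image: O(n^2)
    assert h % w == 0 and wd % w == 0
    num_windows = n // (w * w)
    return num_windows * (w * w) ** 2         # fully connected inside each window: n * w^2


feat = np.random.rand(64, 64, 32)
print(window_partition(feat, 8).shape)        # (64, 64, 32): 64 windows of 64 nodes
print(pairwise_edges(64, 64), pairwise_edges(64, 64, 8))
```

For a 64 x 64 map with 8 x 8 windows, the whole-image graph has 4096^2 = 16,777,216 ordered pairs, while the windowed graph has 4096 x 64 = 262,144, i.e. n / w^2 = 64 times fewer, and every node is still connected to all other nodes in its own window.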






Neural Window FC-CRFs



The neural window fully-connected CRFs take the image feature $\mathcal{F}$ and the upper-level prediction $X$ as input, compute the fully-connected energy $E$ within each window, and feed it to the network to output an optimized depth map.
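A minimal NumPy sketch of a multi-head, attention-style pairwise term inside one window follows. It assumes, as one plausible reading of the caption, that queries and keys are projected from $\mathcal{F}$, values come from $X$, and an optional relative-position bias is added to the logits. The function name, the projection matrices `wq`/`wk`, and the `pos_bias` argument are hypothetical; the unary term and the layers that turn the energy into the optimized prediction are omitted.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def window_pairwise_potential(feat, pred, wq, wk, num_heads, pos_bias=None):
    """Attention-style pairwise term for ONE window (hypothetical sketch).

    feat: (N, C) image features of the N nodes in the window (role of F)
    pred: (N, D) upper-level prediction features of the same nodes (role of X)
    wq, wk: (C, C) hypothetical learned query/key projections
    pos_bias: optional (num_heads, N, N) relative-position term added to the logits
    Returns (N, D): for each node i, a weighted sum over ALL nodes j in the window.
    """
    n, c = feat.shape
    d = pred.shape[1]
    assert c % num_heads == 0 and d % num_heads == 0
    hc, hd = c // num_heads, d // num_heads
    q = (feat @ wq).reshape(n, num_heads, hc).transpose(1, 0, 2)  # (heads, N, hc)
    k = (feat @ wk).reshape(n, num_heads, hc).transpose(1, 0, 2)  # (heads, N, hc)
    v = pred.reshape(n, num_heads, hd).transpose(1, 0, 2)         # (heads, N, hd)
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(hc)               # (heads, N, N): every pair
    if pos_bias is not None:
        logits = logits + pos_bias
    weights = softmax(logits, axis=-1)                            # each row sums to 1
    out = weights @ v                                             # (heads, N, hd)
    return out.transpose(1, 0, 2).reshape(n, d)                   # concatenate heads


rng = np.random.default_rng(0)
feat, pred = rng.normal(size=(49, 32)), rng.normal(size=(49, 8))  # one 7 x 7 window
wq, wk = rng.normal(size=(32, 32)), rng.normal(size=(32, 32))
print(window_pairwise_potential(feat, pred, wq, wk, num_heads=4).shape)  # (49, 8)
```

Because each head's weights form a softmax over every node in the window, each node's output is a convex combination of all the values in that window, which is what makes the term fully connected within the window while staying local across windows.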


Network structure



Network structure of the proposed framework. The encoder first extracts features at four levels. A PPM (pyramid pooling module) head aggregates global and local information and makes the initial prediction $X$ from the top-level image feature $\mathcal{F}$. Then, at each level, the neural window fully-connected CRFs module builds a multi-head energy from $X$ and $\mathcal{F}$ and optimizes it into a better prediction $X'$. Between levels, a rearrange upscale is performed, chosen with the sharpness of the prediction and the weight of the network in mind.
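The rearrange upscale can be read as a depth-to-space (pixel-shuffle style) operation, and the NumPy sketch below assumes that reading: it moves each group of r*r channels into an r x r spatial block, so it adds no learnable weights and performs no interpolation. The shape trace further assumes that the four levels sit at 1/32, 1/16, 1/8 and 1/4 of the input resolution and uses an illustrative channel count. Both helper names are hypothetical, and the per-level CRF refinement is left out.

```python
import numpy as np


def rearrange_upscale(x: np.ndarray, r: int = 2) -> np.ndarray:
    """Depth-to-space upscaling of a (C*r*r, H, W) map to (C, H*r, W*r).

    Input channel c*r*r + i*r + j at pixel (h, w) becomes output pixel
    (h*r + i, w*r + j) of channel c: no weights, no interpolation.
    """
    crr, h, w = x.shape
    assert crr % (r * r) == 0, "channel count must be divisible by r*r"
    c = crr // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)    # reorder to (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)


def trace_decoder_shapes(height: int, width: int, top_channels: int):
    """Hypothetical shape trace from a 1/32-scale prediction up to 1/4 scale."""
    x = np.zeros((top_channels, height // 32, width // 32))
    shapes = [x.shape]
    for _ in range(3):                # 1/32 -> 1/16 -> 1/8 -> 1/4
        # (a NeW CRFs module would refine x at this level before upscaling)
        x = rearrange_upscale(x, 2)
        shapes.append(x.shape)
    return shapes


print(trace_decoder_shapes(352, 1216, 512))
```

Each upscale trades a factor of four in channels for a factor of two in height and width, which is why this kind of operation keeps edges crisp (no smoothing from interpolation) without adding parameters.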










Citation


@inproceedings{yuan2022newcrfs,
  title={NeWCRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation},
  author={Yuan, Weihao and Gu, Xiaodong and Dai, Zuozhuo and Zhu, Siyu and Tan, Ping},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2022}
}