Denoising Diffusion Semantic Segmentation with Mask Prior Modeling. (arXiv:2306.01721v2 [cs.CV] UPDATED)


Denoising Diffusion Semantic Segmentation with Mask Prior Modeling. (arXiv:2306.01721v2 [cs.CV] UPDATED)
By: <a href="http://arxiv.org/find/cs/1/au:+Lai_Z/0/1/0/all/0/1">Zeqiang Lai</a>, <a href="http://arxiv.org/find/cs/1/au:+Duan_Y/0/1/0/all/0/1">Yuchen Duan</a>, <a href="http://arxiv.org/find/cs/1/au:+Dai_J/0/1/0/all/0/1">Jifeng Dai</a>, <a href="http://arxiv.org/find/cs/1/au:+Li_Z/0/1/0/all/0/1">Ziheng Li</a>, <a href="http://arxiv.org/find/cs/1/au:+Fu_Y/0/1/0/all/0/1">Ying Fu</a>, <a href="http://arxiv.org/find/cs/1/au:+Li_H/0/1/0/all/0/1">Hongsheng Li</a>, <a href="http://arxiv.org/find/cs/1/au:+Qiao_Y/0/1/0/all/0/1">Yu Qiao</a>, <a href="http://arxiv.org/find/cs/1/au:+Wang_W/0/1/0/all/0/1">Wenhai Wang</a> Posted: June 23, 2023

The evolution of semantic segmentation has long been dominated by learning
more discriminative image representations for classifying each pixel. Despite
the prominent advancements, the priors of segmentation masks themselves, e.g.,
geometric and semantic constraints, are still under-explored. In this paper, we
propose to ameliorate the semantic segmentation quality of existing
discriminative approaches with a mask prior modeled by a recently-developed
denoising diffusion generative model. Beginning with a unified architecture
that adapts diffusion models for mask prior modeling, we focus this work on a
specific instantiation with discrete diffusion and identify a variety of key
design choices for its successful application. Our exploratory analysis
revealed several important findings, including: (1) a simple integration of
diffusion models into semantic segmentation is not sufficient, and a
poorly-designed diffusion process might lead to degradation in segmentation
performance; (2) during the training, the object to which noise is added is
more important than the type of noise; (3) during the inference, the strict
diffusion denoising scheme may not be essential and can be relaxed to a simpler
scheme that even works better. We evaluate the proposed prior modeling with
several off-the-shelf segmentors, and our experimental results on ADE20K and
Cityscapes demonstrate that our approach could achieve competitively
quantitative performance and more appealing visual quality.

Provided by:
http://arxiv.org/icons/sfx.gif

DoctorMorDi

DoctorMorDi

Moderator and Editor