Arbitrary Shape Text Detection via Boundary Transformer. (arXiv:2205.05320v4 [cs.CV] UPDATED)

In arbitrary shape text detection, accurately locating text boundaries is
challenging. Existing methods often suffer from indirect text
boundary modeling or complex post-processing. In this paper, we systematically
present a unified coarse-to-fine framework via boundary learning for arbitrary
shape text detection, which can accurately and efficiently locate text
boundaries without post-processing. In our method, we explicitly model the text
boundary via an innovative iterative boundary transformer in a coarse-to-fine
manner. In this way, our method directly obtains accurate text boundaries and
dispenses with complex post-processing, which improves efficiency. Specifically, our method
mainly consists of a feature extraction backbone, a boundary proposal module,
and an iteratively optimized boundary transformer module. The boundary proposal
module, which consists of multi-layer dilated convolutions, computes important
prior information (including a classification map, a distance field, and a
direction field) that is used to generate coarse boundary proposals and to guide
the boundary transformer’s optimization. The boundary transformer module adopts an
encoder-decoder structure, in which the encoder is built from multi-layer
transformer blocks with residual connections while the decoder is a simple
multi-layer perceptron (MLP). Under the guidance of prior information,
the boundary transformer module gradually refines the coarse boundary
proposals via iterative boundary deformation. Furthermore, we propose a novel
boundary energy loss (BEL), which introduces an energy minimization constraint
and a monotonically decreasing energy constraint to further optimize and
stabilize the learning of boundary refinement. Extensive experiments on
publicly available and challenging datasets demonstrate the state-of-the-art
performance and promising efficiency of our method.
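
The coarse-to-fine refinement and the two energy constraints described above can be illustrated with a small sketch. It is not the paper's implementation. The names `refine_boundary`, `boundary_energy`, and `energy_loss_sketch` are hypothetical, `predict_offsets` is a toy stand-in for the transformer encoder and MLP decoder, and the squared-distance energy and hinge-style monotonicity penalty are assumptions chosen only to make the idea concrete.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Boundary = List[Point]


def refine_boundary(
    coarse: Boundary,
    predict_offsets: Callable[[Boundary], List[Point]],
    num_iters: int = 3,
) -> List[Boundary]:
    """Return the boundary after every deformation step, coarse proposal first."""
    history = [list(coarse)]
    for _ in range(num_iters):
        current = history[-1]
        # Stand-in for the boundary transformer: one offset per control point.
        offsets = predict_offsets(current)
        history.append(
            [(x + dx, y + dy) for (x, y), (dx, dy) in zip(current, offsets)]
        )
    return history


def boundary_energy(boundary: Boundary, target: Boundary) -> float:
    """Assumed energy: mean squared distance to matched target points."""
    total = sum(
        (x - tx) ** 2 + (y - ty) ** 2
        for (x, y), (tx, ty) in zip(boundary, target)
    )
    return total / len(boundary)


def energy_loss_sketch(energies: List[float], weight: float = 1.0) -> float:
    """Assumed loss: mean energy plus a penalty for any step that raises it."""
    minimization = sum(energies) / len(energies)
    monotonic = sum(
        max(0.0, later - earlier)
        for earlier, later in zip(energies, energies[1:])
    )
    return minimization + weight * monotonic


# Toy data: a square target and a smaller square as the coarse proposal.
target = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
coarse = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]


def toy_predictor(boundary: Boundary) -> List[Point]:
    # Moves each point halfway toward its target; a learned model would go here.
    return [((tx - x) / 2, (ty - y) / 2) for (x, y), (tx, ty) in zip(boundary, target)]


history = refine_boundary(coarse, toy_predictor, num_iters=3)
energies = [boundary_energy(b, target) for b in history]
loss = energy_loss_sketch(energies)
```

With this toy predictor the energy falls at every iteration, so the monotonicity penalty is zero and the loss reduces to the mean energy. A step that increased the energy would add its increase to the loss.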

DoctorMorDi

Moderator and Editor