RTRS: Building a Robust Text Reading System for Unconstrained Scene Images


ICDAR’17 tutorial, November 10-15, 2017, Kyoto


Reading unconstrained scene text in the wild has attracted increasing attention in the field of computer vision. Because many real-world applications, such as image retrieval and self-driving cars, can benefit from the rich semantic information embedded in scene text, great effort has been devoted to building robust reading systems for unconstrained scene text. Typically, a scene text reading system can be divided into two parts: a scene text detector and a recogniser.

In this tutorial, we first analyze the challenges of text reading in unconstrained scene images. Then, recent progress and important results in both text detection and recognition are concisely reviewed. Comprehensive evaluations and comparisons among these methods will also be covered. Previous work on text detection can be coarsely grouped into three categories: connected-component based, sliding-window based, and text-line-proposal based. Text recognition methods have made a significant breakthrough since RNNs were adopted for sequence-to-sequence prediction.

In addition, we will introduce how to construct a complete reading system for unconstrained scene text. An enterprise-level scene text reading system (Baidu-OCR) will be presented, and the key components of our solution will be discussed in detail. To illustrate the effectiveness of the reading system, several real-world applications will also be introduced.


This tutorial will include the following:

1. Unconstrained text detection in scene images

  Review of previous scene text detection methods 

  Review of general public datasets and evaluation measurements

  Comparisons and discussion of existing methods

2. A unified text detection system in natural scene images

  Exploiting word annotations for text character detection

  Text-line generation by utilizing the min-cost flow network model

  The representation of text-line

3. Unconstrained text recognition in scene images

  Review of previous scene text recognition methods

  Review of general public datasets for scene text recognition

  Comparisons and discussion of existing methods

4. Double attention residual network for scene text recognition

  Effective training of spatial attention network for scene text recogniser

  A powerful deep extractor of spatial text-related features

  An RNN-based attention model for translating sequential features into text

5. Enterprise-level scene text reading system and its applications

  Enterprise-level reading system for unconstrained scene text

  Real-world applications of the reading system

6. Conclusion and future work




Zhizhong Su received his M.S. degree from Indiana University in 2014 and his B.S. degree from Shanghai Jiao Tong University in 2012. He joined the Baidu Institute of Deep Learning as a Research Engineer in 2014. His main research interests are image segmentation, OCR, and sequence learning. He has maintained the Baidu Scene Text Recognition system since 2014 and has upgraded the recognition model with state-of-the-art deep learning algorithms multiple times, making the system highly competitive in the industry.

Chengquan Zhang received his M.S. degree in electronics and information engineering from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 2016. His research interests include document analysis, scene text detection, and text reading systems. During his Master's studies, his work on text reading was presented at ICDAR15 (oral), ICPR16, CVPR16, etc. In the summer of 2016, he joined the OCR team at Baidu's Institute of Deep Learning (IDL). His main work is to maintain an efficient text detection system and to explore new techniques for text detection in unconstrained images.