When evaluated on the “Robust Reading Competition” dataset for natural scene images, the method achieved better detection results than state-of-the-art methods. In addition to its efficacy, the method can be easily adapted to detect multi-oriented or multilingual text, because it operates on low-level initial components and does not require those components to be characters.
Zhen Zhu et al. (2018) describe the large variation in text size as a significant challenge in scene text detection. Their paper presents an accurate oriented text detector based on Faster R-CNN. They apply feature fusion in both the RPN and Fast R-CNN stages to alleviate this problem and to enhance the model's ability to detect relatively small text. The detector achieves results comparable to state-of-the-art methods on ICDAR 2015 and MSRA-TD500, showing its advantage and applicability.
Jianqi Ma et al. (2018) introduce a novel rotation-based framework for arbitrary-oriented text detection in natural scene images. They present Rotation Region Proposal Networks (RRPN), which are designed to generate inclined proposals that carry text orientation angle information. A Rotation Region-of-Interest (RRoI) pooling layer is proposed to project arbitrary-oriented proposals onto a feature map for a text region classifier. They conducted experiments with the rotation-based framework on three real-world scene text detection datasets and demonstrated its superiority in effectiveness and efficiency over previous approaches. The proposed model achieves results comparable to state-of-the-art methods on ICDAR 2015 (IC15), with an F-measure of 0.776.
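To make the idea of an inclined proposal concrete, the following is a minimal sketch of how a rotated box could be turned into its four corner points. It assumes a box is given by a centre, a width, a height and an angle in degrees; the function name and this parameter layout are illustrative assumptions, not the RRPN implementation.

```python
import math

def rotated_box_corners(cx, cy, w, h, theta_deg):
    """Return the four corners of a box centred at (cx, cy), rotated by theta_deg.

    Hypothetical helper for illustration; the parameterisation is an assumption.
    """
    t = math.radians(theta_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    # Corner offsets of the unrotated box, relative to its centre.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Apply the 2D rotation matrix to each offset, then translate to the centre.
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
            for dx, dy in offsets]
```

With an angle of zero the function returns an ordinary axis-aligned rectangle, so horizontal proposals are a special case of this representation.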
Kun Fan et al. (2018) describe the detection of text regions, which are defined as parts of text lines containing a whole character or the transition between two adjacent characters. The paper presents simple features, consisting of the means and standard deviations of image gradients, which are used to train a random forest to detect text regions over multiple image scales and color channels. Even though the method is trained on English, the experiments demonstrate that it achieves high recall with a few thousand good-quality proposals on four standard benchmarks, including multi-language datasets. Following the One-to-One and Many-to-One detection criteria, the method achieves 91.6%, 87.4%, 92.1% and 97.9% recall on the ICDAR 2013 Robust Reading Dataset, Street View Text Dataset, Pan's Multilingual Dataset and Sampled KAIST Scene Text Dataset respectively, with an average of fewer than 1250 proposals.
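The gradient statistics mentioned above can be sketched as follows. This is only an illustration of computing means and standard deviations of image gradients for a single-channel patch; the function name, the use of NumPy, and the four-value feature layout are assumptions, and the multi-scale, multi-channel processing and the random forest training are not shown.

```python
import numpy as np

def gradient_features(patch):
    """Means and standard deviations of the gradients of a 2D image patch.

    Hypothetical feature layout: [mean_gx, std_gx, mean_gy, std_gy].
    """
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    gy, gx = np.gradient(patch.astype(float))
    return np.array([gx.mean(), gx.std(), gy.mean(), gy.std()])
```

A flat patch yields all-zero features, while a patch containing character strokes produces larger gradient spread, which is the kind of signal a classifier could use to separate text from background.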
Baoguang Shi et al. (2018) introduce Segment Linking (SegLink), an oriented text detection method. The main idea is to decompose text into two locally detectable elements, namely segments and links. Both elements are detected densely at multiple scales by an end-to-end trained, fully convolutional neural network. Final detections are produced by combining segments connected by links. SegLink achieves an F-measure of 75.0% on the standard ICDAR 2015 Incidental (Challenge 4) benchmark, outperforming the previous best by a large margin, and runs at over 20 FPS on 512×512 images. Moreover, without modification, SegLink is able to detect long lines of non-Latin text, such as Chinese.
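The step of combining segments connected by links can be viewed as finding connected components in a graph. The sketch below assumes segments are identified by integer indices and links are pairs of indices; it uses a union-find structure chosen here for illustration and is not taken from the SegLink code, which also merges the grouped segments into a single oriented box.

```python
def combine_segments(num_segments, links):
    """Group segment indices so that segments joined by links share a group.

    Hypothetical helper: `links` is a list of (i, j) index pairs.
    """
    parent = list(range(num_segments))

    def find(i):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two groups

    groups = {}
    for i in range(num_segments):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For example, five segments with links (0, 1), (1, 2) and (3, 4) form two groups, each of which would correspond to one detected word or text line.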
SegLink has two reported failure cases: it fails to link characters with large character spacing, and it fails to detect curved text.
Yuliang Liu et al. (2017) describe incidental scene text detection as a challenging task because of multi-orientation, perspective distortion, and variation in text size, color and scale. Earlier approaches mainly use rectangular bounding boxes or horizontal sliding windows to localize text, which may result in redundant background noise, unnecessary overlap or even information loss. To address this, they propose a new method based on Convolutional Neural Networks (CNNs), named Deep Matching Prior Network (DMPNet), which detects text with tighter quadrangles. The system uses quadrilateral sliding windows, a shared Monte-Carlo method and a sequential protocol to complete the proposed approach.
The effectiveness of the approach is evaluated on a public word-level, multi-oriented scene text database, ICDAR 2015 Robust Reading Competition Challenge 4 “Incidental scene text localization”. Measured by F-measure, the method reaches 70.64%, outperforming the previous state-of-the-art method, which reached 63.76%.
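The F-measure reported throughout these comparisons is the harmonic mean of precision and recall. A minimal sketch of the computation is given below; the function name is illustrative, and the inputs are expected as fractions between 0 and 1.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing is detected correctly
    return 2 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, the score is pulled toward the lower of the two values, so a detector cannot reach a high F-measure by maximizing recall alone.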
Houssem Turki et al. (2017) describe text detection in natural scenes, which holds great importance in research and still remains a challenge. The contribution of their method is to filter out complex backgrounds by combining three strategies, using MSER, CNN, SVM and HOG models. They use word grouping, in which bounding-box localization selects the individual words in the image and false-positive text blocks are eliminated using geometrical properties. The evaluation demonstrates the effectiveness of the method for complex foreground through experimental results on three benchmarks: ICDAR2013, ICDAR2015 and MSRA-TD500.
Xiang Bai et al. describe a method for differentiating images that contain text from a large volume of natural images. To address this problem, they propose a novel convolutional neural network variant called the multi-scale spatial partition network (MSP-Net). The network classifies whether an image contains text by predicting text existence in all image blocks, which are spatial partitions at multiple scales of the input image. The whole image is classified as a text image (an image containing text) as long as one of the blocks is predicted to contain text. The MSP-Net takes a whole image as input and outputs block-level classification results in an end-to-end manner. Results on several datasets have demonstrated the robustness and effectiveness of the proposed method.
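The image-level decision rule described here (a text image if at least one block is predicted to contain text) can be sketched as follows. The sketch assumes each scale provides a list of per-block scores and that a fixed threshold turns a score into a block-level prediction; both the data layout and the threshold value are assumptions made for illustration, not details of MSP-Net.

```python
def is_text_image(block_scores_by_scale, threshold=0.5):
    """Return True if any block at any scale is predicted to contain text.

    Hypothetical layout: one list of block scores per scale, scores in [0, 1].
    """
    return any(score > threshold
               for scores in block_scores_by_scale
               for score in scores)
```

For instance, an image whose coarse scale scores low but whose finer 2×2 partition has one high-scoring block would still be classified as a text image.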