We previously proposed a component-tree-based approach to user-intention-guided text extraction from natural scene images. In this paper, besides improving the performance of the text extraction algorithm for the "swipe" gesture, we extend the algorithm to support a new mode in which a "tap" gesture indicates the intended text. Given a grayscale image, two component trees are first built and pre-pruned using a so-called contrasting extremal region (CER) criterion together with simple rules on geometric features. The remaining nodes are then enhanced using color information in a perceptual color space. Next, a pre-trained neural network classifies a selected set of enhanced nodes as single-character or non-text objects. The surviving nodes are grouped into candidate text lines, and possible outliers are pruned from individual lines. Finally, the text line "swiped" or "tapped" by the user is selected as the target line, and the intended text is extracted accordingly. The proposed algorithm has been evaluated on the ICDAR-2003 benchmark dataset, achieving superior performance compared with previous methods.
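The first stage of the pipeline, extracting high-contrast extremal regions from a grayscale image, can be sketched as follows. This is a minimal illustrative stand-in, not the authors' actual CER formulation: it thresholds a toy image at a set of levels, collects 4-connected components (the nodes a component tree would hold), and prunes components whose contrast against their immediate boundary, or whose size, falls below hypothetical minimums (`min_contrast`, `min_size` are assumed parameters, not from the paper).

```python
from itertools import product


def components_at_threshold(img, t):
    """4-connected components of pixels with value >= t (iterative flood fill)."""
    h, w = len(img), len(img[0])
    seen, comps = set(), []
    for sy, sx in product(range(h), range(w)):
        if img[sy][sx] >= t and (sy, sx) not in seen:
            stack, comp = [(sy, sx)], set()
            seen.add((sy, sx))
            while stack:
                y, x = stack.pop()
                comp.add((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and img[ny][nx] >= t and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            comps.append(comp)
    return comps


def contrast(img, comp):
    """Min value inside the component minus max value on its 4-neighbour boundary."""
    h, w = len(img), len(img[0])
    inside_min = min(img[y][x] for y, x in comp)
    boundary = set()
    for y, x in comp:
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in comp:
                boundary.add((ny, nx))
    outside_max = max((img[y][x] for y, x in boundary), default=0)
    return inside_min - outside_max


def contrasting_regions(img, thresholds, min_contrast=2, min_size=2):
    """Keep components that are both large and contrasting enough (simplified pruning)."""
    kept = []
    for t in thresholds:
        for comp in components_at_threshold(img, t):
            if len(comp) >= min_size and contrast(img, comp) >= min_contrast:
                kept.append((t, comp))
    return kept


# A bright 2x2 blob on a dark background survives the pruning:
img = [[0, 0, 0, 0],
       [0, 9, 9, 0],
       [0, 9, 9, 0],
       [0, 0, 0, 0]]
print(len(contrasting_regions(img, [5])))  # → 1
```

In the actual algorithm this extraction is performed twice (once per component tree, covering both dark-on-light and light-on-dark text), and the surviving nodes then pass through the color enhancement and neural-network classification stages described above.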