Rumormongers often use long paragraphs of text to make slanderous stories convincing to readers. Illegal or sensitive rumors uploaded to the internet can be rendered as images to bypass text filters. Existing tools such as OCR can detect the text in these images, but the detection is very time-consuming. To curb the dissemination of such content, it is useful for governments and internet service providers to determine quickly whether an image contains a substantial amount of text. We therefore focus on developing a fast pre-processing algorithm that detects images with substantial embedded text, so that text filters (e.g. OCR) need only examine the suspected images. In this paper, we propose a histogram-based fast detection method that determines whether an image contains paragraphs of text. Binary histograms are extracted from the images after conversion to binary form. Because these histograms exhibit a periodic pattern, a step curve is designed and applied to their autocorrelation. The area under the curve is then used to differentiate images with paragraphs from those without. To imitate this scenario, we construct a new dataset of more than 2000 images with and without paragraphs. The results demonstrate the effectiveness of the proposed detection system, which achieves 99.5% accuracy at 15 milliseconds per image when implemented in C++.
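The pipeline outlined above (binarize, extract a histogram, autocorrelate, fit a step curve, measure the area) can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so everything here is an assumption made for illustration: the histogram is taken to be a per-row ink indicator, the step curve is taken to be a non-increasing staircase envelope of the autocorrelation, and the function names, the binarization threshold of 128, and the maximum lag of half the image height are all hypothetical. It is not the authors' implementation.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0..255 ints) to 1 = dark ink, 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def row_histogram(binary):
    """Assumed 'binary histogram': 1 if a row contains any ink pixel, else 0."""
    return [1 if any(row) else 0 for row in binary]


def autocorrelation(signal, max_lag):
    """Mean-removed autocorrelation for lags 0..max_lag, scaled so lag 0 equals 1."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    energy = sum(c * c for c in centered)
    if energy == 0:  # constant profile (e.g. blank image): no periodic structure
        return [0.0] * (max_lag + 1)
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy
        for lag in range(max_lag + 1)
    ]


def step_envelope(acf):
    """Assumed 'step curve': for each lag >= 1, the largest autocorrelation value
    at that lag or any later lag, with negative values clipped to zero."""
    steps = []
    best = 0.0
    for value in reversed(acf[1:]):  # scan from the largest lag back to lag 1
        best = max(best, value)
        steps.append(best)
    steps.reverse()
    return steps


def paragraph_score(gray):
    """Mean height of the step curve, i.e. its area divided by the number of lags."""
    hist = row_histogram(binarize(gray))
    if len(hist) < 2:
        return 0.0
    steps = step_envelope(autocorrelation(hist, len(hist) // 2))
    return sum(steps) / len(steps)


# Synthetic 60-row images: evenly spaced "text lines" versus one solid dark block.
text_like = [[0] * 8 if r % 6 < 4 else [255] * 8 for r in range(60)]
blob_like = [[0] * 8 if 20 <= r < 40 else [255] * 8 for r in range(60)]

print(paragraph_score(text_like), paragraph_score(blob_like))
```

In this sketch the periodic line pattern keeps the staircase high across all lags, while the single block's autocorrelation decays quickly, so the striped image receives the larger score. A decision threshold on the score would have to be tuned on real data; none is implied by the abstract.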