The OCR recognition rate drops for Japanese-language (Nippon-Go) documents that contain both base characters and ruby characters. To improve the rate, it is proposed that ruby characters and base characters be recognized separately. This approach requires a method for separating the ruby-character class from the base-character class. A new concept, "character dimension", is introduced to distinguish the two classes. This paper describes measurements of character dimensions, together with the implementation of a prototype system and its effects.