Greek Alphabet Recognition Technique for Biomedical Documents


Abstract

Most current commercial optical character recognition (OCR) systems can accurately recognize text in documents written in a single language. They perform poorly, however, on Greek characters embedded in predominantly English text, and most do not recognize such characters as belonging to the Greek alphabet at all. As a result, a high degree of manual review is needed to validate and correct OCR errors. To address this problem, we propose a new technique that combines features calculated from the output of multiple OCR systems with string pattern matching and document content analysis to improve the recognition of both Greek characters and regular text. The technique passes each document page image through OCR twice, using a different recognition language on each pass. A preliminary evaluation on a sample of medical journal page images shows that the technique is feasible and that it improves the recognition of Greek characters embedded in predominantly English text.
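The two-pass idea can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the features, patterns, or thresholds used. The sketch assumes that both OCR passes return word lists already aligned one-to-one, each word carrying a confidence score. The names `OcrWord`, `GREEK_CONTEXT`, `merge_passes`, and the `low_conf` threshold are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str
    confidence: float  # assumed range 0.0 to 1.0 (hypothetical engine output)


# Any character in the Unicode "Greek and Coptic" block.
GREEK = re.compile(r"[\u0370-\u03FF]")

# Hypothetical string patterns for plausible Greek usage in biomedical text:
# "TNF-α" (Latin-hyphen-Greek), "β-catenin" (Greek-hyphen-Latin), "μg" (Greek + short unit).
GREEK_CONTEXT = re.compile(
    r"^(?:[A-Za-z0-9]+-[\u0370-\u03FF]"
    r"|[\u0370-\u03FF]-[A-Za-z0-9]+"
    r"|[\u0370-\u03FF][A-Za-z]{0,2})$"
)


def merge_passes(english_pass, greek_pass, low_conf=0.6):
    """Pick, word by word, between an English-language and a Greek-language OCR pass.

    The Greek-pass reading is taken only when it contains a Greek character,
    matches a plausible pattern, and the English pass was unsure of its reading.
    """
    merged = []
    for eng, grk in zip(english_pass, greek_pass):
        has_greek = bool(GREEK.search(grk.text))
        plausible = bool(GREEK_CONTEXT.match(grk.text))
        if has_greek and plausible and eng.confidence < low_conf:
            merged.append(grk.text)
        else:
            merged.append(eng.text)
    return merged
```

In this sketch the English pass remains the default, so ordinary words that the Greek-language pass garbles into Greek letters are rejected by the pattern check rather than by confidence alone.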
