Segmentation of Mixed Chinese/English Documents Based on Chinese Radicals Recognition and Complexity Analysis in Local Segment Pattern

Abstract

Segmentation based on character recognition is one of the most popular methods for segmenting mixed Chinese/English documents. However, rejecting outliers has always been the bottleneck of this method. This paper presents a new method to alleviate the problem. We assign a language attribute to as many segments as possible, and then merge or split segments according to that attribute. First, we construct a mixed OCR engine for Chinese radicals, English characters, and some English character pairs. Furthermore, English negative samples are included in training to improve the engine's ability to reject outliers. Finally, the language of each segment is determined from the output of the mixed OCR engine and a complexity analysis of the local segment pattern. The test results show encouraging performance.
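The label-then-merge idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Segment` fields, the label strings, both thresholds, and the rule that adjacent Chinese segments are merged while their combined width stays within one line height are all assumptions made for illustration. The abstract does not specify how complexity is measured or how the merge and split decisions are made, and the split step is omitted here.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Segment:
    x0: int                   # left edge of the segment (pixels)
    x1: int                   # right edge of the segment (pixels)
    ocr_label: Optional[str]  # hypothetical: "zh_radical", "en", or None if rejected
    ocr_conf: float           # hypothetical recognizer confidence in [0, 1]
    complexity: float         # hypothetical local complexity score in [0, 1]
    lang: str = "unknown"


def assign_language(seg: Segment,
                    conf_thresh: float = 0.8,
                    complexity_thresh: float = 0.5) -> str:
    """Label a segment "zh" or "en" (thresholds are illustrative assumptions)."""
    if seg.ocr_label is not None and seg.ocr_conf >= conf_thresh:
        # Trust the recognizer when it is confident.
        return "zh" if seg.ocr_label == "zh_radical" else "en"
    # Otherwise fall back on complexity: denser patterns are assumed Chinese.
    return "zh" if seg.complexity >= complexity_thresh else "en"


def segment_line(segments: List[Segment], line_height: int) -> List[Segment]:
    """Label each segment, then merge adjacent Chinese segments.

    Assumed merge rule: a Chinese character is roughly square, so adjacent
    "zh" segments are joined while their combined width <= line_height.
    """
    merged: List[Segment] = []
    for seg in segments:
        labeled = replace(seg, lang=assign_language(seg))  # copy, do not mutate input
        prev = merged[-1] if merged else None
        if (prev is not None and prev.lang == "zh" and labeled.lang == "zh"
                and labeled.x1 - prev.x0 <= line_height):
            prev.x1 = labeled.x1  # extend the previous character cell
        else:
            merged.append(labeled)
    return merged


line = [
    Segment(0, 12, "zh_radical", 0.90, 0.7),  # confident radical
    Segment(14, 30, None, 0.00, 0.8),         # rejected by OCR, high complexity
    Segment(34, 42, "en", 0.95, 0.2),
    Segment(44, 52, "en", 0.90, 0.1),
]
result = segment_line(line, line_height=32)
```

In this toy input the first two segments are both labeled Chinese and fit within one line height, so they are joined into a single character cell, while the two English segments are left separate.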

Bibliographic details

  • Source
    Intelligent control and automation | 2006 | pp. 497–506 | 10 pages
  • Conference location: Kunming (CN)
  • Author affiliations

    Institute of Automation, Chinese Academy of Sciences, Beijing, 100080, China yong.xia@ia.ac.cn;

    Institute of Automation, Chinese Academy of Sciences, Beijing, 100080, China baihua.xiao@ia.ac.cn;

    Institute of Automation, Chinese Academy of Sciences, Beijing, 100080, China chunheng.wang@ia.ac.cn;

    Institute of Automation, Chinese Academy of Sciences, Beijing, 100080, China yaodong.li@ia.ac.cn;

  • Original format: PDF
  • Language of text: English
