Journal: Automatic Control and Computer Sciences

Synthetic Sample Extension in Implementation of Tangut Character Databases

Abstract

The Tangut script was a logographic writing system used for the now-extinct Tangut language of the Western Xia dynasty (1038 to 1227). Optical character recognition, machine learning, and computer vision can greatly help in deciphering the characters in the ancient scripts. All of these techniques, however, depend on a character database that provides training samples and test standards. While building the Tangut character databases with the ancient Tangut scripts as the data source, the authors found that imbalanced class distribution significantly degrades the performance of learning algorithms. This paper proposes a synthetic sample generation method to improve the learning and recognition of Tangut characters. Recognition accuracy is compared between learning on the original data set and learning on the synthetically extended data set, and the proposed method shows a clear advantage. The organization of the Tangut character databases is also described.
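The abstract does not say how the synthetic samples are generated, so the following is only a minimal sketch of the general idea of extending under-represented classes until the class distribution is balanced. It assumes that glyphs are stored as 2-D NumPy arrays with a zero background and that a synthetic sample is a randomly shifted copy of a real one. The function names `shift_image` and `balance_with_synthetic` and the parameter `max_shift` are hypothetical and are not taken from the paper.

```python
import numpy as np
from collections import Counter


def shift_image(img, dy, dx):
    """Translate a 2-D glyph image by (dy, dx) pixels, padding with zeros."""
    h, w = img.shape
    out = np.zeros_like(img)
    # Source and destination windows that stay inside the image bounds.
    src_y = slice(max(0, -dy), min(h, h - dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_y = slice(max(0, dy), min(h, h + dy))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out


def balance_with_synthetic(images, labels, max_shift=2, seed=0):
    """Extend every class to the size of the largest class.

    Assumption for illustration: a synthetic sample is a real sample of the
    same class shifted by a few pixels. The paper's actual generation method
    is not described in the abstract.
    """
    rng = np.random.default_rng(seed)
    counts = Counter(labels)
    target = max(counts.values())
    new_images, new_labels = list(images), list(labels)
    for cls, n in counts.items():
        originals = [img for img, lab in zip(images, labels) if lab == cls]
        for _ in range(target - n):
            # Pick a real sample of this class and perturb it slightly.
            base = originals[rng.integers(len(originals))]
            dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
            new_images.append(shift_image(base, int(dy), int(dx)))
            new_labels.append(cls)
    return new_images, new_labels
```

With five samples of one character and one sample of another, `balance_with_synthetic` would return ten images, five per class. A recognizer could then be trained once on the original set and once on the extended set to reproduce the kind of accuracy comparison the abstract mentions.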
