Journal: Neural Computing & Applications

ARDIS: a Swedish historical handwritten digit dataset


Abstract

This paper introduces a new image-based dataset of historical handwritten digits named Arkiv Digital Sweden (ARDIS). The images in the ARDIS dataset are extracted from 15,000 Swedish church records written by different priests, with various handwriting styles, in the nineteenth and twentieth centuries. The dataset consists of three single-digit datasets and one digit-string dataset. The digit-string dataset includes 10,000 samples in the red-green-blue (RGB) color space, whereas the other datasets contain 7600 single-digit images in different color spaces. An extensive analysis of machine learning methods on several digit datasets is carried out. Additionally, the correlation between ARDIS and the existing digit datasets Modified National Institute of Standards and Technology (MNIST) and US Postal Service (USPS) is investigated. Experimental results show that machine learning algorithms, including deep learning methods, achieve low recognition accuracy when trained on existing datasets and tested on the ARDIS dataset. Specifically, a convolutional neural network trained on MNIST and on USPS and tested on ARDIS provides the highest accuracies (58.80% and …, respectively). Consequently, the results reveal that machine learning methods trained on existing datasets can have difficulty recognizing digits in our dataset, indicating that the ARDIS dataset has unique characteristics. This dataset is publicly available to the research community to further advance handwritten digit recognition algorithms.
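The abstract's key experiment is a cross-dataset protocol: a model is trained on an existing dataset (MNIST or USPS), then tested on ARDIS, and the resulting drop in accuracy is read as evidence that ARDIS differs from those datasets. The following is a minimal pure-Python sketch of that protocol, not the paper's code. It assumes images are already loaded as equal-length, flattened grayscale vectors; it swaps in a simple nearest-centroid classifier in place of the CNN and other models the paper evaluates; and the function names (`rgb_to_gray`, `fit_centroids`, `predict`, `cross_dataset_accuracy`) are hypothetical. The ITU-R BT.601 grayscale conversion is likewise an assumption about how RGB images could be matched to single-channel MNIST-style inputs.

```python
from typing import Dict, List, Sequence, Tuple

# A flattened grayscale image: one intensity in [0, 1] per pixel.
Image = List[float]


def rgb_to_gray(pixels: Sequence[Tuple[int, int, int]]) -> Image:
    """Map 8-bit RGB pixels to [0, 1] grayscale with ITU-R BT.601 luma weights.

    Assumption: one plausible way to bring RGB ARDIS images into the
    single-channel format of MNIST/USPS; not taken from the paper.
    """
    return [(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for r, g, b in pixels]


def fit_centroids(images: Sequence[Image], labels: Sequence[int]) -> Dict[int, Image]:
    """Average the training images of each digit class (nearest-centroid training)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for img, label in zip(images, labels):
        if label not in sums:
            sums[label] = [0.0] * len(img)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], img)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids: Dict[int, Image], img: Image) -> int:
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((c - p) ** 2 for c, p in zip(centroids[label], img)),
    )


def cross_dataset_accuracy(
    train_images: Sequence[Image],
    train_labels: Sequence[int],
    test_images: Sequence[Image],
    test_labels: Sequence[int],
) -> float:
    """Fit on a source dataset (e.g. MNIST) and score on a different target (e.g. ARDIS)."""
    centroids = fit_centroids(train_images, train_labels)
    correct = sum(predict(centroids, x) == y for x, y in zip(test_images, test_labels))
    return correct / len(test_labels)


if __name__ == "__main__":
    # Toy 2-pixel "images": half of the target samples drift toward the other
    # class, mimicking a change in handwriting style between datasets.
    train_x = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]]
    train_y = [0, 0, 1, 1]
    test_x = [[0.2, 0.8], [0.6, 0.4], [0.8, 0.2], [0.3, 0.7]]
    test_y = [0, 0, 1, 1]
    print(cross_dataset_accuracy(train_x, train_y, test_x, test_y))  # prints 0.5
```

A nearest-centroid model is used here only because it fits in a few lines without third-party libraries; the same train-on-one, test-on-another loop applies unchanged to any classifier with a separate fit and predict step.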