IAPR International Workshop on Document Analysis Systems

AreCAPTCHA: Outsourcing Arabic Text Digitization to Native Speakers



Abstract

Demand to digitize Arabic books and documents has been increasing recently, because digital books do not lose quality over time and can be easily maintained. Meanwhile, the number of Arabic-speaking Internet users is growing. We propose AreCAPTCHA, a system that digitizes Arabic text by outsourcing it to native Arabic speakers while protecting the online web forms of Arabic websites. As users interact with AreCAPTCHA, we collect candidate digitizations of words that OCR programs failed to recognize. We explain how the system works, the challenges we faced, and promising preliminary evaluation results.
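The abstract does not say how user answers are validated or combined. The sketch below assumes the scheme popularized by the original reCAPTCHA: each challenge pairs a control word whose transcription is already known with a word that OCR could not recognize, and the user's answer for the unknown word counts as a vote only if the control word was typed correctly. The names `VoteCollector`, `submit`, `accepted`, and the `min_votes` threshold are hypothetical and chosen for illustration; this is not AreCAPTCHA's confirmed design.

```python
from __future__ import annotations

import unicodedata
from collections import Counter, defaultdict


def normalize(text: str) -> str:
    """Canonicalize a typed answer: Unicode NFC form, surrounding whitespace removed."""
    return unicodedata.normalize("NFC", text.strip())


class VoteCollector:
    """Hypothetical reCAPTCHA-style aggregator (assumption, not the paper's code).

    Each challenge shows a control word with a known transcription next to an
    unknown word that OCR failed on.
    """

    def __init__(self, control_answers: dict[str, str], min_votes: int = 3):
        # Known transcriptions of control words, keyed by a word id.
        self.control_answers = {k: normalize(v) for k, v in control_answers.items()}
        # Number of agreeing votes required before a transcription is accepted.
        self.min_votes = min_votes
        # unknown word id -> Counter of normalized candidate transcriptions
        self.votes: dict[str, Counter] = defaultdict(Counter)

    def submit(self, control_id: str, control_answer: str,
               unknown_id: str, unknown_answer: str) -> bool:
        """Return True if the user passes (control word typed correctly).

        Only answers from users who pass are recorded as votes.
        """
        if normalize(control_answer) != self.control_answers.get(control_id):
            return False
        self.votes[unknown_id][normalize(unknown_answer)] += 1
        return True

    def accepted(self, unknown_id: str) -> str | None:
        """Return the leading transcription once it has min_votes agreeing votes."""
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        text, n = counts.most_common(1)[0]
        return text if n >= self.min_votes else None


if __name__ == "__main__":
    collector = VoteCollector({"c1": "كتاب"}, min_votes=2)
    collector.submit("c1", "كتاب", "u1", "مكتبة")    # passes, vote recorded
    collector.submit("c1", "خطأ", "u1", "عربي")      # fails, vote discarded
    collector.submit("c1", " كتاب ", "u1", "مكتبة")  # passes after normalization
    print(collector.accepted("u1"))  # prints مكتبة
```

Requiring several independent agreeing votes, rather than trusting a single answer, is one simple way to filter out typos and random input; a real deployment would likely also need Arabic-specific normalization (for example, of diacritics or letter variants), which this sketch does not attempt.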
