Journal: Cybernetics and Systems

Automatically Identifying Citations in Hebrew-Aramaic Documents


Abstract

Citations in documents carry important information about the sources that authors cite and about the importance and impact of those sources. Automatic identification of citations in documents is therefore an important task. For various reasons, citations in rabbinic literature are more difficult to identify and extract than citations in scientific papers written in English. The aim of this novel research is to automatically identify undated citations included in a unique data set: rabbinic documents written in Hebrew-Aramaic. We formulate four feature sets: orthographic, quantitative, stopword-based, and n-gram-based. Experiments were performed on all combinations of these feature sets using six common machine learning methods and InfoGain. A combination of all four feature sets using logistic regression achieves an accuracy of 91.98%, which is an improvement of 16.53% compared to a baseline result.
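As a rough illustration of what "all combinations of these feature sets" could look like in practice, the sketch below enumerates every non-empty subset of the four feature sets named in the abstract. The identifier names are hypothetical, and counting single feature sets as "combinations" is an assumption; the abstract does not say how the authors enumerated them.

```python
from itertools import combinations

# Labels for the four feature sets named in the abstract.
# The identifiers themselves are hypothetical shorthand.
FEATURE_SETS = ["orthographic", "quantitative", "stopword", "ngram"]


def feature_set_combinations(names):
    """Return every non-empty combination of the given feature-set names."""
    combos = []
    for size in range(1, len(names) + 1):
        combos.extend(combinations(names, size))
    return combos


all_combos = feature_set_combinations(FEATURE_SETS)

# Four sets give 2**4 - 1 non-empty combinations.
print(len(all_combos))
for combo in all_combos:
    print(" + ".join(combo))
```

Under this assumption, each of the resulting combinations would be paired with each learning method, so the experimental grid grows quickly with the number of feature sets.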
