Conference proceedings: Discovery Science; Lecture Notes in Artificial Intelligence; 4265

Identifying Historical Period and Ethnic Origin of Documents Using Stylistic Feature Sets


Abstract

Text classification is an important and challenging research domain. This paper investigates identifying the historical period and ethnic origin of documents using stylistic feature sets. The application domain is Jewish Law articles written in Hebrew-Aramaic. Such documents present several interesting problems for stylistic classification. First, these documents include words from both languages. Second, Hebrew and Aramaic are morphologically richer than English. The classification is done using six different sets of stylistic features: quantitative features, orthographic features, topographic features, lexical features and vocabulary richness. Each set of features includes various baseline features, some of them formalized by us. A support vector machine (SVM) was chosen as the machine learning method because it has been very successful in text classification. The quantitative set was found to be very successful and superior to all other sets. Its features are domain-independent and language-independent. It will be interesting to apply these feature sets in general, and the quantitative set in particular, to other domains as well as to other languages.
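
As a rough illustration of why quantitative features can be domain-independent and language-independent, the sketch below computes a few length-based statistics from raw text. The function name `quantitative_features` and the three features chosen here are assumptions made for illustration; the abstract does not list the paper's actual quantitative features, and no SVM training is shown.

```python
import re
from statistics import mean


def quantitative_features(text: str) -> dict:
    """Compute a few illustrative length-based style features.

    These particular features are hypothetical examples, not the
    feature set used in the paper.
    """
    # Crude sentence split on runs of '.', '!' or '?'; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # \w+ matches Unicode letters and digits, so Hebrew and Aramaic
    # words are tokenized the same way as English ones.
    words = re.findall(r"\w+", text)

    if not words or not sentences:
        return {"num_words": 0, "avg_word_len": 0.0, "avg_sentence_len": 0.0}

    return {
        "num_words": len(words),
        # Average number of characters per word.
        "avg_word_len": mean(len(w) for w in words),
        # Average number of words per sentence.
        "avg_sentence_len": len(words) / len(sentences),
    }


if __name__ == "__main__":
    print(quantitative_features("aa bbbb. cc dd ee."))
```

None of these statistics depends on a vocabulary list or on the topic of the text, which is the sense in which such features carry over between languages and domains. In a full pipeline, vectors like these would be computed per document and passed to a classifier such as an SVM.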
