Journal of the American Society for Information Science and Technology

Cuisine: Classification Using Stylistic Feature Sets and/or Name-Based Feature Sets

Abstract

Document classification presents challenges due to the large number of features, their dependencies, and the large number of training documents. In this research, we investigated the use of six stylistic feature sets (including 42 features) and/or six name-based feature sets (including 234 features) for various combinations of the following classification tasks: ethnic groups of the authors and/or periods of time when the documents were written and/or places where the documents were written. The investigated corpus contains Jewish Law articles written in Hebrew-Aramaic, which present interesting problems for classification. Our system CUISINE (Classification Using Stylistic feature sets and/or NamE-based feature sets) achieves accuracy results between 90.71% and 98.99% for the seven classification experiments (ethnicity, time, place, ethnicity&time, ethnicity&place, time&place, ethnicity&time&place). For the first six tasks, the stylistic feature sets in general, and the quantitative feature set in particular, are enough for excellent classification results. In contrast, the name-based feature sets perform rather poorly on these tasks. However, for the most complex task (ethnicity&time&place), a hill-climbing model using all feature sets succeeds in significantly improving the classification results. Most of the stylistic features (34 of 42) are language-independent and domain-independent. These features might be useful to the community at large, at least for rather simple tasks.
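The abstract does not describe how the hill-climbing model is built, so the following is only a generic sketch of greedy forward hill climbing over named feature sets, not the authors' procedure. The function `hill_climb_feature_sets`, the `score` callback (standing in for something like cross-validated accuracy), and every feature-set name and weight other than "quantitative" are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple


def hill_climb_feature_sets(
    candidates: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy forward hill climbing: add one feature set at a time
    while the score strictly improves (illustrative sketch only)."""
    remaining = set(candidates)
    selected: FrozenSet[str] = frozenset()
    best = float("-inf")
    while remaining:
        # Score every one-step extension of the current selection.
        trials = {name: score(selected | {name}) for name in remaining}
        # Sorting first makes tie-breaking deterministic (alphabetical).
        name, trial = max(sorted(trials.items()), key=lambda kv: kv[1])
        if trial <= best:
            break  # no neighbor improves the score: local optimum reached
        selected, best = selected | {name}, trial
        remaining.remove(name)
    return selected, best


# Hypothetical per-set contributions; NOT results from the paper.
TOY_WEIGHTS: Dict[str, float] = {
    "quantitative": 0.90,
    "function_words": 0.04,
    "names": -0.01,
}


def toy_score(subset: FrozenSet[str]) -> float:
    """Stand-in for a real evaluation such as cross-validated accuracy."""
    return sum(TOY_WEIGHTS[name] for name in subset)


if __name__ == "__main__":
    chosen, value = hill_climb_feature_sets(TOY_WEIGHTS, toy_score)
    print(sorted(chosen), round(value, 2))
```

In practice `score` would train and evaluate a classifier on the chosen feature sets. Like any hill-climbing search, the loop stops at the first local optimum and may miss a better combination.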

Bibliographic details

  • Author affiliations

    Department of Computer Science, Jerusalem College of Technology (Machon Lev), 21 Havaad Haleumi Street, P.O.B. 16031, 91160 Jerusalem, Israel;

    Department of Computer Science, Jerusalem College of Technology (Machon Lev), 21 Havaad Haleumi Street, P.O.B. 16031, 91160 Jerusalem, Israel;

    Department of Computer Science, Jerusalem College of Technology (Machon Lev), 21 Havaad Haleumi Street, P.O.B. 16031, 91160 Jerusalem, Israel;

    Department of Computer Science, Jerusalem College of Technology (Machon Lev), 21 Havaad Haleumi Street, P.O.B. 16031, 91160 Jerusalem, Israel;

    Department of Computer Science, Bar-Ilan University, 52900 Ramat-Gan, Israel, and Department of Computer Science, Jerusalem College of Technology (Machon Lev)

  • Original format: PDF
  • Language: English
