Journal: Cybernetics and Systems

Classifying True and False Hebrew Stories Using Word N-Grams



Abstract

False story detection is an important and challenging problem. This paper presents a simple and sound methodology that is able to automatically distinguish between true and false Hebrew stories using either psychological or semantic information. The examined corpus contains 96 stories that were composed by 48 native Hebrew speakers who were asked to tell both true and false stories. The features used by the classification model are word unigrams, bigrams, and trigrams. Different experiments on various combinations of these feature sets using five supervised machine learning (ML) methods, the InfoGain feature filtering method, and parameter tuning have been performed. We report on the success of this approach in identifying the correct types of stories. The word unigrams set was superior to all other feature sets. For the first classification task (true and false stories), the logistic regression ML method was the best method, achieving an accuracy of 91.67%. The two decision tree ML methods (J48 and REPTree) also present high accuracy results (90.63% and 87.5%) using only 5 and 4 unigrams, respectively.
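As a rough illustration of the feature pipeline the abstract describes, the sketch below extracts word unigrams, bigrams, and trigrams and scores a single n-gram by information gain. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract names the InfoGain filter without specifying it, so the code uses the standard information-gain definition over a binary "document contains this n-gram" feature. The function names (`word_ngrams`, `entropy`, `info_gain`), the whitespace tokenization, and the toy English sentences are all hypothetical placeholders; the actual corpus is Hebrew and is not reproduced here.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams (as tuples) of a whitespace-tokenized text."""
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def info_gain(docs, labels, feature, n):
    """Information gain of the binary feature 'document contains this n-gram'.

    Assumption: presence/absence is used as the feature value; the source
    does not say how the n-gram features were weighted.
    """
    with_f, without_f = [], []
    for doc, label in zip(docs, labels):
        if feature in set(word_ngrams(doc, n)):
            with_f.append(label)
        else:
            without_f.append(label)
    total = len(labels)
    conditional = (
        (len(with_f) / total) * entropy(with_f)
        + (len(without_f) / total) * entropy(without_f)
    )
    return entropy(labels) - conditional


# Hypothetical toy data in English, for illustration only.
docs = [
    "i went to the market yesterday",
    "i went to the moon yesterday",
    "we visited my aunt in haifa",
    "we visited a dragon in haifa",
]
labels = ["true", "false", "true", "false"]

if __name__ == "__main__":
    for feature in [("moon",), ("i",)]:
        print(feature, round(info_gain(docs, labels, feature, 1), 4))
```

In a filter of this kind, n-grams would be ranked by their score and only the top-ranked ones passed to the classifier. An n-gram that appears equally often in both classes, such as `("i",)` in the toy data, scores zero and would be dropped.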
