International conference on advances in computing, communications and informatics

A fused forensic text comparison system using lexical features, word and character N-grams


Abstract

This study investigates the degree to which the performance of a likelihood ratio (LR)-based forensic text comparison (FTC) system improves when logistic-regression fusion is applied to LRs estimated separately by three different procedures: lexical features, word-based N-grams, and character-based N-grams. The data are predatory chatlog messages, and each group of messages is modelled with 500 words. The performance of the FTC system is assessed in terms of its validity (= accuracy) and reliability (= precision) using the log-likelihood-ratio cost (Cllr) and 95% credible intervals (CI), respectively. It is demonstrated that 1) of the three procedures, the lexical features procedure performed best in terms of Cllr; and 2) the fused system outperformed all three single procedures. The Cllr value of the fused system is better than that of the lexical features procedure by 0.14. It is also reported that the validity and reliability of a system are negatively correlated: the fused system, which yielded the best result in terms of Cllr, has the worst CI value.
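The two quantities the abstract relies on, the log-likelihood-ratio cost and logistic-regression fusion, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `cllr` and `fuse_log_lrs` are hypothetical, the cost follows the standard textbook definition, and the fusion weights are assumed to have already been fitted by logistic regression on separate training LRs.

```python
import math


def cllr(same_author_lrs, diff_author_lrs):
    """Log-likelihood-ratio cost (standard definition, assumed here).

    Lower is better. A system that always outputs LR = 1 scores exactly 1.0.
    """
    # Penalty for same-author comparisons grows as the LR falls below 1.
    same_term = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs)
    same_term /= len(same_author_lrs)
    # Penalty for different-author comparisons grows as the LR rises above 1.
    diff_term = sum(math.log2(1 + lr) for lr in diff_author_lrs)
    diff_term /= len(diff_author_lrs)
    return 0.5 * (same_term + diff_term)


def fuse_log_lrs(log_lrs, weights, offset):
    """Fuse per-procedure log LRs as a weighted sum plus an offset.

    `weights` and `offset` are hypothetical inputs; in practice they would
    be estimated by logistic regression on training comparisons.
    """
    return offset + sum(w * l for w, l in zip(weights, log_lrs))


# Illustrative values only, not results from the study.
uninformative = cllr([1.0, 1.0], [1.0, 1.0])
informative = cllr([100.0, 50.0], [0.01, 0.02])
fused = fuse_log_lrs([1.0, 2.0, 3.0], weights=[0.5, 0.5, 0.5], offset=0.0)
```

In this sketch the three entries of `log_lrs` stand for the lexical-feature, word N-gram, and character N-gram procedures; the fused value is itself a log LR that can be scored with `cllr` after exponentiation.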
