A Comparative Study of Likelihood Ratio Based Forensic Text Comparison Procedures: Multivariate Kernel Density with Lexical Features vs. Word N-grams vs. Character N-grams

Abstract

This comparative study empirically investigates the performance of three procedures for calculating authorship attribution likelihood ratios (LRs): 1) a procedure based on multivariate kernel density (MVKD) with lexical features; 2) a procedure based on word N-grams; and 3) a procedure based on character N-grams. The best-performing LRs of the three procedures are then fused into single combined LRs using logistic-regression fusion, in order to investigate how much improvement or deterioration the fusion brings about. The test data are chatlog messages that were presented as evidence to prosecute paedophiles. Each message group is modelled with either 500 or 1000 word tokens, in order to examine the effect of sample size on system performance. Performance is assessed in terms of validity (= accuracy), using the log-likelihood-ratio cost (Cllr), and reliability (= precision), using 95% credible intervals (CI). While describing how the outcomes of the three procedures differ in character, the study demonstrates that the MVKD procedure performed best of the three in terms of Cllr. It also demonstrates that logistic-regression fusion is useful for combining the LRs obtained from the three procedures, resulting in a good improvement in performance.
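To make the validity metric concrete, here is a minimal sketch of the standard log-likelihood-ratio cost (Cllr) computation. It is not taken from the paper: the function name `cllr` and the example LR values are assumptions made for illustration. The formula averages a penalty of log2(1 + 1/LR) over same-author comparisons and log2(1 + LR) over different-author comparisons, then halves the sum, so a system that always outputs LR = 1 (no information) scores exactly 1 and lower values are better.

```python
import math


def cllr(same_author_lrs, diff_author_lrs):
    """Log-likelihood-ratio cost for a set of likelihood ratios (not log LRs).

    same_author_lrs: LRs from comparisons where both texts share an author.
    diff_author_lrs: LRs from comparisons where the authors differ.
    Both argument names are illustrative assumptions.
    """
    if not same_author_lrs or not diff_author_lrs:
        raise ValueError("both LR lists must be non-empty")

    # Same-author comparisons are penalised when the LR is small.
    same_penalty = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs)
    same_penalty /= len(same_author_lrs)

    # Different-author comparisons are penalised when the LR is large.
    diff_penalty = sum(math.log2(1 + lr) for lr in diff_author_lrs)
    diff_penalty /= len(diff_author_lrs)

    return 0.5 * (same_penalty + diff_penalty)


if __name__ == "__main__":
    # Made-up LRs, for illustration only.
    print(cllr([100.0, 50.0], [0.01, 0.02]))  # well-behaved system, close to 0
    print(cllr([1.0], [1.0]))                 # uninformative system
    print(cllr([0.01], [100.0]))              # misleading system, above 1
```

A Cllr above 1 means the LRs are, on average, more misleading than saying nothing at all, which is why the metric is used to compare procedures such as the three in this study.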


Bibliographic details

Source
Cybercrime and Trustworthy Computing Workshop | 2014 | 11 pages
Author
Ishihara Shunichi
Original format
PDF
Chinese Library Classification
TP393-53
Keywords
95% credible intervals; Tippett plots; character N-grams; forensic text comparison; lexical features; likelihood ratio; log-likelihood-ratio cost; logistic-regression fusion; multivariate kernel density; word N-grams


