A fused forensic text comparison system using lexical features, word and character N-grams

Abstract

This study investigates the degree to which the performance of a likelihood ratio (LR)-based forensic text comparison (FTC) system improves when logistic-regression fusion is applied to LRs estimated separately by three procedures: lexical features, word-based N-grams and character-based N-grams. The study uses predatory chatlog messages, with each group of messages modelled using 500 words. The performance of the FTC system is assessed in terms of its validity (= accuracy) and reliability (= precision) using the log-likelihood-ratio cost (Cllr) and 95% credible intervals (CI), respectively. It is demonstrated that 1) of the three procedures, the lexical features procedure performed best in terms of Cllr; and 2) the fused system outperformed all three single procedures. The Cllr value of the fused system is better than that of the lexical features procedure by 0.14. It is also reported that the validity and reliability of a system are negatively correlated: the fused system that yielded the best Cllr has the worst CI value.
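The two quantities the abstract relies on can be illustrated with a minimal sketch. It assumes the standard definition of the log-likelihood-ratio cost (the mean of log2(1 + 1/LR) over same-author comparisons and the mean of log2(1 + LR) over different-author comparisons, averaged) and the usual linear form of logistic-regression fusion. The function names, the weights and the example values are hypothetical and are not taken from the paper; in practice the fusion weights would be fitted by logistic regression on a separate set of comparisons with known authorship.

```python
import math


def cllr(same_author_lrs, diff_author_lrs):
    """Log-likelihood-ratio cost for two lists of LRs (plain ratios, not logs)."""
    # Same-author comparisons are penalised more heavily as the LR falls below 1.
    same_cost = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs) / len(same_author_lrs)
    # Different-author comparisons are penalised more heavily as the LR rises above 1.
    diff_cost = sum(math.log2(1 + lr) for lr in diff_author_lrs) / len(diff_author_lrs)
    return 0.5 * (same_cost + diff_cost)


def fuse_log_lrs(log_lrs, weights, offset):
    """Linear fusion of per-procedure log LRs: offset + sum(w_i * logLR_i).

    The weights and offset are assumed to come from a logistic-regression
    model trained elsewhere; here they are passed in as plain numbers.
    """
    return offset + sum(w * s for w, s in zip(weights, log_lrs))


# A system that always reports LR = 1 carries no information and scores exactly 1.
uninformative = cllr([1.0], [1.0])

# Strong, correct LRs give a cost close to 0 (lower is better).
informative = cllr([100.0], [0.01])

# Hypothetical log LRs from the three procedures (lexical, word N-gram, character N-gram),
# fused with hypothetical weights.
fused = fuse_log_lrs([1.0, 2.0, 3.0], weights=[0.5, 0.5, 0.5], offset=0.0)
```

A cost below 1 indicates that the system provides useful information, which is why a reduction in this value counts as an improvement for the fused system.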

Bibliographic details

Source
International Conference on Advances in Computing, Communications and Informatics, 2014, pp. 2762-2768 (7 pages)
Author
Ishihara S.
Original format
PDF
Keywords
digital forensics; feature extraction; logistics; natural language processing; regression analysis; text analysis; FTC system; character n-grams; fused forensic text comparison system; lexical features; likelihood ratio-based forensic text comparison system; log-likelihood-ratio cost; logistic-regression fusion; predatory chatlog messages; word n-grams; Calibration; Databases; Forensics; Kernel; Probability; Reliability; Vectors; 95% credible intervals; N-grams; Tippett plot; forensic text comparison; likelihood ratio

Similar literature

1. Automatic conversion from lexical words to prosodic words for Mandarin text-to-speech system [J]. Yanqiu Shao, Jiqing Han, Ting Liu. International Journal of Speech Technology, 2007, No. 1.
2. Words versus character n-grams for anti-spam filtering [J]. Ioannis Kanaris, Konstantinos Kanaris, Ioannis Houvardas. International Journal of Artificial Intelligence Tools: Architectures, Languages, Algorithms, 2007, No. 6.
3. Single-Word Recognition Need Not Depend on Single-Word Features: Narrative Coherence Counteracts Effects of Single-Word Features that Lexical Decision Emphasizes [J]. Teng Dan W., Wallot Sebastian, Kelty-Stephen Damian G. Journal of Psycholinguistic Research, 2016, No. 6.
4. A Comparative Study of Likelihood Ratio Based Forensic Text Comparison Procedures: Multivariate Kernel Density with Lexical Features vs. Word N-grams vs. Character N-grams [C]. Ishihara Shunichi. Cybercrime and Trustworthy Computing Workshop, 2015.
5. Multi-word units and lexical phrases in ESL texts: A content analysis [D]. Hasib, Shama T. 2000.
6. Predicting Lexical Norms: A Comparison between a Word Association Model and Text-Based Word Co-occurrence Models [O]. Hendrik Vankrunkelsven, Steven Verheyen, Gert Storms. 2018.
7. A comparison and semi-quantitative analysis of words and character-bigrams as features in Chinese text categorization [O]. Jingyang Li, Maosong Sun, Xian Zhang. 2006.