Text Mining in Unclean, Noisy or Scrambled Datasets for Digital Forensics Analytics


Abstract

Today, most communication between people takes the form of electronic messages, especially those sent from smart mobile devices. The text exchanged therefore suffers from poor punctuation, misspelled words, runs of several words with no spaces between them, tables, internet addresses and so on, all of which make traditional text analytics methods difficult or impossible to apply without serious effort to clean the dataset. The method proposed in this paper works on massive noisy and scrambled texts with minimal preprocessing: special characters and spaces are removed to create one continuous string, and all repeated patterns in that string are then detected very efficiently using the Longest Expected Repeated Pattern Reduced Suffix Array (LERP-RSA) data structure and a variant of the All Repeated Patterns Detection (ARPaD) algorithm. Meta-analysis of the results can further help a digital forensics investigator detect important information in the analyzed text.
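The two steps named in the abstract (collapse the text into one continuous string, then find every repeated pattern) can be illustrated with a small Python sketch. This is not the authors' LERP-RSA or ARPaD implementation, whose details are not given here. It is a naive stand-in that sorts suffixes truncated to a maximum pattern length and groups equal prefixes. The function names `normalize` and `repeated_patterns`, the choice to keep only lowercase letters and digits, and the `min_len` and `max_len` parameters are all assumptions made for illustration.

```python
import re
from itertools import groupby


def normalize(text):
    """Lowercase the text and drop everything except letters and digits.

    Assumption: the abstract only says special characters and spaces are
    removed; lowercasing and keeping digits are choices made for this sketch.
    """
    return re.sub(r"[^a-z0-9]", "", text.lower())


def repeated_patterns(s, min_len=3, max_len=12):
    """Map each substring of length min_len..max_len that occurs at least
    twice in s to the sorted list of its start positions.

    Naive stand-in for a reduced suffix array: suffix start positions are
    sorted by their first max_len characters only, so suffixes sharing a
    prefix end up adjacent in the ordering.
    """
    order = sorted(range(len(s)), key=lambda i: s[i:i + max_len])
    found = {}
    for length in range(min_len, max_len + 1):
        # Skip suffixes too short to contain a pattern of this length.
        candidates = [i for i in order if i + length <= len(s)]
        # Equal prefixes are contiguous in sorted order, so groupby finds them.
        for pattern, group in groupby(candidates, key=lambda i: s[i:i + length]):
            positions = sorted(group)
            if len(positions) >= 2:
                found[pattern] = positions
    return found


message = "Call me at 5pm!! callme at5pm, ok?"
clean = normalize(message)          # "callmeat5pmcallmeat5pmok"
patterns = repeated_patterns(clean, min_len=3, max_len=11)
print(patterns["callmeat5pm"])      # [0, 11]
```

Because the spaces are removed before the search, the phrase is matched even though the second occurrence was typed without word breaks. The sketch is quadratic in the worst case and is meant only to show the idea on short inputs.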


Bibliographic record

Source
European Intelligence and Security Informatics Conference | 2017 | 170p | 8 pages
Authors
Konstantinos Xylogiannopoulos; Panagiotis Karampelas; Reda Alhajj
Original format
PDF
Chinese Library Classification
TP309-53
Keywords
Text mining; Digital forensics; Email analysis; ARPaD; LERP-RSA; Pattern detection


