Improving information retrieval effectiveness in digital forensic text string searches: Clustering search results using self-organizing neural networks.

Abstract

Digital investigations seek to recover data from digital devices to chronologically reconstruct events, confirm or refute allegations of wrong-doing, and/or gain intelligence information. Readable text and text-based documents are important artifacts in many investigations. Current state-of-the-art text string search tools utilize one of two approaches: (1) string matching algorithms to search the device for instances of specified text strings; or (2) full text indexing of the device and Boolean query retrieval using the resultant index. In digital investigations, both approaches are applied at the physical level of the device and are designed to retrieve all instances of the text string, thus achieving 100% query recall. In doing so, however, an extremely high false positive rate (poor query precision) is experienced relative to investigative objectives. Current approaches do not prioritize or organize the voluminous search results with respect to conceptual meaning, investigative relevancy, or any other manner that substantively improves information retrieval (IR) effectiveness for the digital forensic investigator or analyst.
This research extends text mining and information retrieval research to the digital forensic text string search process. Specifically, a self-organizing neural network (Kohonen (1981) Self-Organizing Map) was used to conceptually cluster files and unallocated storage space known to contain specified search strings. Information retrieval effectiveness (precision, recall, and overhead) was then measured for the clustered search hit results from two digital forensics cases (one real-world case and one mock case). Identical searches of the same two cases were also conducted using two industry leading digital forensics tools that employ a string matching approach and an indexing/Boolean query approach (EnCase(TM) and FTK(TM), respectively).
The empirical results from both cases demonstrate that the extension of self-organizing neural networks to conceptually cluster digital forensic text string search hits is feasible. The empirical results from the real-world case suggest that the clustering process significantly lowers human information retrieval overhead associated with reviewing non-relevant hits, by helping the investigator get to the relevant hits more quickly than with un-clustered search hit results. Despite the lack of empirical support from the analysis of the mock evidence, further examination of the findings suggests that conceptually clustering digital forensic text string search hits is both feasible and worthwhile for investigators.
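The trade-off described in the abstract (100% recall alongside poor precision) can be illustrated with a minimal sketch. The function name `precision_recall` and the hit counts below are hypothetical and chosen only for illustration; they are not figures from the dissertation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a collection of retrieved hit IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Precision: fraction of retrieved hits that are relevant.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant hits that were retrieved.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical numbers (an assumption, not data from the study):
# a physical-level search returns 1,000 hits, of which 20 are relevant.
all_hits = range(1000)
relevant_hits = range(20)

p, r = precision_recall(all_hits, relevant_hits)
print(f"precision={p:.2f} recall={r:.2f}")
```

Under these assumed counts every relevant hit is retrieved, yet only a small fraction of what the investigator must review is relevant, which is the review burden the clustering approach aims to reduce.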

Bibliographic record

  • Author

    Beebe, Nicole L.

  • Author affiliation

    The University of Texas at San Antonio

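  • Degree-granting institution: The University of Texas at San Antonio
  • Subject: Sociology, Criminology and Penology; Computer Science; Information Science
  • Degree: Ph.D.
  • Year: 2007
  • Pages: 241 p.
  • Total pages: 241
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Branches of law; Information and knowledge dissemination; Automation technology and computer technology
  • Date added to database: 2022-08-17 11:39:26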
