Improving information retrieval effectiveness in digital forensic text string searches: Clustering search results using self-organizing neural networks.
Digital investigations seek to recover data from digital devices to chronologically reconstruct events, confirm or refute allegations of wrongdoing, and/or gain intelligence information. Readable text and text-based documents are important artifacts in many investigations. Current state-of-the-art text string search tools use one of two approaches: (1) string matching algorithms to search the device for instances of specified text strings; or (2) full text indexing of the device and Boolean query retrieval using the resultant index. In digital investigations, both approaches are applied at the physical level of the device and are designed to retrieve all instances of the text string, thus achieving 100% query recall. In doing so, however, they incur an extremely high false positive rate (poor query precision) relative to investigative objectives. Current approaches do not prioritize or organize the voluminous search results with respect to conceptual meaning, investigative relevancy, or any other manner that substantively improves information retrieval (IR) effectiveness for the digital forensic investigator or analyst.

This research extends text mining and information retrieval research to the digital forensic text string search process. Specifically, a self-organizing neural network (Kohonen's (1981) Self-Organizing Map) was used to conceptually cluster files and unallocated storage space known to contain specified search strings. Information retrieval effectiveness (precision, recall, and overhead) was then measured for the clustered search hit results from two digital forensics cases (one real-world case and one mock case). Identical searches of the same two cases were also conducted using two industry-leading digital forensics tools, EnCase(TM) and FTK(TM), which employ a string matching approach and an indexing/Boolean query approach, respectively.
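As a rough illustration of what conceptually clustering search hits with a self-organizing map can look like, the following is a minimal sketch in plain Python. It is not the implementation used in this research: the term-frequency vectorization, the 2x2 grid, the linear decay schedules, and every function name (`vectorize`, `train_som`, `cluster_hits`) are assumptions chosen only to keep the example small.

```python
import math
import random
from collections import Counter, defaultdict


def vectorize(texts):
    """Turn each text into an L2-normalized term-frequency vector (assumed representation)."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in texts:
        v = [0.0] * len(vocab)
        for w, count in Counter(t.lower().split()).items():
            v[index[w]] = float(count)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def best_matching_unit(weights, v):
    """Return the grid coordinate whose weight vector is closest to v."""
    return min(
        weights,
        key=lambda node: sum((a - b) ** 2 for a, b in zip(weights[node], v)),
    )


def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM with a Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.3)  # neighborhood radius shrinks
        for v in vectors:
            br, bc = best_matching_unit(weights, v)
            for (r, c), w in weights.items():
                grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_dist2 / (2 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])  # pull node toward the input
    return weights


def cluster_hits(texts, **som_args):
    """Map each search hit (by index) to the SOM node that best matches it."""
    vectors = vectorize(texts)
    weights = train_som(vectors, **som_args)
    clusters = defaultdict(list)
    for idx, v in enumerate(vectors):
        clusters[best_matching_unit(weights, v)].append(idx)
    return dict(clusters)
```

Calling `cluster_hits` on a list of hit contexts returns a dictionary from grid coordinates to hit indices, so an investigator could review one map node at a time instead of a flat list. A realistic setup would need a far larger map and a vectorization suited to raw and unallocated data.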
The empirical results from both cases demonstrate that the extension of self-organizing neural networks to conceptually cluster digital forensic text string search hits is feasible. The empirical results from the real-world case suggest that the clustering process significantly lowers human information retrieval overhead associated with reviewing non-relevant hits, by helping the investigator get to the relevant hits more quickly than with unclustered search hit results. Despite the lack of empirical support from the analysis of the mock evidence, further examination of the findings suggests that conceptually clustering digital forensic text string search hits is both feasible and worthwhile for investigators.
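To make the effectiveness measures concrete, here is a small sketch of how precision, recall, and a review-overhead count could be computed for an ordered list of hits. The abstract does not give the exact definition of overhead used in the study, so `review_overhead` below is a hypothetical stand-in: it counts the non-relevant hits an analyst would read before reaching the last relevant one. The hit identifiers and relevance labels are invented for illustration only.

```python
def precision_recall(retrieved, relevant):
    """Standard set-based precision and recall."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def review_overhead(review_order, relevant):
    """Hypothetical overhead: non-relevant hits read before the last relevant hit."""
    relevant = set(relevant)
    positions = [i for i, hit in enumerate(review_order) if hit in relevant]
    if not positions:
        return len(review_order)  # everything was read and nothing was relevant
    last = positions[-1]
    return sum(1 for hit in review_order[: last + 1] if hit not in relevant)


# Invented example: six hits, two of which are relevant to the investigation.
relevant_hits = {2, 5}
flat_order = [1, 2, 3, 4, 5, 6]       # hits reviewed in retrieval order
clustered_order = [2, 5, 1, 3, 4, 6]  # relevant hits grouped in the first cluster

precision, recall = precision_recall(flat_order, relevant_hits)
flat_cost = review_overhead(flat_order, relevant_hits)
clustered_cost = review_overhead(clustered_order, relevant_hits)
```

In this toy case both orderings retrieve the same hits, so precision and recall are identical; only the number of non-relevant hits read along the way differs, which is the kind of effect the overhead measure is meant to capture.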