Machine learning for person identification with applications in forensic document analysis.

Abstract

Person identification from evidence is the primary goal of forensic analysis. Identification answers the question "Whose sample is this?", and verification answers "Are these two samples from the same person?", where an evidentiary sample comes from a forensic modality, e.g., handwriting, signature, fingerprint, or DNA. Forensic document analysis deals primarily with handwriting and signatures. The identification problem of searching a large corpus of handwriting or signature samples to retrieve all those that originated from one person is an Information Retrieval (IR) task. The task of identification from evidence is divided into three stages: (i) data extraction and indexing, (ii) data analysis and learning models, and (iii) inference/retrieval. The objective of the research is to identify and propose appropriate statistical machine learning tools for each of these three stages. In this work, the handwriting and signature modalities are used to ascertain the validity of the proposed approaches.

The task of data extraction and indexing is to process the data from its raw form and make it usable for the learning/analysis stage. We propose the use of Conditional Random Fields (CRFs) to identify and distinguish components such as signatures, machine print, handwriting, and noise in a given document. Line segmentation and word recognition are the next two steps in extracting discriminating features. We propose a robust statistical approach for line segmentation and CRFs for handwritten word recognition. Clustering the extracted feature data using infinite mixture models is proposed for fast and efficient retrieval from a large corpus.

The task of data analysis and learning models involves understanding the characteristics of the data. We propose two different statistical approaches, one termed learning and the other adaptation. In the first approach (learning), a large collection of training data comprising samples from a general population is used. We propose the use of two ensembles of pairs created from the whole population: one set consists of pairs of samples from the same individual, and the other consists of pairs of samples from different individuals. Learning from these ensembles of pairs is a one-time process. In the second approach (adaptation), multiple known samples from the person relevant to the case at hand are used to learn the variations and similarities specific to that person. This information is then used to make a probabilistic decision on any given unknown sample using a Bayesian approach.

The third stage is an inference (verification) or a retrieval task. The inference task involves verifying whether or not a given questioned sample belongs to the same person as the known sample(s). We propose an approach to quantify the strength of evidence for such verifications. In the retrieval task, a given questioned sample is matched against a database of samples, and the goal is to sort the database by similarity to the questioned sample. Here, query expansion and relevance feedback are two techniques analyzed for improving search results. The proposed approaches are validated by experiments conducted on handwriting and signature corpora.
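
As a rough illustration of the learning approach and of quantifying the strength of evidence, the sketch below fits one distribution to distances between same-person pairs and another to distances between different-person pairs, then scores a questioned/known pair with a log-likelihood ratio. The abstract does not say which distributions, features, or distance measures the dissertation uses; the Gaussian model, the function names, and the toy distances here are all assumptions made for illustration only.

```python
import math
from statistics import mean, stdev


def fit_gaussian(values):
    """Return (mean, standard deviation) of a list of pair distances."""
    return mean(values), stdev(values)


def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and std sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_likelihood_ratio(distance, same_params, diff_params):
    """Positive values favour 'same person'; negative favour 'different persons'."""
    return gaussian_logpdf(distance, *same_params) - gaussian_logpdf(distance, *diff_params)


# Toy distances, invented for illustration (not data from the dissertation).
same_pair_distances = [0.10, 0.15, 0.20, 0.12, 0.18]  # pairs from the same individual
diff_pair_distances = [0.60, 0.75, 0.70, 0.55, 0.80]  # pairs from different individuals

# One-time learning step over the two ensembles of pairs.
same_params = fit_gaussian(same_pair_distances)
diff_params = fit_gaussian(diff_pair_distances)

# Verification: score the distance between a questioned and a known sample.
llr_close = log_likelihood_ratio(0.14, same_params, diff_params)
llr_far = log_likelihood_ratio(0.72, same_params, diff_params)
print(f"LLR for distance 0.14: {llr_close:.2f}")
print(f"LLR for distance 0.72: {llr_far:.2f}")
```

Under these assumptions, the magnitude of the ratio serves as a simple measure of how strongly the evidence supports either hypothesis, and the same score could be used to sort a database for the retrieval task.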

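Bibliographic record

  • Author

    Srinivasan, Harish

  • Author affiliation

    State University of New York at Buffalo, Computer Science and Engineering

  • Degree-granting institution: State University of New York at Buffalo, Computer Science and Engineering
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2008
  • Pagination: 182 p.
  • Total pages: 182
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Automation technology, computer technology
  • Keywords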
