Naive Bayes and similarity based methods for identifying computer users using keystroke patterns.

Abstract

In this dissertation, we present two methods for identifying computer users from their keystroke patterns.

In the first method, "Competition between naive Bayes models for user identification," a naive Bayes model is created for each user. In the training phase, a user's model is trained using maximum likelihood estimation on the key press latency values extracted from the texts typed by that user. In the identification phase, we determine, for each user, the likelihood that the typed text belongs to that user. The typed text is then assigned to the user with the highest likelihood value.

In the second method, "Similarity based user identification," each user is represented by a distinct model. In the training phase, the model parameters of a user are estimated from the key press latency values extracted from the texts typed by that user. In the identification phase, we assign a similarity score to each user given a typed text. The similarity score of a user is the ratio between (1) the number of key press latency values extracted from the typed text that are similar to the estimated model parameters of the user and (2) the total number of key press latency values extracted from the typed text. The typed text is then assigned to the user with the highest similarity score.

We also present a novel application of a distance based outlier detection method for discarding outliers in the key press latency values extracted from a user's typed text. Outliers are detected using the following three-step procedure: (1) for each extracted latency value x_i, a neighborhood region is created using a distance threshold, (2) a latency value x_j is considered a neighbor of x_i if x_j falls in the neighborhood region of x_i, and (3) the latency value x_i is considered an outlying value if the number of neighbors determined for x_i is less than a pre-set threshold.

To empirically evaluate the performance of our proposed work, a keystroke data set was collected from ten users, each of whom provided 15 typing samples. From these typing samples, six distinct datasets were created in which the number of user identification attempts varied from 150 to 54,600. Results on the datasets indicate that the identification accuracy of the "Competition between naive Bayes models for user identification" method ranges from 89.62% to 99.65%, and the identification accuracy of the "Similarity based user identification" method ranges from 96.33% to 100%. Further, the performance of our two proposed user identification methods is compared with the performance of two user identification methods reported in the recent literature.

To further improve the performance of the user identification methods, we theoretically analyze Majority Voting Rule (MVR) based fusion of two or more user identification methods. We formulate a procedure for theoretically estimating the identification accuracy of the MVR based fusion of user identification methods. Our proposed procedure, unlike the procedure presented in the literature on MVR based fusion, does not assume that the methods to be fused have identical identification accuracy. The theoretically estimated identification accuracy of the MVR based fusion is analyzed in light of the empirical results.
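
The abstract does not say which distribution the naive Bayes models use or how latencies are grouped, so the following is only a minimal sketch of the "competition" idea under stated assumptions: latencies are grouped by key pair, each group is modeled as a Gaussian fitted by maximum likelihood, and a key pair the model has never seen receives a fixed penalty. The names `train_user_model`, `log_likelihood`, `identify_user`, `MIN_VAR`, and `UNSEEN_LOG_DENSITY` are hypothetical and do not come from the dissertation.

```python
import math
from collections import defaultdict

MIN_VAR = 1.0                        # assumed variance floor (ms^2), avoids division by zero
UNSEEN_LOG_DENSITY = math.log(1e-6)  # assumed penalty for a key pair absent from the model


def train_user_model(samples):
    """Fit one Gaussian per key pair by maximum likelihood.

    samples: iterable of (key_pair, latency_ms) tuples from one user's texts.
    Returns {key_pair: (mean, variance)}.
    """
    grouped = defaultdict(list)
    for key_pair, latency in samples:
        grouped[key_pair].append(latency)
    model = {}
    for key_pair, values in grouped.items():
        mean = sum(values) / len(values)
        # The MLE of the variance divides by n, not n - 1.
        var = sum((v - mean) ** 2 for v in values) / len(values)
        model[key_pair] = (mean, max(var, MIN_VAR))
    return model


def log_likelihood(model, typed):
    """Sum of per-latency Gaussian log densities (the naive independence assumption)."""
    total = 0.0
    for key_pair, latency in typed:
        if key_pair not in model:
            total += UNSEEN_LOG_DENSITY
            continue
        mean, var = model[key_pair]
        total += -0.5 * math.log(2 * math.pi * var) - (latency - mean) ** 2 / (2 * var)
    return total


def identify_user(models, typed):
    """Assign the typed text to the user whose model gives the highest likelihood."""
    return max(models, key=lambda user: log_likelihood(models[user], typed))
```

Working in log space keeps the product of many small densities from underflowing; the ranking of users is unchanged because the logarithm is monotonic.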
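
The three-step outlier procedure and the similarity ratio described above can be sketched as follows. The abstract does not define what makes a latency value "similar" to a user's model parameters, so this sketch assumes the parameters are a per-key-pair mean and standard deviation and treats a value as similar when it lies within `k` standard deviations of the mean. That rule, the default thresholds, the choice not to count a value as its own neighbor, and all function names are illustrative assumptions, not the dissertation's definitions.

```python
import statistics
from collections import defaultdict


def remove_outliers(values, distance, min_neighbors):
    """Distance based outlier removal on one list of latency values.

    x_j is a neighbor of x_i when |x_i - x_j| <= distance; x_i is discarded
    when it has fewer than min_neighbors neighbors. A value is not counted
    as its own neighbor (an assumption of this sketch).
    """
    kept = []
    for i, xi in enumerate(values):
        neighbors = sum(
            1 for j, xj in enumerate(values) if j != i and abs(xi - xj) <= distance
        )
        if neighbors >= min_neighbors:
            kept.append(xi)
    return kept


def train_model(samples, distance=30.0, min_neighbors=2):
    """Estimate (mean, std) per key pair after discarding outliers."""
    grouped = defaultdict(list)
    for key_pair, latency in samples:
        grouped[key_pair].append(latency)
    model = {}
    for key_pair, values in grouped.items():
        clean = remove_outliers(values, distance, min_neighbors)
        if len(clean) >= 2:  # need at least two values for a spread estimate
            model[key_pair] = (statistics.mean(clean), statistics.pstdev(clean))
    return model


def similarity_score(model, typed, k=2.0):
    """Ratio of 'similar' latency values to all latency values in the typed text."""
    if not typed:
        return 0.0
    similar = 0
    for key_pair, latency in typed:
        if key_pair in model:
            mean, std = model[key_pair]
            if abs(latency - mean) <= k * std:  # assumed similarity rule
                similar += 1
    return similar / len(typed)


def identify_by_similarity(models, typed, k=2.0):
    """Assign the typed text to the user with the highest similarity score."""
    return max(models, key=lambda user: similarity_score(models[user], typed, k))
```

For example, `remove_outliers([100, 102, 98, 101, 400], distance=10, min_neighbors=2)` keeps the four values near 100 ms and drops 400, which has no neighbors within 10 ms.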

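The abstract does not give the estimation procedure for MVR based fusion, so the sketch below is not the author's method. It only illustrates one way to estimate fused accuracy when the methods have different accuracies, under two assumptions: the methods err independently, and the fused decision counts as correct only when a strict majority of methods are correct. With more than two candidate users a correct answer can also win with fewer votes, so this value is a conservative estimate. The function name `majority_vote_accuracy` is hypothetical.

```python
def majority_vote_accuracy(accuracies):
    """Probability that more than half of the methods are correct.

    accuracies: per-method identification accuracies in [0, 1], which may differ.
    Assumes independent errors (an assumption of this sketch).
    """
    # dist[k] = probability that exactly k of the methods seen so far are correct.
    dist = [1.0]
    for p in accuracies:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1 - p)   # this method is wrong
            nxt[k + 1] += prob * p     # this method is correct
        dist = nxt
    majority = len(accuracies) // 2 + 1  # strict majority; ties count as failures
    return sum(dist[majority:])
```

For three methods with accuracies 0.9, 0.8, and 0.7, the estimate is 0.902: the fused result beats the best single method only slightly, which shows why the equal-accuracy assumption matters when the fused methods differ in quality.
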
Bibliographic record

  • Author: Joshi, Shrijit S.
  • Author affiliation: Louisiana Tech University
  • Degree-granting institution: Louisiana Tech University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 126 p.
  • Total pages: 126
  • Original format: PDF
  • Language: English
  • Date indexed: 2022-08-17 11:37:49
