Conference: International Conference on Control, Decision and Information Technologies

String similarity algorithms for a ticket classification system

Abstract

Fuzzy string matching lets strings that match closely, but not exactly, be compared and extracted from bodies of text. This makes it useful in systems that automatically extract and process documents. We summarise and compare several existing algorithms for measuring string similarity: Longest Common Subsequence (LCS), Dice coefficient, Cosine Similarity, Levenshtein distance and Damerau distance. Using previously classified customer support enquiries (tickets), we evaluated how well different algorithms and configurations automatically identify keywords of interest, such as error phrases, product names and warning messages. The evaluation covered cases where these key phrases are misspelled, copied incorrectly or otherwise formed differently. The optimal algorithm was selected on the basis of novel studies of these similarity measures applied to text strings tokenised into characters. The same analysis also identified an optimum similarity threshold for each of several enquiry categories, reducing mismatched strings whilst preserving coverage of correctly matched key phrases. Compared with the existing approach used by a customer support system, this yielded a 15% improvement in the ratio of false-positive to true-positive classifications.
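The five similarity measures named in the abstract can be illustrated with the Python sketch below. It is not the authors' implementation and rests on several assumptions. Strings are tokenised into single characters, as the abstract describes. Distances are turned into 0..1 similarities by dividing by the length of the longer string. Dice and cosine operate on character multisets. The "Damerau distance" is taken to be the restricted Damerau-Levenshtein variant, also called optimal string alignment. The `best_match` helper, its sliding-window search and the 0.8 threshold are hypothetical, chosen only to show how a per-category threshold filters candidate key phrases.

```python
import math
from collections import Counter
from typing import Callable, Optional, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance: insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def damerau_osa(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein distance (optimal string alignment):
    Levenshtein plus transposition of two adjacent characters."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


def _to_similarity(distance: int, a: str, b: str) -> float:
    # Assumption: normalise by the longer string so 1.0 means identical.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - distance / longest


def lcs_sim(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else lcs_length(a, b) / longest


def levenshtein_sim(a: str, b: str) -> float:
    return _to_similarity(levenshtein(a, b), a, b)


def damerau_sim(a: str, b: str) -> float:
    return _to_similarity(damerau_osa(a, b), a, b)


def dice(a: str, b: str) -> float:
    """Dice coefficient over character multisets: 2|A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2 * sum((Counter(a) & Counter(b)).values()) / total


def cosine(a: str, b: str) -> float:
    """Cosine similarity of character-count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(n * cb[ch] for ch, n in ca.items())
    norm = math.sqrt(sum(n * n for n in ca.values()) * sum(n * n for n in cb.values()))
    if norm == 0:
        return 1.0 if a == b else 0.0
    return dot / norm


def best_match(keyword: str, text: str, similarity: Callable[[str, str], float],
               threshold: float) -> Optional[Tuple[float, str]]:
    """Hypothetical helper: score every character window of length
    len(keyword) - 1 .. len(keyword) + 1 and return (score, window) for the
    best one, or None if that score falls below the threshold."""
    keyword, text = keyword.lower(), text.lower()
    n, best = len(keyword), None
    for size in range(max(1, n - 1), n + 2):
        for start in range(len(text) - size + 1):
            window = text[start:start + size]
            score = similarity(keyword, window)
            if best is None or score > best[0]:
                best = (score, window)
    return best if best is not None and best[0] >= threshold else None
```

Consider the call `best_match("timeout", "Connection timeuot on login", damerau_sim, 0.8)`. It returns the window `"timeuot"` with a score of about 0.857, because a single adjacent transposition repairs the typo. The same call with `levenshtein_sim` returns `None`. Plain Levenshtein needs two substitutions for that typo, so its best score (about 0.714) falls below the threshold. The example shows why the choice of measure and the per-category threshold both affect the trade-off between mismatches and coverage.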