Conference: International Conference on Big Data, Small Data, Linked Data and Open Data

Industry Experience: Chinese Names Duplicate Records Detection

Abstract

The Soundex method is the preferred method for detecting duplicate records among Malaysian Chinese names. The names are written in English text, but they are transliterated phonetically from various Chinese dialects. With the Russell Soundex method, the number of duplicates found is high, and so is the number of false positives. Because Soundex is adaptable, it can be optimized for foreign-language names such as Chinese names. Through a series of tests, this study optimized the Soundex codes for general Malaysian Chinese names. The test results show that a few short Chinese surnames contribute to the false positives.
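
To make the baseline concrete, here is a minimal sketch of the standard Russell (American) Soundex coding, used to group names into candidate duplicates. It does not reproduce the optimized codes from the study, which the abstract does not list. The helper names `soundex` and `candidate_duplicates`, the choice to encode each name token separately, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict

# Standard Russell/American Soundex letter groups (not the study's optimized codes).
_GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
_CODES = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}


def soundex(word):
    """Return the 4-character Soundex code of one word ('' if it has no letters)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped and do not separate equal codes
        code = _CODES.get(ch, "")  # vowels and Y have no code
        if code and code != prev:
            result.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return ("".join(result) + "000")[:4]


def candidate_duplicates(names):
    """Group full names whose per-token Soundex keys are identical (assumed scheme)."""
    buckets = defaultdict(list)
    for name in names:
        key = tuple(soundex(token) for token in name.split())
        buckets[key].append(name)
    return [group for group in buckets.values() if len(group) > 1]


# Hypothetical sample records, not taken from the study's data.
sample = ["Tan Mei Ling", "Tam Mei Leng", "Lee Wei", "Li Wai", "Wong Kok Keong"]
print(candidate_duplicates(sample))
```

Short names carry very little information after coding: in this sketch `Lee` and `Li` both reduce to `L000`, and `Tan` and `Tam` both reduce to `T500`. That is consistent with the abstract's observation that short surnames contribute to false positives.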