Journal: Wireless Communications & Mobile Computing

Chinese Personal Name Disambiguation Based on Clustering


Abstract

Personal name disambiguation is an important problem in natural language processing and underlies many tasks in automatic information processing. This research explores Chinese personal name disambiguation based on clustering. First, preprocessing transforms the raw corpus into a standardized format. Lexical analysis then performs Chinese word segmentation, part-of-speech tagging, and named entity recognition. Next, we extract features that better distinguish Chinese personal names, and we create rules for identifying target personal names to improve the experimental results. Several feature-weighting methods are implemented, including Boolean weight, absolute frequency weight, tf-idf weight, and entropy weight. For the clustering algorithm, agglomerative hierarchical clustering is selected after comparison with other clustering methods. Finally, a labeling approach extracts feature words that represent each cluster. The experiments achieve good results on five groups of Chinese personal names.
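
The weighting and clustering steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the similarity measure, the linkage criterion, or the stopping rule, so cosine similarity, average linkage, and the `threshold` value below are assumptions, as are the function names and the toy documents. The documents stand in for already segmented contexts that mention the same personal name.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn token lists into sparse {term: tf-idf weight} dicts."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors (assumed measure)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering (assumed linkage).

    Starts with one cluster per document and repeatedly merges the most
    similar pair until the best similarity drops below `threshold`.
    """
    clusters = [[i] for i in range(len(vectors))]

    def link(a, b):
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = link(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical contexts for one ambiguous name: two about an academic,
# two about a football player.
docs = [
    ["professor", "university", "paper", "research"],
    ["university", "professor", "laboratory", "research"],
    ["player", "match", "goal", "team"],
    ["team", "match", "coach", "player"],
]
vectors = tfidf_vectors(docs)
clusters = agglomerative(vectors, threshold=0.2)  # threshold is illustrative
print(clusters)
```

On this toy input the two academic contexts end up in one cluster and the two football contexts in another, so each cluster corresponds to one person. Stopping at a similarity threshold rather than at a fixed cluster count is a design choice made here because the number of distinct people sharing a name is usually unknown in advance.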