Chinese Personal Name Disambiguation Based on Clustering

Chao Fan; Yu Li


Abstract

Personal name disambiguation is a significant problem in natural language processing and underlies many tasks in automatic information processing. This research explores Chinese personal name disambiguation based on clustering. First, preprocessing transforms the raw corpus into a standardized format. Lexical analysis then performs Chinese word segmentation, part-of-speech tagging, and named entity recognition. Next, we extract features that better disambiguate Chinese personal names, and we create rules for identifying target personal names to improve the experimental results. Several methods for calculating feature weights are implemented, including Boolean weight, absolute frequency weight, tf-idf weight, and entropy weight. For the clustering algorithm, agglomerative hierarchical clustering is selected after comparison with other clustering methods. Finally, a labeling approach extracts the feature words that best represent each cluster. The experiment achieves good results on five groups of Chinese personal names.
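The abstract gives no formulas, so the following is only a minimal sketch of the pipeline it describes: feature weighting, agglomerative hierarchical clustering, and cluster labeling. Everything specific here is an assumption rather than the authors' method: the function names, the log-entropy form of the entropy weight, cosine similarity with average linkage, the stopping threshold of 0.2, and the toy English tokens standing in for segmented Chinese context words.

```python
import math
from collections import Counter

def weight_vectors(docs, scheme="tfidf"):
    """Turn tokenized documents into sparse {term: weight} vectors."""
    n = len(docs)
    counts = [Counter(doc) for doc in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    gf = Counter()                                    # corpus-wide frequency
    for c in counts:
        gf.update(c)

    def entropy_factor(term):
        # Assumed log-entropy global weight: 1 for a term found in a single
        # document, 0 for a term spread evenly over all documents.
        if n < 2:
            return 1.0
        h = 0.0
        for c in counts:
            if term in c:
                p = c[term] / gf[term]
                h += p * math.log(p)
        return 1.0 + h / math.log(n)

    vectors = []
    for c in counts:
        if scheme == "bool":
            vec = {t: 1.0 for t in c}
        elif scheme == "tf":
            vec = {t: float(f) for t, f in c.items()}
        elif scheme == "tfidf":
            vec = {t: f * math.log(n / df[t]) for t, f in c.items()}
        elif scheme == "entropy":
            vec = {t: math.log(1 + f) * entropy_factor(t) for t, f in c.items()}
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        vectors.append(vec)
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def agglomerative(vectors, threshold=0.2):
    """Average-linkage agglomerative clustering; returns lists of doc indices."""
    clusters = [[i] for i in range(len(vectors))]
    sim = [[cosine(u, v) for v in vectors] for u in vectors]

    def linkage(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:  # no pair is similar enough; stop merging
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

def label(cluster, vectors, k=2):
    """Pick the k terms with the largest summed weight in a cluster."""
    total = Counter()
    for i in cluster:
        total.update(vectors[i])
    return [term for term, _ in total.most_common(k)]

# Toy contexts for one ambiguous name: two academic, two sports mentions.
docs = [
    ["professor", "university", "physics"],
    ["university", "physics", "lecture"],
    ["football", "club", "striker"],
    ["striker", "club", "goal"],
]
vectors = weight_vectors(docs, scheme="tfidf")
clusters = agglomerative(vectors, threshold=0.2)
print(clusters)  # [[0, 1], [2, 3]]
```

Each resulting cluster is meant to correspond to one real-world person sharing the name. Stopping at a similarity threshold instead of a fixed cluster count suits this task because the number of distinct people behind a name is not known in advance; the value 0.2 is arbitrary and would need tuning on real data.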


Bibliographic details

Source: Wireless Communications & Mobile Computing | 2021, issue a | 7 pages
Authors: Chao Fan; Yu Li
Original format: PDF
Chinese Library Classification: Wireless communication

