Extracting Pronunciation-translated Names from Chinese Texts using Bootstrapping Approach

Abstract

Pronunciation-translated names (P-Names) introduce additional ambiguity into Chinese word segmentation and generic named entity recognition. Because few annotated resources are available for developing a good P-Name extraction system, this paper presents a bootstrapping algorithm, called PN-Finder, to tackle the problem. Starting from a small set of P-Name characters and context cue-words, the algorithm iteratively locates more P-Names from the Internet, using a combination of P-Name and context word probabilities to identify new P-Names. Experiments show that PN-Finder is able to locate a large number of P-Names (over 100,000) from the Internet with a recognition accuracy of over 85%. Further tests on the MET-2 test set show that PN-Finder achieves an F1 value of over 90% in locating P-Names. The results demonstrate that PN-Finder is effective.
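
The abstract does not give PN-Finder's scoring formula or stopping rule, so the following is only a toy sketch of the general bootstrapping idea it describes: start from seed name characters and cue words, accept candidates whose combined score is high enough, and feed the characters of accepted names back into the next round. The functions `score` and `bootstrap`, the weight `alpha`, the `threshold`, and the sample data are all hypothetical, and the score is a crude stand-in for the probabilities used in the paper.

```python
def score(candidate, context, name_chars, cue_words, alpha=0.5):
    """Hypothetical score: share of known name characters mixed with a cue-word flag.

    Assumes `candidate` is a non-empty string.
    """
    char_part = sum(ch in name_chars for ch in candidate) / len(candidate)
    context_part = 1.0 if context in cue_words else 0.0
    return alpha * char_part + (1 - alpha) * context_part


def bootstrap(pairs, seed_chars, cue_words, threshold=0.6, max_rounds=10):
    """Grow a set of accepted names from (context word, candidate) pairs.

    This is an illustrative loop, not the published algorithm.
    """
    name_chars = set(seed_chars)
    names = set()
    for _ in range(max_rounds):
        # Score every not-yet-accepted candidate against the current knowledge.
        accepted = {
            cand
            for context, cand in pairs
            if cand not in names
            and score(cand, context, name_chars, cue_words) >= threshold
        }
        if not accepted:
            break  # nothing new was found, so further rounds cannot help
        names |= accepted
        # Characters of newly accepted names become evidence for the next round.
        for cand in accepted:
            name_chars.update(cand)
    return names


# Made-up sample data: (context word, candidate string) pairs.
pairs = [
    ("总统", "克林顿"),
    ("先生", "布莱尔"),
    ("总统", "林肯"),
    ("访问", "北京"),
]
found = bootstrap(pairs, seed_chars="克斯尔", cue_words={"总统", "先生"})
print(sorted(found))
```

In this sample, "林肯" falls below the threshold in the first round and is accepted only in the second, after "林" has been learned from "克林顿"; "北京" is never accepted because it has neither a known character nor a cue-word context.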


Bibliographic details

Source: International conference on computational linguistics post-conference workshops, 2002, 6 pages
Authors: Jing Xiao; Jimin Liu; Tat-Seng Chua
Original format: PDF
Chinese Library Classification: programming languages, algorithmic languages
