Conference: International Conference on Computational Linguistics post-conference workshops

Extracting Pronunciation-translated Names from Chinese Texts using Bootstrapping Approach

Abstract

Pronunciation-translated names (P-Names) introduce additional ambiguity into Chinese word segmentation and generic named entity recognition. As few annotated resources are available for developing a good P-Name extraction system, this paper presents a bootstrapping algorithm, called PN-Finder, to tackle this problem. Starting from a small set of P-Name characters and context cue-words, the algorithm iteratively locates more P-Names on the Internet. It uses a combination of P-Name and context-word probabilities to identify new P-Names. Experiments show that our PN-Finder is able to locate a large number of P-Names (over 100,000) on the Internet with a recognition accuracy of over 85%. Further tests on the MET-2 test set show that our PN-Finder achieves an F1 value of over 90% in locating P-Names. These results demonstrate that our PN-Finder is effective.
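The abstract only outlines the bootstrapping idea, so the following is a hypothetical toy sketch of such a loop, not the paper's PN-Finder. Assumptions (all illustrative): the Internet is replaced by a small in-memory list of clauses shaped `<candidate name><cue word>`; a candidate's P-Name probability is the mean of per-character probabilities; it is combined with the cue word's probability by linear interpolation with a made-up weight `alpha`; and `threshold`, `max_len` and the count-based re-estimation rules are invented for the example.

```python
from collections import Counter


def bootstrap_pnames(clauses, seed_chars, seed_cues, iterations=3,
                     alpha=0.5, threshold=0.6, max_len=4):
    """Hypothetical toy bootstrapping loop for P-Name extraction.

    Assumes every clause has the shape <candidate name><cue word>, e.g. "克林顿说".
    """
    seeds, seed_cue_set = set(seed_chars), set(seed_cues)
    char_prob = {c: 1.0 for c in seeds}       # P(char is part of a P-Name)
    cue_prob = {w: 1.0 for w in seed_cue_set}  # P(cue word follows a P-Name)
    hits = {}                                  # clause index -> accepted name

    for _ in range(iterations):
        # 1) Score each "<name><cue>" split with the current model.
        for i, clause in enumerate(clauses):
            best_score, best_name = 0.0, None
            for cue, p_ctx in cue_prob.items():
                name = clause[:-len(cue)]
                if not clause.endswith(cue) or not 2 <= len(name) <= max_len:
                    continue
                p_name = sum(char_prob.get(c, 0.0) for c in name) / len(name)
                score = alpha * p_name + (1 - alpha) * p_ctx
                if score > best_score:
                    best_score, best_name = score, name
            if best_name is not None and best_score >= threshold:
                hits[i] = best_name
        names = set(hits.values())

        # 2) Re-estimate non-seed character probabilities as
        #    (occurrences inside accepted names) / (occurrences in the corpus).
        in_name = Counter(c for name in hits.values() for c in name)
        total = Counter(c for clause in clauses for c in clause)
        for c in total:
            if c not in seeds:
                char_prob[c] = in_name[c] / total[c]

        # 3) Learn new cue words from the text that follows accepted names;
        #    a cue's probability is the share of clauses ending in it whose
        #    prefix is an accepted name.
        for clause in clauses:
            for name in names:
                rest = clause[len(name):]
                if (clause.startswith(name) and 1 <= len(rest) <= 2
                        and rest not in seed_cue_set):
                    ending = [cl for cl in clauses if cl.endswith(rest)]
                    good = [cl for cl in ending if cl[:-len(rest)] in names]
                    cue_prob[rest] = len(good) / len(ending)

    return set(hits.values())
```

With the clauses `["克林顿说", "史密斯说", "他们说", "克林顿表示", "史密斯表示", "克拉克表示"]`, seed characters `"克林顿斯"` and the single seed cue `"说"` ("said"), one pass accepts 克林顿 and 史密斯 but rejects 他们 ("they"), because none of its characters has any P-Name evidence. That pass also learns 表示 ("stated") as a cue, so a later pass accepts 克拉克 as well. This illustrates how a combined character and context score can let a bootstrapping loop grow from a few seeds.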