Journal: 现代图书情报技术 (New Technology of Library and Information Service)

Proper Noun Recognition for Chinese and Bengali Based on String Frequency Statistics


Abstract

This paper implements the string frequency statistics algorithm proposed by Nagao to build a proper noun recognition (PNR) system for Chinese and Bengali. First, n-grams are extracted from an untagged input corpus (no part-of-speech annotation); they are then filtered with an improved SSR algorithm to remove redundant substrings. Next, the system assigns each n-gram a probability of being a proper noun based on information about its neighboring characters or words, and selects proper nouns according to that probability score. The result is a cross-lingual PNR system based on string frequency statistics. Test results show that the system can effectively recognize names of people, places, and organizations or institutions in raw input text.
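The three steps in the abstract (n-gram extraction, substring filtering, neighbor-based scoring) can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the paper. First, n-gram counts are computed naively with `collections.Counter` rather than with Nagao's sorted-suffix method. Second, "SSR" is taken to mean statistical substring reduction in its basic form: drop an n-gram X when a longer candidate Y contains it and f(X) = f(Y); the paper's "improved" variant is not reproduced. Third, because the abstract does not give the neighbor-based probability formula, a hypothetical left/right neighbor-entropy score is substituted. All function names are hypothetical, and Chinese text is assumed to be tokenized into single characters.

```python
from collections import Counter
import math


def extract_ngrams(tokens, max_n):
    """Count every n-gram with 1 <= n <= max_n.

    Naive stand-in for Nagao's sorted-suffix method: same counts,
    computed with one dictionary update per (position, length) pair.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def is_substring(small, big):
    """True if tuple `small` occurs contiguously inside tuple `big`."""
    m = len(small)
    return any(big[i:i + m] == small for i in range(len(big) - m + 1))


def substring_reduction(counts, min_freq=2):
    """Basic statistical substring reduction (assumed meaning of SSR).

    Keeps n-grams seen at least `min_freq` times, then drops X whenever a
    longer frequent candidate Y contains X with f(X) == f(Y), i.e. X never
    occurs outside Y. The paper's "improved" variant is not reproduced.
    """
    frequent = {g: f for g, f in counts.items() if f >= min_freq}
    return {
        g: f
        for g, f in frequent.items()
        if not any(
            len(h) > len(g) and fh == f and is_substring(g, h)
            for h, fh in frequent.items()
        )
    }


def _entropy(counter):
    """Shannon entropy (bits) of a frequency Counter."""
    total = sum(counter.values())
    return -sum(v / total * math.log2(v / total) for v in counter.values())


def neighbor_entropy(tokens, gram):
    """Hypothetical boundary score: min(left, right) neighbor entropy.

    High values mean the string appears in varied contexts, a common
    heuristic for a complete unit; this is NOT the paper's formula.
    """
    n = len(gram)
    left, right = Counter(), Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == gram:
            left[tokens[i - 1] if i > 0 else "<s>"] += 1
            right[tokens[i + n] if i + n < len(tokens) else "</s>"] += 1
    return min(_entropy(left), _entropy(right))


def rank_candidates(tokens, max_n=4, min_freq=2, sep=""):
    """Extract, reduce, and rank multi-token candidates by score, then frequency."""
    kept = substring_reduction(extract_ngrams(tokens, max_n), min_freq)
    scored = [
        (sep.join(g), f, neighbor_entropy(tokens, g))
        for g, f in kept.items()
        if len(g) >= 2  # single characters are rarely complete names
    ]
    return sorted(scored, key=lambda t: (-t[2], -t[1], t[0]))


if __name__ == "__main__":
    # Chinese: one token per character (assumption).
    text = "张三在北京大学工作。李四也在北京大学读书。北京大学在北京。"
    for cand, freq, score in rank_candidates(list(text)):
        print(cand, freq, round(score, 3))
```

For Bengali, one would presumably pass whitespace-split words with `sep=" "` (again an assumption). The pairwise check in `substring_reduction` is quadratic in the number of candidates, which is acceptable for a toy corpus but not for large ones; swapping `neighbor_entropy` for the paper's actual probability model would change only the score column, not the extraction or reduction steps.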
