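Automatic Recognition of Foreign Transliterated Names Based on Character Co-occurrence Frequency Statistics (基于用字共现频率统计的外国译名自动识别)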

Chen Yang (陈阳); Zhao Yuehua (赵跃华); Cheng Xianyi (程显毅)


Abstract

To reduce the negative effects of word segmentation, this paper proposes a method for automatically recognizing foreign transliterated names based on character co-occurrence frequency statistics. The characters used in transliterated names are analyzed statistically, and the concept of a transliteration co-occurrence string is introduced. A table of non-transliteration characters is derived from the table of transliteration characters and a table of common Chinese characters. On this basis, the boundary of a transliterated name is defined, and a method for correcting segmentation errors is designed around that boundary definition. Tests on an open corpus show that, compared with the maximum word frequency segmentation algorithm, the proposed algorithm improves precision, recall, and F-measure in transliterated name recognition.
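The abstract does not give the paper's formulas or boundary definition, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the non-transliteration character table is the set difference between the common-character table and the transliteration-character table, and it approximates a name boundary as the edge of a maximal run of transliteration characters. The function names, the toy character tables, and the `min_len` parameter are all hypothetical and are not taken from the paper.

```python
def non_transliteration_chars(common_chars, translit_chars):
    """Common characters that never occur in transliterated names (assumed: set difference)."""
    return set(common_chars) - set(translit_chars)


def candidate_names(text, translit_chars, min_len=2):
    """Return (start, end, string) for each maximal run of transliteration characters.

    Simplification: a run ends at the first character outside the table.
    The paper's actual boundary definition is not described in the abstract.
    """
    spans = []
    start = None
    for i, ch in enumerate(text):
        if ch in translit_chars:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_len:
                spans.append((start, i, text[start:i]))
            start = None
    # Close a run that reaches the end of the text.
    if start is not None and len(text) - start >= min_len:
        spans.append((start, len(text), text[start:]))
    return spans


def precision_recall_f(predicted, gold):
    """Standard precision, recall, and F1 over sets of recognized names."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy tables for illustration only; real tables would come from corpus statistics.
translit = set("克林顿布什斯基尔")
common = set("克林顿访问了北京")
spans = candidate_names("克林顿访问了北京", translit)
```

With these toy tables, `spans` holds the single candidate `克林顿`, and `non_transliteration_chars(common, translit)` holds the five characters of `访问了北京`. A real system would also need the co-occurrence frequency statistics and the segmentation adjustment step that the abstract mentions, neither of which is modeled here.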

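Bibliographic information

Source
Computer Engineering and Design (《计算机工程与设计》) | 2012, No. 1 | pp. 362-366 | 5 pages
Authors
Chen Yang (陈阳); Zhao Yuehua (赵跃华); Cheng Xianyi (程显毅)
Author affiliations

School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212000;

School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212000;

School of Computer Science and Technology, Nantong University, Nantong, Jiangsu 226019

Original format: PDF
Language of text: Chinese
Chinese Library Classification: Information processing
Keywords
foreign transliterated names; word segmentation; co-occurrence strings; frequency statistics; transliterated name boundary; natural language processing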
