A New Word Discovery Method for Internet Language Based on Microblog Corpora (面向网络语言基于微博语料的新词发现方法)

雷一鸣; 刘勇; 霍华

Abstract

To identify new Chinese words in microblog corpora effectively, this paper proposes a new word discovery method that combines a word-level mutual information model with external statistics, tailored to the textual characteristics of microblog text. A candidate word set is built with a mutual information statistical model that starts from the smallest collocation unit inside a candidate word and extends it to its right-hand neighbors. Based on statistical properties and corpus features, low-frequency candidates are then screened out, and the concept of external statistics is introduced as a further filter. This statistical approach removes the limitation that a mutual information model can only score two constituent elements when used for new word discovery, and it avoids the N-gram overlap problem that degrades accuracy in new word discovery research. The filtering method works well on microblog corpora, which contain large numbers of short sentences. Examples and comparative experiments verify the effectiveness of the method.
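The abstract does not give formulas, so the following is only a minimal sketch of the general idea (score a two-unit collocation with mutual information, keep extending it to the right while the score stays high, then drop low-frequency candidates). The function names, the thresholds `pmi_min` and `freq_min`, the use of single characters as the smallest unit, and the use of the total character count as the probability denominator are all assumptions made for illustration, not the authors' definitions. The external-statistics filter is not specified in the abstract and is deliberately left out.

```python
import math
from collections import Counter


def ngram_counts(sentences, max_n):
    """Count every character n-gram of length 1..max_n in the corpus."""
    counts = Counter()
    for s in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(s) - n + 1):
                counts[s[i:i + n]] += 1
    return counts


def pmi(left, right, counts, total):
    """Pointwise mutual information of `left` followed by `right`.

    Assumption: all probabilities are approximated as count / total,
    where total is the number of characters in the corpus.
    """
    joint = counts[left + right]
    if joint == 0 or counts[left] == 0 or counts[right] == 0:
        return float("-inf")
    return math.log2(joint * total / (counts[left] * counts[right]))


def grow_candidates(sentences, max_len=4, pmi_min=2.0, freq_min=2):
    """Grow candidates rightwards from each character, then filter by frequency."""
    counts = ngram_counts(sentences, max_len)
    total = sum(len(s) for s in sentences)
    found = Counter()
    for s in sentences:
        i = 0
        while i < len(s) - 1:
            word, j = s[i], i + 1
            # Extend to the right neighbor while the collocation stays strong.
            while (j < len(s) and len(word) < max_len
                   and pmi(word, s[j], counts, total) >= pmi_min):
                word += s[j]
                j += 1
            if len(word) > 1:
                found[word] += 1
                i = j  # skip past the accepted candidate to avoid overlaps
            else:
                i += 1
    # Low-frequency screening (hypothetical threshold).
    return {w: c for w, c in found.items() if c >= freq_min}


corpus = ["给力啊", "真给力", "太给力了", "啊真好", "了太好"]
print(grow_candidates(corpus))  # {'给力': 3}
```

Because each accepted candidate is treated as a single left-hand unit when the next right neighbor is scored, the score is no longer restricted to two single characters, and jumping past the accepted span keeps overlapping N-grams out of the candidate set. How closely this mirrors the paper's actual procedure cannot be determined from the abstract alone.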

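Bibliographic record

Source
《计算机工程与设计》 (Computer Engineering and Design) | 2017, Issue 3 | pp. 789-794 | 6 pages
Authors
雷一鸣; 刘勇; 霍华
Affiliation
College of Information Engineering, Henan University of Science and Technology, Luoyang, Henan 471023 (all three authors)
Original format: PDF
Language of the full text: Chinese
Chinese Library Classification: text information processing
Keywords
new word discovery; microblog corpus; mutual information; intra-word coupling degree; external statistics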
