
A Chinese unknown word recognition method for micro-blog short text based on improved FP-growth



Abstract

Unknown word recognition is important for improving the precision of text segmentation and syntactic analysis. Social networks have become an important platform for sharing, disseminating, and acquiring information, and unknown word recognition in micro-blog short text has become a research hotspot. Micro-blog text, however, contains a large number of nonstandard terms and internet buzzwords, which makes unknown words harder to recognize. This paper proposes a Chinese unknown word recognition method for micro-blog short text based on an improved FP-growth algorithm (POS-FP). First, the POS-FP algorithm is used to mine frequent itemsets from micro-blog text, and an N-gram model is used to filter candidate unknown words from those itemsets. Second, improved mutual information and left-right information entropy are used to verify the internal features of the candidate unknown words. Third, context-dependent and open-source methods are used for external verification of the candidates. Compared with traditional methods, the algorithm improves the recognition rate of unknown words in micro-blog short texts.
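The internal-verification step named in the abstract (mutual information plus left-right information entropy) can be illustrated with a minimal sketch. This record does not specify the paper's improved mutual information, so standard pointwise mutual information is swapped in here, and the entropy is the usual Shannon entropy over neighbouring characters. The function names (`count_occurrences`, `pmi`, `boundary_entropy`) and the character-level probability estimates are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def count_occurrences(texts, s):
    """Count occurrences of s across all texts (overlapping matches included)."""
    total = 0
    for t in texts:
        start = t.find(s)
        while start != -1:
            total += 1
            start = t.find(s, start + 1)
    return total


def pmi(texts, left, right):
    """Standard pointwise mutual information of the string left+right.

    Assumption: probabilities are estimated as count / total characters.
    This is NOT the paper's improved mutual information.
    """
    n = sum(len(t) for t in texts)
    c_joint = count_occurrences(texts, left + right)
    c_left = count_occurrences(texts, left)
    c_right = count_occurrences(texts, right)
    if min(c_joint, c_left, c_right) == 0:
        return float("-inf")
    return math.log2((c_joint / n) / ((c_left / n) * (c_right / n)))


def entropy(counter):
    """Shannon entropy (base 2) of a Counter; 0 for an empty Counter."""
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def boundary_entropy(texts, word):
    """Return (left_entropy, right_entropy) of the characters adjacent to word."""
    left, right = Counter(), Counter()
    for t in texts:
        start = t.find(word)
        while start != -1:
            if start > 0:
                left[t[start - 1]] += 1
            end = start + len(word)
            if end < len(t):
                right[t[end]] += 1
            start = t.find(word, start + 1)
    return entropy(left), entropy(right)


if __name__ == "__main__":
    # Toy corpus standing in for micro-blog short texts (hypothetical data).
    posts = ["xabcy", "zabcw", "qabcq"]
    print(pmi(posts, "ab", "c"))
    print(boundary_entropy(posts, "abc"))
```

A candidate whose parts co-occur far more often than chance (high mutual information) and whose neighbouring characters vary freely (high left and right entropy) is more likely to be a standalone word. Any thresholds applied to these scores would have to come from the paper itself.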

Bibliographic record

  • Source
    Pattern Analysis and Applications | 2020, Issue 2 | pp. 1011-1020 | 10 pages
  • Authors

  • Author affiliations

    Beijing Univ Technol Coll Appl Sci Beijing Peoples R China|Taiji Comp Co Ltd Beijing Peoples R China;

    Beijing Univ Technol Coll Appl Sci Beijing Peoples R China|Beijing Univ Technol Beijing Inst Sci & Engn Comp Beijing Peoples R China;

  • Indexed in Science Citation Index (SCI), USA;
  • Original format PDF
  • Language of text English
  • Chinese Library Classification
  • Keywords

    Unknown word recognition; FP-growth algorithm; Mutual information; Information entropy;

  • Date indexed 2022-08-18 05:21:23
