Domain-specific Chinese Term Extraction via Word Segmentation Optimization

Abstract

Automatic term extraction has long been a research hotspot in Chinese information processing. However, existing Chinese word segmentation systems such as ICTCLAS and LTP tend to produce fine-grained segmentations that break terms apart when processing domain-specific texts, which makes it hard to extract the correct terms. In this work, we approach the problem by optimizing Chinese word segmentation. We propose two word segmentation optimization algorithms, one based on frequency and one based on Pointwise Mutual Information. Building on these two algorithms, we present a new term extraction method, "CI-Value", for extracting domain-specific terms from web pages. Several groups of experiments in the "SPORT", "FOOD" and "IT" domains are conducted to verify the effect of the approach. The experimental results show that our approach is more efficient for Chinese term extraction and handles well the word segmentation optimization problem for Chinese term extraction in open-domain text such as web pages from the Internet.
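
The abstract does not spell out the two optimization algorithms, so the following is only a minimal sketch of the general idea behind a Pointwise Mutual Information criterion: adjacent tokens from a fine-grained segmenter are merged when they co-occur often and their PMI is high. The function names `pmi_table` and `merge_tokens`, the thresholds `min_pmi` and `min_count`, and the toy corpus are all assumptions made for illustration; they are not taken from the paper, and this is not the authors' CI-Value method.

```python
import math
from collections import Counter


def pmi_table(sentences):
    """Compute PMI for every adjacent token pair in pre-segmented sentences.

    `sentences` is a list of token lists, e.g. the output of a segmenter.
    Returns (pmi, bigram_counts). Hypothetical helper, not from the paper.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    pmi = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi, bigrams


def merge_tokens(tokens, pmi, bigrams, min_pmi=1.0, min_count=2):
    """Greedily merge adjacent tokens, left to right.

    A pair is merged only if it was seen at least `min_count` times and its
    PMI is at least `min_pmi`. Both thresholds are illustrative assumptions.
    """
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            pair = (tokens[i], tokens[i + 1])
            frequent = bigrams.get(pair, 0) >= min_count
            cohesive = pmi.get(pair, float("-inf")) >= min_pmi
            if frequent and cohesive:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
                continue
        merged.append(tokens[i])
        i += 1
    return merged


# Toy pre-segmented corpus (made up for this sketch).
corpus = [
    ["世界", "杯", "开幕"],
    ["世界", "杯", "决赛"],
    ["观看", "决赛"],
    ["观看", "开幕"],
]
pmi, bigrams = pmi_table(corpus)
print(merge_tokens(["世界", "杯", "决赛"], pmi, bigrams))  # ['世界杯', '决赛']
```

In this toy corpus the pair 世界 / 杯 occurs twice and has a PMI of about 3.06 bits, so it is merged back into the single term 世界杯, while pairs seen only once are left alone. A frequency-only variant would drop the PMI test and keep just the `min_count` check.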

Bibliographic details

  • Source
    Journal of information and computational science | 2015, Issue 17 | pp. 6477-6490 | 14 pages
  • Author affiliations

    School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China; Department of Computer, Beijing University of Civil Engineering and Architecture, Beijing 100044, China;

    Advanced Analytics Institute, University of Technology, Sydney, Australia;

    School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China;

    School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China;

    School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    Term; Term Extraction; Word Segmentation; Pointwise Mutual Information;

  • Date added to database: 2022-08-18 02:11:25
