Conference: Chinese Lexical Semantics Workshop

A Classification Method for Chinese Word Semantic Relations Based on TF-IDF and CNN



Abstract

Classifying the semantic relations between words is an important part of semantic analysis in natural language research. Automating this classification matters for the construction of knowledge graphs and for information retrieval. In the NLPCC2017 shared task on Chinese Word Semantic Relations Classification, semantic relations are divided into four categories: synonym, antonym, hyponym, and meronym. This paper presents a classification method for Chinese word semantic relations based on TF-IDF and CNN that uses both the literal and the semantic features of words. Four new literal features are proposed, including whether one word is part of the other and the ratio of their common substring. Semantic features are extracted in four steps: training a word vector model on the BaiduBaike corpus, selecting from BaiduBaike the set of words most related to a given word based on TF-IDF, constructing a vector matrix for that set of related words, and using a CNN to obtain the semantic features of the given word from the vector matrix. Experiments on the NLPCC2017 dataset show an F1-score of up to 83.91% and indicate that the method is effective at eliminating the influence of out-of-vocabulary (OOV) words.
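The two literal features named in the abstract, and the TF-IDF selection of related words, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the ratio, so the code assumes it is the longest common substring length divided by the length of the longer word. The function names, the smoothed IDF formula, and the token-list corpus format are likewise hypothetical, and the word-vector and CNN steps are left out.

```python
import math
from collections import Counter


def is_part_of(a: str, b: str) -> bool:
    """Literal feature: True if either word occurs inside the other."""
    return a in b or b in a


def longest_common_substring_len(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common run that ended at the previous characters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def common_substring_ratio(a: str, b: str) -> float:
    """Assumed definition: common substring length over the longer word's length."""
    longest = max(len(a), len(b))
    return longest_common_substring_len(a, b) / longest if longest else 0.0


def top_related_words(target_doc, corpus, k=3):
    """Rank the tokens of target_doc by TF-IDF over corpus; return the top k.

    target_doc is a list of tokens (for example, a segmented encyclopedia
    entry for the given word); corpus is a list of such token lists.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))
    tf = Counter(target_doc)
    scores = {
        # Smoothed IDF (an assumption) avoids division by zero for unseen tokens.
        w: (c / len(target_doc)) * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1)
        for w, c in tf.items()
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

In the pipeline the abstract describes, the words returned by a selection step like `top_related_words` would then be mapped to their vectors and stacked into the matrix that the CNN consumes.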
