Journal: Computer Engineering & Science (《计算机工程与科学》)

Feature Selection Metrics for Automatic Predicate Identification

Abstract

Automatic predicate identification is an important task in shallow parsing. This paper proposes a predicate identification method based on the support vector machine (SVM) classification algorithm. It focuses on two steps of feature construction: feature selection based on information gain, and a feature-word metric based on the TongYiCiCiLin (同义词词林) thesaurus. The information gain method selects the features that have the greatest impact on classification, which reduces the dimensionality of the feature vector. The TongYiCiCiLin metric maps feature words to deeper semantic concepts, which strengthens the representational power of the features and emphasizes the correlation between the feature attributes and the model. Experiments on a small corpus show that the best F-Score for predicate identification reaches 84.0%, an improvement of 4.6% over the setting in which the data receives no such processing. The results indicate that the new feature selection and feature metric methods are effective for predicate identification and can substantially improve classifier performance.
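The abstract does not give the formula or the feature set used for the information gain step. As a minimal sketch, assuming the standard definition IG(Y; X) = H(Y) - H(Y | X) over discrete features, ranking candidate features could look like the following. The function names, the feature names (`prev_is_adverb`, `is_long_word`), and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature X."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # H(Y | X): entropy of each group, weighted by the group's share.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(samples, labels, k):
    """Return the k feature names with the highest information gain.

    samples: list of dicts mapping feature name -> discrete value.
    Ties are broken alphabetically so the result is deterministic.
    """
    names = sorted({name for s in samples for name in s})
    scored = [(information_gain([s.get(n) for s in samples], labels), n)
              for n in names]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


# Hypothetical toy data: label 1 = the word is a predicate, 0 = it is not.
samples = [
    {"prev_is_adverb": 1, "is_long_word": 1},
    {"prev_is_adverb": 1, "is_long_word": 0},
    {"prev_is_adverb": 0, "is_long_word": 1},
    {"prev_is_adverb": 0, "is_long_word": 0},
]
labels = [1, 1, 0, 0]
top_features = select_top_k(samples, labels, 1)
```

Only the retained features would then be encoded into the vector passed to the SVM classifier, which is how this kind of selection lowers the feature dimensionality. The TongYiCiCiLin mapping described in the abstract is a separate step and is not sketched here.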
