Proceedings of the 2006 International Conference on Machine Learning and Cybernetics

IMPROVING FEATURE EXTRACTION IN NAMED ENTITY RECOGNITION BASED ON MAXIMUM ENTROPY MODEL


Abstract

This paper proposes a new method for improving feature extraction in Named Entity Recognition. First, context features and entity features are extracted by corresponding algorithms: triggers selected by Mutual Information, Information Gain, Average Mutual Information, etc. are used to enhance the context features, and rough set theory is used to extract the entity features. Second, a word clustering method is presented to improve feature expansion, which makes feature selection easier and effectively overcomes the sparse data problem. Finally, all the features are added to the maximum entropy model. Experiments confirm that the method is effective. The method has been used in our word segmenter, which participated in the International SIGHAN-2005 Evaluation and ranked first in the open test on the MSR corpus.
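The abstract names Mutual Information as one criterion for selecting trigger words but does not give the formula used. As a rough illustration only, the sketch below scores each (context word, entity class) pair by pointwise mutual information estimated from labeled samples. The function name `trigger_scores`, the sample format, and the class labels are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def trigger_scores(samples):
    """Score (word, label) pairs by pointwise mutual information.

    samples: list of (context_words, label) pairs, where label is an
    entity class such as "PER" or "O". This input format is hypothetical.
    """
    n = len(samples)
    word_count = Counter()    # number of samples whose context contains the word
    label_count = Counter()   # number of samples carrying the label
    joint_count = Counter()   # number of samples with both the word and the label

    for words, label in samples:
        label_count[label] += 1
        for w in set(words):  # count each word once per sample
            word_count[w] += 1
            joint_count[(w, label)] += 1

    scores = {}
    for (w, label), c in joint_count.items():
        # PMI(w, label) = log2( P(w, label) / (P(w) * P(label)) )
        scores[(w, label)] = math.log2(c * n / (word_count[w] * label_count[label]))
    return scores


samples = [
    (["Mr", "said"], "PER"),
    (["Mr", "went"], "PER"),
    (["the", "said"], "O"),
    (["the", "went"], "O"),
]
scores = trigger_scores(samples)
# Words with a high score for a class are candidate triggers for that class.
candidates = sorted(scores, key=scores.get, reverse=True)
```

In this toy data "Mr" appears only with the "PER" class and so scores higher for it than "said", which is spread evenly across both classes. A real system would also need smoothing or a frequency cutoff, since raw PMI tends to overrate rare words.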