Journal: 《情报工程》

A BLSTM-Based Method for Term Extraction from Scientific and Technical Literature

         

Abstract

Term extraction is a key technique for analyzing scientific and technical literature. To further improve the precision and recall of term extraction from such literature, this paper adopts a neural network model based on BLSTM (Bidirectional Long Short-Term Memory). A pre-trained word-vector dictionary maps the Chinese word-segmentation results to vectors, which serve as the input to the BLSTM model. A sequence-labeling approach then maps the output classification results to term boundaries, from which the terms are extracted. On datasets from the automation technology and computer technology domains, experiments compare the word-feature BLSTM model with a conditional random field (CRF) model. The results show that the BLSTM-based model performs better: on the Chinese dataset, the highest precision is 0.7821, the highest recall is 0.8020, and the highest F1 score is 0.7860; on the English dataset, the corresponding values are 0.8525, 0.8677, and 0.8543.
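The step of mapping per-token classification results to term boundaries, and the precision, recall, and F1 scores used for evaluation, can be illustrated with a minimal sketch. The abstract does not state the tag set or the matching criterion, so the B/I/O labels, the exact-match span comparison, the function names `decode_terms` and `prf1`, and the sample sentence below are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def decode_terms(tags: List[str]) -> List[Span]:
    """Convert a B/I/O tag sequence (assumed scheme) into term spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new term begins; close any term that is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            # A stray "I" with no open term is treated as a term start.
            if start is None:
                start = i
        else:  # "O": outside any term, so close the open term if there is one
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def prf1(predicted: List[Span], gold: List[Span]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 under exact span matching (an assumption)."""
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical segmented sentence with gold and predicted tags.
tokens = ["基于", "条件", "随机场", "的", "术语", "抽取"]
gold_tags = ["O", "B", "I", "O", "B", "I"]
pred_tags = ["O", "B", "I", "O", "B", "O"]

gold_spans = decode_terms(gold_tags)
pred_spans = decode_terms(pred_tags)
predicted_terms = ["".join(tokens[s:e]) for s, e in pred_spans]
precision, recall, f1 = prf1(pred_spans, gold_spans)
```

In this sketch a predicted term counts as correct only when both of its boundaries match the gold annotation; a looser partial-match criterion would give different scores.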
