Journal: Knowledge-Based Systems

Efficient text chunking using linear kernel with masked method

Abstract

In this paper, we propose an efficient and accurate text chunking system that uses a linear-kernel SVM and a new technique called the masked method. Previous research indicated that system combination or external parsers can enhance chunking performance. However, the cost of constructing multiple classifiers is even higher than that of developing a single processor. Moreover, the use of external resources complicates the original tagging process. To remedy these problems, we employ richer features and propose a mask-based method that addresses the unknown-word problem to enhance system performance. In this way, no external resources or complex heuristics are required for the chunking system. The experiments show that when trained on the CoNLL-2000 chunking dataset, our system achieves an F_β score of 94.12 with a linear kernel. Furthermore, our chunker is quite efficient because it adopts a linear-kernel SVM. The turn-around tagging time on the CoNLL-2000 test data is less than 50 s, which is about 115 times faster than a polynomial-kernel SVM.
