Conference: Language, ontology, terminology and knowledge structures workshop

Fine-grained domain classification of text using TERMIUM Plus

Abstract

In this article, we present the use of a term bank for text classification. We developed a supervised text classification approach that takes advantage of the domain-based structure of a term bank, namely TERMIUM Plus, as well as its bilingual content. The goal of the task is to correctly identify the appropriate fine-grained domains of short segments of text in both French and English. We developed a vector space model for this task, which we refer to as the DCVSM (domain classification vector space model). To train and evaluate the DCVSM, we generated two new datasets from the open data contained in TERMIUM Plus. Results on these datasets show that the DCVSM compares favourably to five other supervised classification algorithms tested, achieving the highest micro-averaged recall (R@1).
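The abstract does not describe how the DCVSM is built, so the following is only a minimal illustrative sketch of the general idea of a vector space model for domain classification, not the authors' method. It assumes a tiny hypothetical term bank (`TERM_BANK`, invented here and not taken from TERMIUM Plus), plain bag-of-words vectors, and cosine similarity. It also assumes one common reading of micro-averaged R@1: the number of top-ranked predictions that match a gold domain, divided by the total number of gold domains over all segments.

```python
import math
from collections import Counter

# Hypothetical toy term bank (domain -> terms); NOT real TERMIUM Plus data.
TERM_BANK = {
    "banking": ["interest rate", "loan", "deposit account", "taux d'intérêt"],
    "meteorology": ["cold front", "precipitation", "wind speed", "front froid"],
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_domain_vectors(term_bank):
    """Represent each domain as a bag-of-words count vector over its terms."""
    vectors = {}
    for domain, terms in term_bank.items():
        counts = Counter()
        for term in terms:
            counts.update(tokenize(term))
        vectors[domain] = counts
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_domains(text, vectors):
    """Return domains sorted from most to least similar to the text."""
    query = Counter(tokenize(text))
    return sorted(vectors, key=lambda d: cosine(query, vectors[d]), reverse=True)


def micro_recall_at_1(examples, vectors):
    """Assumed definition: top-1 hits divided by total number of gold domains."""
    hits = 0
    total_gold = 0
    for text, gold_domains in examples:
        top = rank_domains(text, vectors)[0]
        hits += int(top in gold_domains)
        total_gold += len(gold_domains)
    return hits / total_gold


vectors = build_domain_vectors(TERM_BANK)
examples = [
    ("the loan has a fixed interest rate", {"banking"}),
    ("un front froid arrive", {"meteorology"}),
]
score = micro_recall_at_1(examples, vectors)
```

Because the toy term bank mixes English and French terms within each domain, the same domain vector can match short segments in either language, which loosely mirrors the bilingual setting described in the abstract.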