Journal: Language Resources and Evaluation

Text Categorization from category name in an industry-motivated scenario

Abstract

In this work we suggest a novel Text Categorization (TC) scenario, motivated by an ad-hoc industrial need to assign documents to a set of predefined categories, while labeled training data for the categories is not available. The scenario is applicable in many industrial settings and is interesting from the academic perspective. We present a new dataset geared for the main characteristics of the scenario, and utilize it to investigate the name-based TC approach, which uses the category names as its only input and does not require training data. We evaluate and analyze the performance of state-of-the-art methods for this dataset to identify the shortcomings of these methods for our scenario, and suggest ways for overcoming these shortcomings. We utilize statistical correlation measured over a target corpus for improving the state-of-the-art, and offer a different classification scheme based on the characteristics of the setting. We evaluate our improvements and adaptations and show superior performance of our suggested method.
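To make the idea of name-based categorization concrete, here is a minimal sketch in Python. It is not the authors' method: it only illustrates how category names alone, together with co-occurrence statistics measured over an unlabeled target corpus, can be used to assign a document to a category. The choice of the Dice coefficient as the correlation measure, the function names `tokenize`, `build_dice` and `classify`, and the toy corpus are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_dice(corpus):
    """Return a function giving the document-level Dice coefficient
    between two terms, estimated from the unlabeled corpus.
    (Dice is an assumed stand-in for 'statistical correlation'.)"""
    df = Counter()       # number of documents containing each term
    pair_df = Counter()  # number of documents containing both terms of a pair
    for doc in corpus:
        terms = sorted(set(tokenize(doc)))
        df.update(terms)
        pair_df.update(frozenset(pair) for pair in combinations(terms, 2))

    def dice(a, b):
        if a == b:
            return 1.0  # an exact match with a category-name term
        joint = pair_df[frozenset((a, b))]
        return 2.0 * joint / (df[a] + df[b]) if joint else 0.0

    return dice


def classify(doc, category_names, dice):
    """Score each category by how strongly its name terms are associated
    with the document's terms; no labeled training data is used."""
    doc_terms = set(tokenize(doc))
    scores = {}
    for name in category_names:
        name_terms = tokenize(name)
        if not name_terms or not doc_terms:
            scores[name] = 0.0
            continue
        # For each name term, take its strongest association with any
        # document term, then average over the name terms.
        best = [max(dice(t, n) for t in doc_terms) for n in name_terms]
        scores[name] = sum(best) / len(best)
    return max(scores, key=scores.get), scores


# Hypothetical unlabeled target corpus and category names.
corpus = [
    "the engine stalls and the brakes squeal",
    "brakes failed on the car",
    "car engine overheats",
    "the screen flickers on my laptop",
    "laptop battery drains and screen is dim",
    "battery will not charge",
]
dice = build_dice(corpus)
label, scores = classify("brakes squeal loudly", ["car", "laptop"], dice)
print(label, scores)
```

In this toy example the test document never mentions the word "car"; it is matched to that category only because "brakes" co-occurs with "car" in the unlabeled corpus. A realistic system would also need stop-word handling and a more robust association measure.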
