Constructing Multiple Domain Taxonomy for Text Processing Tasks

Abstract

In recent years, large volumes of short text data have become easy to collect from platforms such as microblogs and product review sites. Such data often spans several domains, and because these domains are hard to tell apart within the text, effective multi-domain text processing is difficult. The concept of a multiple domain taxonomy (MDT) has shown promising performance in processing multi-domain text data. However, an MDT has to be constructed manually, which requires substantial expert knowledge of the relevant domains and is time-consuming. To address these issues, in this paper we introduce a semi-automatic method for constructing an MDT that requires only a small amount of manual input, combined with an unsupervised method for ranking multi-domain concepts based on semantic relationships learned from unlabeled data. We show that an MDT constructed iteratively with our semi-automatic method achieves higher domain classification accuracy than existing methods, improving accuracy by up to 11%.
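The abstract does not specify the ranking algorithm, so the following is only a minimal sketch of the general idea of seed-driven, iterative taxonomy expansion, not the authors' method. It assumes that each concept already has a vector learned from unlabeled text (hypothetical toy 2-D vectors stand in for them here), that each domain is represented by the centroid of its current concepts, and that cosine similarity serves as the ranking score. The names `expand_taxonomy`, `rounds`, and `per_round` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def expand_taxonomy(seeds, candidates, embeddings, rounds=2, per_round=1):
    """Hypothetical iterative expansion of a multi-domain taxonomy.

    seeds:      {domain: [concept, ...]}, the small manual input
    candidates: unplaced concepts to rank and assign
    embeddings: {concept: vector}, assumed to be learned from unlabeled text
    """
    taxonomy = {d: list(cs) for d, cs in seeds.items()}
    remaining = set(candidates)
    for _ in range(rounds):
        if not remaining:
            break
        # Represent each domain by the centroid of its current concepts.
        centres = {d: centroid([embeddings[m] for m in ms]) for d, ms in taxonomy.items()}
        # Score every unplaced concept against its closest domain.
        best = {}
        for concept in remaining:
            best[concept] = max(
                (cosine(embeddings[concept], c), d) for d, c in centres.items()
            )
        # Each domain absorbs its top-ranked candidates for this round.
        for domain in taxonomy:
            ranked = sorted(
                (c for c, (_, d) in best.items() if d == domain),
                key=lambda c: (-best[c][0], c),
            )
            for concept in ranked[:per_round]:
                taxonomy[domain].append(concept)
                remaining.discard(concept)
    return taxonomy


# Toy usage with made-up vectors and domain names.
embeddings = {
    "battery": [1.0, 0.1], "screen": [0.9, 0.2], "charger": [0.95, 0.05],
    "plot": [0.1, 1.0], "actor": [0.2, 0.9], "soundtrack": [0.15, 0.95],
}
seeds = {"electronics": ["battery"], "movies": ["plot"]}
result = expand_taxonomy(seeds, ["screen", "charger", "actor", "soundtrack"], embeddings)
print(result)
```

Adding only a few top-ranked concepts per round lets each domain's centroid shift gradually as the taxonomy grows, which is one plausible reading of "iteratively constructed"; a human could review each round's additions before the next one.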