
Short Text Classification on Complaint Documents


Abstract

The Indonesian government has developed a system for citizens to voice their aspirations and complaints, which are then stored as short documents. Unfortunately, the existing system relies on human annotators to categorize these documents manually, which is expensive and time-consuming. Automatically classifying the short documents into their correct topics would therefore reduce manual work and make the task more efficient. In this paper, we propose several approaches to automatically classify these short documents using various features, such as unigrams, bigrams, and their combination. We also demonstrate the use of information gain and Latent Dirichlet Allocation (LDA) for selecting discriminative features.
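The abstract does not describe how the features are extracted or scored, so the following is only a minimal sketch of one common reading: build unigram and bigram presence features, then rank them by information gain against the topic labels. The function names (`ngrams`, `entropy`, `information_gain`), the whitespace tokenizer, and the four English toy complaints with their two labels are all assumptions made for illustration; they are not taken from the paper, whose data is in Indonesian. LDA-based selection is not shown.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return the set of word n-grams (n = 1..n_max) in a text.

    Assumption: lowercase + whitespace tokenization; the paper's
    preprocessing is not described in the abstract.
    """
    tokens = text.lower().split()
    feats = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.add(" ".join(tokens[i:i + n]))
    return feats


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels):
    """Score each n-gram by IG = H(Y) - H(Y | n-gram present or absent)."""
    doc_feats = [ngrams(d) for d in docs]
    vocab = set().union(*doc_feats)
    base = entropy(labels)
    scores = {}
    for feat in vocab:
        present = [y for f, y in zip(doc_feats, labels) if feat in f]
        absent = [y for f, y in zip(doc_feats, labels) if feat not in f]
        cond = 0.0
        for part in (present, absent):
            if part:  # skip an empty side of the split
                cond += len(part) / len(labels) * entropy(part)
        scores[feat] = base - cond
    return scores


# Hypothetical toy data, invented for this sketch.
docs = [
    "road damaged near market",
    "road full of holes",
    "hospital queue too long",
    "hospital staff rude",
]
labels = ["infrastructure", "infrastructure", "health", "health"]

scores = information_gain(docs, labels)
# Keep the highest-scoring n-grams as the selected feature set.
top_features = sorted(scores, key=scores.get, reverse=True)[:5]
```

On this toy set, n-grams that split the two topics cleanly (such as `road` or `hospital`) receive the highest score, while n-grams that occur in a single document score lower. A real pipeline would then train a classifier on the selected features only.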
