
Adding numbers to text classification

Abstract

Many real-world problems involve a combination of text-valued and numerical-valued features. For example, in email classification, an instance representation can consider not only the text of each message but also numerical features such as the length of the message or the time of day at which it was sent. Text-classification methods have so far not easily incorporated numerical features. In earlier work we described an approach for converting numerical features into bags of tokens so that text-classification methods can be applied to numerical classification problems, and showed that the resulting learning methods are competitive with traditional numerical classification methods. In this paper we use this conversion to learn on problems that involve a combination of text and numbers, and we show that the results outperform competing methods. Further, we show that selecting the best classification method using text-only features and then adding numerical features to the problem (as might happen if numerical features are added later to a pre-existing text-classification problem) gives performance that rivals the more time-consuming approach of evaluating all classification methods on the full set of text and numerical features.
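The abstract does not spell out how numbers become tokens, so the following is only a minimal sketch of the general idea, assuming a simple hypothetical scheme: each numeric value emits one bin token at several equal-width resolutions, so nearby values share coarse-bin tokens. These tokens are appended to the message's words, and a bag-of-words classifier is trained on the result. The function names (`numeric_tokens`, `instance_tokens`), the `NaiveBayes` class, the feature ranges, and the toy emails are all illustrative assumptions, not the authors' conversion or data; multinomial naive Bayes simply stands in for "a text-classification method".

```python
import math
import re
from collections import Counter, defaultdict


def numeric_tokens(name, value, lo, hi, resolutions=(2, 4, 8)):
    """Hypothetical scheme: one bin token per resolution over [lo, hi]."""
    span = hi - lo
    tokens = []
    for r in resolutions:
        frac = 0.0 if span == 0 else (value - lo) / span
        # Clamp so out-of-range values fall into the end bins.
        idx = min(r - 1, max(0, int(frac * r)))
        tokens.append(f"{name}_r{r}_b{idx}")
    return tokens


def instance_tokens(text, numeric, ranges):
    """Bag of tokens = lowercase words plus bin tokens for every numeric feature."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    for name, value in numeric.items():
        lo, hi = ranges[name]
        tokens.extend(numeric_tokens(name, value, lo, hi))
    return tokens


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in text classifier)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(doc)
            self.vocab.update(doc)
        return self

    def predict(self, doc):
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for y, cy in self.class_counts.items():
            total = sum(self.word_counts[y].values())
            score = math.log(cy / n)
            for w in doc:
                if w in self.vocab:  # ignore tokens never seen in training
                    score += math.log((self.word_counts[y][w] + 1) / (total + v))
            if score > best_score:
                best, best_score = y, score
        return best


# Toy emails: (text, numeric features, label). Illustrative data only.
ranges = {"length": (0, 2000), "hour": (0, 24)}
train = [
    ("meeting agenda attached", {"length": 300, "hour": 10}, "work"),
    ("quarterly report draft", {"length": 1500, "hour": 14}, "work"),
    ("party tonight bring snacks", {"length": 80, "hour": 22}, "personal"),
    ("weekend hiking plans", {"length": 120, "hour": 21}, "personal"),
]
docs = [instance_tokens(t, num, ranges) for t, num, _ in train]
labels = [y for *_, y in train]
model = NaiveBayes().fit(docs, labels)

query = instance_tokens("report for the meeting", {"length": 1200, "hour": 11}, ranges)
print(model.predict(query))  # prints "work"
```

Because the numeric features enter as ordinary tokens, the classifier needs no changes to use them. This is what lets numerical features be added to an existing text-classification setup after the fact. Overlapping resolutions trade vocabulary size for smoothness: coarse bins generalize across nearby values, while fine bins keep distinctions the data may support.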
