Conference: IEEE International Conference on Data Mining Workshops

ALPOS: A Machine Learning Approach for Analyzing Microblogging Data

Abstract

With the development of the Internet, the growing volume of information posted on microblogging sites such as Twitter calls for efficient information filtering. Conventional text classification assumes that the feature vectors extracted from the available documents are sufficient to learn good classifiers. This assumption is unlikely to hold for Twitter, because each tweet is limited to a small number of characters. At a higher level, each tweet can be viewed as an abbreviated abstraction of a longer document of which we have only a partial observation. To address the problem caused by partial observations, we introduce a novel domain adaptation/transfer learning approach called Assisted Learning for Partial Observation (ALPOS). The basic idea is to use a large number of multi-labeled examples (the source domain) to improve learning on the partial observations (the target domain). In particular, we learn a hidden, higher-level abstraction space that is meaningful for the multi-labeled examples in the source domain. This is done by simultaneously minimizing the document reconstruction error and the error of a classification model learned in the hidden space from the known source-domain labels. The partial observations in the target domain are then mapped to the same hidden space for recovery and classification. We compare the performance of this method with existing approaches on synthetic data and the well-known Reuters-21578 dataset. We also present experimental results on Twitter classification.
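
The idea of a shared hidden space can be illustrated with a small linear sketch. This is not the ALPOS algorithm itself, whose model and optimizer the abstract does not specify. It assumes a linear hidden space, squared-error losses for both reconstruction and label prediction, and a ridge-regularized least-squares encoder for partial observations. All function names, parameters (`n_hidden`, `label_weight`, `ridge`), and the synthetic data are hypothetical.

```python
import numpy as np

def fit_shared_space(X_src, Y_src, n_hidden, label_weight=1.0):
    """Learn a linear hidden space from labeled source documents.

    Assumption: a truncated SVD of [X, sqrt(w) * Y] stands in for the joint
    objective. It minimizes ||X - H W||^2 + w * ||Y - H V||^2 over H, W, V.
    """
    s = np.sqrt(label_weight)
    Z = np.hstack([X_src, s * Y_src])
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    basis = Vt[:n_hidden]            # (n_hidden, n_features + n_labels)
    d = X_src.shape[1]
    W = basis[:, :d]                 # hidden space -> document features
    V = basis[:, d:] / s             # hidden space -> label scores
    return W, V

def encode_partial(x_obs, observed, W, ridge=1e-3):
    """Map a partial observation to the hidden space using observed features only."""
    W_o = W[:, observed]             # (n_hidden, n_observed)
    A = W_o @ W_o.T + ridge * np.eye(W.shape[0])
    return np.linalg.solve(A, W_o @ x_obs)

def recover_and_classify(x_obs, observed, W, V, ridge=1e-3):
    """Return the recovered full feature vector and the label scores."""
    h = encode_partial(x_obs, observed, W, ridge)
    return h @ W, h @ V

# Synthetic source domain: 200 "long documents", 20 features, 4 binary labels.
rng = np.random.default_rng(0)
H_true = rng.normal(size=(200, 3))
X_src = H_true @ rng.normal(size=(3, 20))
Y_src = (H_true @ rng.normal(size=(3, 4)) > 0).astype(float)
W, V = fit_shared_space(X_src, Y_src, n_hidden=3)

# Simulate a "tweet": only 6 of the 20 features of one document are observed.
observed = np.array([1, 4, 7, 10, 13, 16])
x_hat, label_scores = recover_and_classify(X_src[0, observed], observed, W, V)
print(x_hat.shape, label_scores.shape)
```

In this sketch `label_weight` trades off reconstruction error against label error. A nonlinear encoder or a logistic label loss would be natural substitutions, but the abstract does not say which choices the authors made.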
