Journal: Knowledge and Information Systems

Integrating learned and explicit document features for reputation monitoring in social media

Abstract

Currently, monitoring reputation in social media is probably one of the most lucrative applications of information retrieval methods. However, this task poses new challenges due to the dynamicity of contents and the need for early detection of topics that affect the reputations of companies. Addressing this problem with learning mechanisms that are based on training data sets is challenging, given that unseen features play a crucial role. However, learning processes are necessary to capture domain features and dependency phenomena. In this work, based on observational information theory, we define a document representation framework that enables the combination of explicit text features and supervised and unsupervised signals into a single representation model. Our theoretical analysis demonstrates that the observation information quantity (OIQ) generalizes the most popular representation methods, in addition to capturing quantitative values, which is required for integrating signals from learning processes. In other words, the OIQ allows us to give the same treatment to features that are currently managed separately. Empirically, our experiments on the reputation-monitoring scenario demonstrated that adding features progressively from supervised (in particular, Bayesian inference over annotated data) and unsupervised learning methods (in particular, proximity to clusters) increases the similarity estimation performance. This result is verified under various similarity criteria (pointwise mutual information, Jaccard and Lin's distances and the information contrast model). According to our formal analysis, the OIQ is the first representation model that captures the informativeness (specificity) of quantitative features in the document representation.
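As a rough illustration of two of the similarity criteria named above (Jaccard and Lin), the following sketch compares documents represented as sets of features. It is not the authors' OIQ model: the toy corpus, the function names, and the estimate of feature informativeness as I(f) = -log2 P(f) from document frequency are all assumptions made for this example, and Lin's measure is written under a feature-independence assumption.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity: shared features divided by all features."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def feature_information(corpus):
    """Estimate I(f) = -log2 P(f), with P(f) taken as the fraction of
    documents in the corpus that contain feature f (an assumption)."""
    n = len(corpus)
    doc_freq = Counter(f for doc in corpus for f in set(doc))
    return {f: -math.log2(count / n) for f, count in doc_freq.items()}


def lin_similarity(a, b, info):
    """Lin's similarity under feature independence:
    2 * I(shared features) / (I(a) + I(b)).
    Features missing from `info` contribute 0, a simplification."""
    a, b = set(a), set(b)
    shared = sum(info.get(f, 0.0) for f in a & b)
    total = sum(info.get(f, 0.0) for f in a) + sum(info.get(f, 0.0) for f in b)
    return 2 * shared / total if total else 0.0


# Hypothetical toy corpus: each document is a set of explicit text features.
corpus = [
    {"bank", "fees", "complaint"},
    {"bank", "app", "outage"},
    {"bank", "fees", "refund"},
    {"app", "update"},
]
info = feature_information(corpus)

print("Jaccard:", jaccard(corpus[0], corpus[2]))
print("Lin:    ", lin_similarity(corpus[0], corpus[2], info))
```

In this toy data the two compared documents share only frequent features ("bank", "fees"), so the information-weighted Lin score comes out lower than the unweighted Jaccard score. That is the sense in which a measure can account for the informativeness (specificity) of features rather than just their overlap.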