Conference: OnTheMove Federated International Workshops

Structuring the Blogosphere on News from Traditional Media



Abstract

News and social media are emerging as a dominant source of information for numerous applications. However, their vast, unstructured content makes it difficult to extract this information efficiently. In this paper, we present the SYNC3 system, which aims to intelligently structure content from both traditional news media and the blogosphere. To achieve this goal, SYNC3 incorporates innovative algorithms that first model news media content statistically, based on fine-grained clustering of articles into so-called "news events". These models are then adapted and applied to the blogosphere, allowing blog content to be mapped to the traditional news domain. We also present an unsupervised approach to domain adaptation that exploits external knowledge sources to port a classification model to a new thematic domain. Our approach extracts a new feature set from documents of the target domain. It then attempts to align the new features with the original ones by exploiting text relatedness from external knowledge sources such as WordNet. The approach has been evaluated on a document classification task: classifying newsgroup postings into 20 newsgroups.
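The abstract describes the domain-adaptation step only in outline. The following is a minimal sketch of that idea, not the authors' implementation. Each target-domain feature is mapped to its most related source-domain feature, and target documents are rewritten over the source feature set. The unchanged source-domain model then classifies them.

Several pieces are assumptions for illustration only:
- A small hypothetical relatedness table stands in for a WordNet-based measure.
- A deliberately simple summed-count classifier stands in for whatever model the paper uses.
- The names `toy_relatedness`, `align_features`, `project`, `train_centroids` and `classify`, the 0.5 threshold, and the example vocabularies are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# HYPOTHETICAL relatedness table standing in for a WordNet-based measure.
# A real system would query an external knowledge source instead.
TOY_RELATEDNESS: Dict[frozenset, float] = {
    frozenset({"goalkeeper", "football"}): 0.9,
    frozenset({"election", "vote"}): 0.85,
    frozenset({"blog", "article"}): 0.8,
}


def toy_relatedness(a: str, b: str) -> float:
    """Return a relatedness score in [0, 1]; identical terms score 1."""
    if a == b:
        return 1.0
    return TOY_RELATEDNESS.get(frozenset({a, b}), 0.0)


def align_features(
    target_features: Sequence[str],
    source_features: Sequence[str],
    relatedness: Callable[[str, str], float],
    threshold: float = 0.5,  # hypothetical cut-off
) -> Dict[str, str]:
    """Map each target-domain feature to its most related source-domain feature.

    Target features whose best score falls below `threshold` stay unaligned.
    """
    alignment: Dict[str, str] = {}
    for t in target_features:
        best = max(source_features, key=lambda s: relatedness(t, s))
        if relatedness(t, best) >= threshold:
            alignment[t] = best
    return alignment


def project(tokens: Sequence[str], alignment: Dict[str, str]) -> Counter:
    """Rewrite a target document as counts over source-domain features."""
    return Counter(alignment[t] for t in tokens if t in alignment)


def train_centroids(labelled: Dict[str, List[List[str]]]) -> Dict[str, Counter]:
    """Sum term counts per class: a deliberately simple source-domain model."""
    return {label: sum((Counter(doc) for doc in docs), Counter())
            for label, docs in labelled.items()}


def classify(doc: Counter, centroids: Dict[str, Counter]) -> str:
    """Pick the class whose centroid has the largest dot product with `doc`."""
    return max(centroids,
               key=lambda c: sum(n * centroids[c][f] for f, n in doc.items()))


# Usage: train on a toy source domain, then classify unseen target-domain docs.
source_docs = {
    "sport": [["football", "match", "article"], ["football", "team"]],
    "politics": [["vote", "party", "article"], ["vote", "vote"]],
}
centroids = train_centroids(source_docs)
source_vocab = sorted({w for docs in source_docs.values() for d in docs for w in d})

target_docs = [["goalkeeper", "blog"], ["election", "blog", "election"]]
target_vocab = sorted({w for d in target_docs for w in d})

alignment = align_features(target_vocab, source_vocab, toy_relatedness)
for d in target_docs:
    print(classify(project(d, alignment), centroids))  # prints "sport", then "politics"
```

None of the target words ("goalkeeper", "election", "blog") occurs in the source training data. The source model can still label these documents only because the alignment step re-expresses them in source-domain features. Target words with no sufficiently related source feature are simply dropped in this sketch.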


