Source: PLoS One

Microblog topic identification using Linked Open Data

Abstract

Much valuable information is embedded in social media posts (microposts), which are contributed by a great variety of people about subjects that are of interest to others. Using this information automatically is challenging because of the overwhelming quantity of posts and because the information related to a subject is distributed across several posts. Numerous approaches have been proposed to detect topics from collections of microposts, where the topics are represented by lists of terms such as words, phrases, or word embeddings. Such topics are used in tasks like classification and recommendation. In these methods the interpretation of topics is considered a separate task, although the topics are becoming increasingly human-interpretable. This work proposes an approach for identifying machine-interpretable topics of collective interest. We define a topic as a set of related elements that are associated by having been posted in the same contexts. To represent topics, we introduce an ontology specified according to the W3C recommended standards. The elements of the topics are identified by linking entities to resources published on Linked Open Data (LOD). Such a representation enables processing topics to provide insights that go beyond what is explicitly expressed in the microposts. The feasibility of the proposed approach is examined by generating topics from more than one million tweets collected from Twitter during various events. The utility of these topics is demonstrated with a variety of topic-related tasks, along with a comparison of the effort required to perform the same tasks with word-list-based representations. Manual evaluation of 36 randomly selected sets of topics yielded 81.0% and 93.3% for the precision and F1 scores respectively.
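The idea of a topic as a set of elements associated by appearing in the same posts can be illustrated with a small sketch. This is not the method of the paper, whose details are not given in the abstract. It assumes that each micropost has already been reduced to the set of LOD resource identifiers found by some entity linker, and it groups identifiers that co-occur often enough. The sample identifiers, the function name `candidate_topics`, and the `min_cooccurrence` threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: each micropost reduced to the set of LOD resource
# identifiers that an entity linker found in it (illustrative values only).
posts = [
    {"dbr:Barack_Obama", "dbr:Mitt_Romney", "dbr:Debate"},
    {"dbr:Barack_Obama", "dbr:Mitt_Romney"},
    {"dbr:Barack_Obama", "dbr:Debate"},
    {"dbr:FIFA_World_Cup", "dbr:Brazil"},
    {"dbr:FIFA_World_Cup", "dbr:Brazil"},
    {"dbr:Brazil", "dbr:Debate"},  # weak link, seen only once
]


def candidate_topics(posts, min_cooccurrence=2):
    """Group resources that co-occur in at least `min_cooccurrence` posts.

    Assumed simplification: a topic is a connected component of the
    co-occurrence graph after weak edges are dropped.
    """
    # Count how many posts mention each unordered pair of resources.
    pair_counts = Counter()
    for entities in posts:
        for pair in combinations(sorted(entities), 2):
            pair_counts[pair] += 1

    # Keep only the pairs that co-occur often enough.
    neighbours = {}
    for (a, b), count in pair_counts.items():
        if count >= min_cooccurrence:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)

    # Collect connected components with a depth-first traversal.
    topics, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        topics.append(component)
    return topics


for topic in candidate_topics(posts):
    print(sorted(topic))
```

Because the elements are resource identifiers rather than plain words, each one could in principle be dereferenced to obtain further facts about it, which is the kind of processing the abstract says goes beyond what the posts state explicitly.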
