Journal: International Journal of Web Information Systems

Mining and visualising information from RSS feeds: a case study


Abstract

Purpose - Recent years have seen "really simple syndication" or "rich site summary" (RSS) syndication of frequently updated content become ubiquitous across the internet. RSS's XML-based format allows these data to be stored in a semi-structured format but, despite the presence of online aggregators and readers, and the related work in clustering feeds and mining subjects by keywords, much potentially useful information present in RSS may remain undiscovered. This paper aims to address this issue in an experimental setting.

Design/methodology/approach - This paper presents two distinct technologies which employ the semi-structured nature of RSS content to allow users to mine information directly from raw RSS feeds: occurrence mining counts occurrences of text strings in feeds, whilst value mining mines structured ticker tape numeric data. It describes both technologies and their implementation in an experiment, where 35 students mined small numbers of RSS feeds and visualised the data mined.

Findings - This paper analyses the results of the experiment and cites examples of data mined and visualisations produced. The subject matter of data mined is also explored and potential applications of the technologies are considered.

Research limitations/implications - The mining technologies proposed in this paper have been developed to mine textual and numeric data directly from feeds, but can be extended to mine other data types present in RSS and to include other variants like Atom.

Originality/value - These technologies are seen to be applicable to data mining, the role of data and visualisations in social data analysis, issue tracking in news mining and time series analysis.
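The abstract names two techniques without showing them, so the following is a minimal illustrative sketch of what occurrence mining and value mining might look like. It is not the authors' implementation. The feed content in `SAMPLE_FEED`, the function names `item_texts`, `occurrence_mine` and `value_mine`, and the "SYMBOL number" ticker layout are all assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, invented for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example market feed</title>
    <item>
      <title>ACME 12.50 up on strong sales</title>
      <description>ACME shares rose after the report.</description>
    </item>
    <item>
      <title>ACME 12.75 as rally continues</title>
      <description>Analysts expect further gains.</description>
    </item>
  </channel>
</rss>"""


def item_texts(feed_xml):
    """Return one string per <item>, joining its title and description."""
    root = ET.fromstring(feed_xml)
    texts = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        texts.append(f"{title} {description}")
    return texts


def occurrence_mine(feed_xml, terms):
    """Count case-insensitive occurrences of each term across all items."""
    counts = {term: 0 for term in terms}
    for text in item_texts(feed_xml):
        lowered = text.lower()
        for term in terms:
            counts[term] += lowered.count(term.lower())
    return counts


def value_mine(feed_xml, symbol):
    """Collect numbers that directly follow the symbol, in feed order.

    Assumes a 'SYMBOL 12.50' layout; real feeds may format values differently.
    """
    pattern = re.compile(rf"\b{re.escape(symbol)}\s+(\d+(?:\.\d+)?)")
    values = []
    for text in item_texts(feed_xml):
        values.extend(float(match) for match in pattern.findall(text))
    return values


print(occurrence_mine(SAMPLE_FEED, ["ACME", "rally"]))  # {'ACME': 3, 'rally': 1}
print(value_mine(SAMPLE_FEED, "ACME"))                  # [12.5, 12.75]
```

Run repeatedly against a live feed, the counts and values from a sketch like this could be stored with timestamps to give the kind of series the abstract mentions for visualisation and time series analysis.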
