Journal: Knowledge and Information Systems

Dynamic sampling of text streams and its application in text analysis


Abstract

A large number of texts are rapidly generated as streaming data in social media. Since it is difficult to process such text streams with limited memory in real time, researchers are resorting to text stream compression and sampling to obtain a small portion of valuable information from the streams. In this study, we investigate the crucial question of how to use less memory space to store more valuable texts to maintain the global information of the stream. First, we propose a text stream sampling framework based on compressed sensing theory, which can sample a text stream with a lightweight framework to reduce the space consumption while still retaining the most valuable texts. We then develop a query word-based retrieval task as well as a topic detection and evolution analysis task on the sample stream to evaluate the performance of the framework in retaining valuable information. The framework is evaluated from several aspects using two representative datasets of social media, including compression ratio, runtime, information reserved rate, and efficiency of the text analysis tasks. Experimental results demonstrate that the proposed framework outperforms baseline methods and is able to complete the text analysis tasks with promising results.
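
The abstract does not spell out the sampling procedure, so the following is only an illustration of the general problem it describes: keeping a small, fixed-size sample of an unbounded text stream in limited memory. It uses classic reservoir sampling (Algorithm R), a standard uniform-sampling baseline, and is not the compressed-sensing framework proposed in the paper. The function name `reservoir_sample` and its parameters are hypothetical, chosen for this sketch.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(stream: Iterable[str], k: int,
                     seed: Optional[int] = None) -> List[str]:
    """Keep a uniform random sample of at most k texts from a stream.

    Illustrative baseline only (reservoir sampling, Algorithm R);
    NOT the compressed-sensing method described in the abstract.
    Memory use is O(k) regardless of the stream length.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, text in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k texts.
            reservoir.append(text)
        else:
            # Pick a slot in [0, i]; the new text replaces an old one
            # only if the slot falls inside the reservoir, which
            # happens with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = text
    return reservoir


# Usage: sample 10 posts from a simulated stream of 1000 posts.
posts = (f"post {i}" for i in range(1000))
sample = reservoir_sample(posts, k=10, seed=0)
```

Unlike the framework in the abstract, which aims to retain the most valuable texts, this baseline treats every text as equally important; it is shown only to make the memory constraint concrete.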
