Detecting Information Structures in Texts

Abstract

The colossal growth of volatile online text data creates a demand for automatic text analysis tools that identify worthwhile information. Documents, as well as text streams, can be structured beyond the concept of frequency distributions. Here we introduce a novel method that provides a relative measure of information value over a time series that is mapped by a dynamic trie structure. We adapt the concept of entropy to textual data and employ a compression-based estimation method. The algorithm can run in a real-time scenario because it has linear complexity and is based on a dynamic history of predefined size. We show the suitability of our method on an experimental dataset and compare our results to an existing approach. Our results reveal structural properties of the texts and permit deeper analysis of the presumed information peaks.
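
The idea of a compression-based, relative information measure over a bounded history can be illustrated with a rough sketch. This is not the authors' algorithm: the abstract describes a dynamic trie, whereas the sketch below substitutes the general-purpose `zlib` compressor as a stand-in estimator. The function name `information_scores`, the `history_size` parameter, and the chunking of the stream into strings are all assumptions made for illustration.

```python
import zlib
from collections import deque


def information_scores(chunks, history_size=50):
    """Score each chunk by how many extra compressed bits it costs,
    per input byte, when appended to a bounded history of past chunks.

    Assumption: zlib stands in for the trie-based estimator mentioned
    in the abstract; this is an illustration, not the paper's method.
    """
    history = deque(maxlen=history_size)  # oldest chunks drop out automatically
    scores = []
    for chunk in chunks:
        context = "".join(history).encode("utf-8")
        data = chunk.encode("utf-8")
        base = len(zlib.compress(context, 9))
        extended = len(zlib.compress(context + data, 9))
        # Extra compressed bits per byte of new text: low for content the
        # history already predicts, higher for novel content.
        scores.append(8 * (extended - base) / max(len(data), 1))
        history.append(chunk)
    return scores


if __name__ == "__main__":
    stream = ["the cat sat on the mat. "] * 6 + ["quantum zebra vortex flux jumps! "]
    for text, score in zip(stream, information_scores(stream, history_size=10)):
        print(f"{score:6.2f}  {text!r}")
```

Because the history has a fixed maximum size, the work per chunk is bounded, so the total cost grows linearly with the length of the stream. Chunks that repeat what the history already contains should score low, while unexpected chunks should stand out as peaks.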