
Using Patterns Co-occurrence Matrix for Cleaning Closed Sequential Patterns for Text Mining


Abstract

With the overwhelming increase in the amount of text on the web, it is almost impossible for people to keep abreast of up-to-date information. Text mining is a process by which interesting information is derived from text through the discovery of patterns and trends. Text mining algorithms are meant to guarantee the quality of the extracted knowledge. However, patterns extracted with text or data mining algorithms are often noisy and inconsistent. This raises several challenges: how to interpret these patterns, whether the model used is suitable, and whether all of the extracted patterns are relevant. A further question is how to assign a correct weight to the extracted knowledge. To address these issues, this paper presents a text post-processing method that uses a pattern co-occurrence matrix to find the relations between extracted patterns in order to reduce noisy patterns. The main objective of this paper is not only to reduce the number of closed sequential patterns but also to improve the performance of pattern mining. Experimental results on the Reuters Corpus Volume 1 data collection and TREC filtering topics show that the proposed method is promising.
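The abstract does not spell out how the co-occurrence matrix is built or how it is used to discard patterns, so the following is only a minimal illustrative sketch of the general idea, not the paper's method. It assumes that each text unit (for example a paragraph) has already been reduced to the set of sequential patterns found in it, that co-occurrence means "appears in the same text unit", and that a pattern is treated as noise when it never co-occurs with any other pattern at least `min_cooccurrence` times. The function names, the threshold, and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_matrix(pattern_sets):
    """Count, for every unordered pair of patterns, the number of text
    units in which both appear (a sparse co-occurrence matrix)."""
    counts = Counter()
    for patterns in pattern_sets:
        # Sorting gives each pair a canonical order, so (a, b) and (b, a)
        # are stored under the same key.
        for a, b in combinations(sorted(patterns), 2):
            counts[(a, b)] += 1
    return counts


def prune_isolated(pattern_sets, min_cooccurrence=2):
    """Keep only patterns that co-occur with at least one other pattern
    in min_cooccurrence or more text units (assumed pruning rule)."""
    keep = set()
    for (a, b), n in cooccurrence_matrix(pattern_sets).items():
        if n >= min_cooccurrence:
            keep.update((a, b))
    return keep


# Hypothetical input: patterns are tuples of terms, one set per paragraph.
paragraphs = [
    {("oil", "price"), ("opec",), ("market",)},
    {("oil", "price"), ("opec",)},
    {("oil", "price"), ("weather",)},
]

kept = prune_isolated(paragraphs)
print(kept)
```

With this sample data, only `("oil", "price")` and `("opec",)` co-occur in two paragraphs, so the other two patterns are dropped. A real post-processing step would presumably use a weighting scheme rather than a hard threshold, which the abstract leaves unspecified.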
