Chinese Categorization and Novelty Mining

Abstract

The categorization and novelty mining of chronologically ordered documents is an important data mining problem. This paper focuses on the entire process of Chinese novelty mining, from preprocessing and categorization to the actual detection of novel information, which has rarely been studied. First, preprocessing techniques for detecting novel Chinese text are discussed and compared. Next, we compare categorization and novelty mining performance on English and Chinese sentences, and discuss novelty mining performance based on the retrieval results. Moreover, we propose new novelty mining evaluation measures: Novelty-Precision, Novelty-Recall, Novelty-F Score, and Sensitivity, the last of which measures how sensitive the novelty mining system is to incorrectly classified sentences. The results indicate that Chinese novelty mining at the sentence level performs similarly to English novelty mining if the sentences are perfectly categorized. With these new measures, we can assess more fairly how the performance of novelty mining is influenced by the retrieval results.
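
The abstract names Novelty-Precision, Novelty-Recall, and Novelty-F Score but does not define them. As a rough illustration only, the sketch below computes the textbook precision, recall, and F1 for the "novel" class over a list of sentence-level decisions. The function name `novelty_prf` and the boolean label encoding are assumptions made for this example; the paper's own measures may differ, in particular in how they treat incorrectly categorized sentences, and Sensitivity is not reconstructed here.

```python
from typing import Sequence


def novelty_prf(predicted: Sequence[bool], actual: Sequence[bool]) -> tuple[float, float, float]:
    """Textbook precision, recall and F1 for the 'novel' class.

    Assumption: these are the standard definitions, not necessarily the
    paper's Novelty-Precision / Novelty-Recall / Novelty-F Score.
    predicted[i] is True if the system flagged sentence i as novel;
    actual[i] is True if sentence i is novel in the ground truth.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if a and not p)

    # Guard against empty denominators by reporting 0.0.
    flagged = true_pos + false_pos
    truly_novel = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / truly_novel if truly_novel else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy labels for four sentences, not data from the paper.
    system = [True, True, False, False]
    truth = [True, False, True, False]
    print(novelty_prf(system, truth))
```

Running such a function once on all sentences and once on only the sentences a categorizer retrieved as relevant would be one way to see how retrieval errors affect the scores, which is the kind of comparison the abstract describes.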
