Chinese Categorization and Novelty Mining


Abstract

The categorization and novelty mining of chronologically ordered documents is an important data mining problem. This paper examines the entire process of Chinese novelty mining, from preprocessing and categorization to the actual detection of novel information, a process that has rarely been studied as a whole. First, preprocessing techniques for detecting novel Chinese text are discussed and compared. Next, we compare categorization and novelty mining performance on English and Chinese sentences, and we discuss novelty mining performance based on the retrieval results. Moreover, we propose four new novelty mining evaluation measures: Novelty-Precision, Novelty-Recall, Novelty-F Score, and Sensitivity, the last of which measures how sensitive the novelty mining system is to incorrectly classified sentences. The results indicate that Chinese novelty mining at the sentence level performs similarly to English novelty mining if the sentences are perfectly categorized. The new measures allow a fairer assessment of how novelty mining performance is influenced by the retrieval results.
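
To make the idea of sentence-level novelty mining over chronologically ordered text concrete, the following is a minimal sketch and not the method of the paper. It assumes a common baseline: a sentence is flagged as novel when its maximum bag-of-words cosine similarity to all earlier sentences falls below a threshold. The function names, the threshold value of 0.6, and the default whitespace tokenizer are all hypothetical choices made for illustration; for Chinese text a word segmenter (or, as a crude stand-in, character-level tokenization with `tokenize=list`) would replace the tokenizer. The scoring helper computes ordinary precision, recall, and F score over the sentences flagged as novel; the abstract does not define Novelty-Precision, Novelty-Recall, Novelty-F Score, or Sensitivity, so those measures are not reproduced here.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mine_novel(sentences, threshold=0.6, tokenize=str.split):
    """Return one boolean per sentence: True if the sentence is judged novel.

    Sentences are processed in chronological order. A sentence is novel when
    its maximum similarity to every earlier sentence is below `threshold`
    (an assumed value, not taken from the paper).
    """
    history = []  # term-count vectors of all sentences seen so far
    flags = []
    for sentence in sentences:
        vec = Counter(tokenize(sentence.lower()))
        max_sim = max((cosine(vec, h) for h in history), default=0.0)
        flags.append(max_sim < threshold)
        history.append(vec)
    return flags


def precision_recall_f(predicted, gold):
    """Standard precision, recall, and F score over sentences flagged novel."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score


if __name__ == "__main__":
    stream = [
        "the cat sat on the mat",
        "the cat sat on the mat",
        "stocks fell sharply today",
    ]
    flags = mine_novel(stream)
    print(flags)
    print(precision_recall_f(flags, [True, False, True]))
```

In this sketch every sentence is compared against the full history, which is quadratic in the number of sentences; that is acceptable for illustration but would need indexing or a sliding window on a large stream.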