Conference: Conference on Noise and Stochastics in Complex Systems and Finance

Finding keywords amongst noise: Automatic text classification without parsing


Abstract

The amount of text stored on the Internet and in our libraries continues to expand at an exponential rate, so there is a great practical need to locate relevant content. This requires fast, automated methods for classifying textual information by subject. We propose a fast statistical approach that can distinguish 'keywords' from 'noisewords', such as 'the' and 'a', without the need to parse the text into its parts of speech. Our classification is based on an F-statistic, which compares the observed Word Recurrence Interval (WRI) with a simple null hypothesis. We also propose a model to account for the observed distribution of WRI statistics, and we subject this model to a number of tests.
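
The abstract does not define the F-statistic or the null hypothesis, so the following is only a rough sketch of the general idea, not the authors' method. It assumes that a word's recurrence intervals are the gaps (in word positions) between its successive occurrences, and that the null hypothesis is independent random placement, under which the gaps are geometrically distributed with variance (1 - p) / p^2 for a word of relative frequency p. The function names `recurrence_intervals` and `variance_ratio` are hypothetical, and the variance ratio stands in for whatever statistic the paper actually uses.

```python
import re
from collections import defaultdict


def recurrence_intervals(text):
    """Return ({word: [gaps between successive occurrences]}, total word count)."""
    words = re.findall(r"[a-z']+", text.lower())
    positions = defaultdict(list)
    for index, word in enumerate(words):
        positions[word].append(index)
    intervals = {
        word: [later - earlier for earlier, later in zip(pos, pos[1:])]
        for word, pos in positions.items()
    }
    return intervals, len(words)


def variance_ratio(intervals, n_words):
    """Observed gap variance divided by the variance expected under the
    assumed null (geometric gaps for a randomly placed word).

    Returns None when there are too few gaps to estimate a variance.
    """
    if len(intervals) < 2:
        return None
    p = (len(intervals) + 1) / n_words  # relative frequency of the word
    if p >= 1.0:
        return None  # the text consists of this word only; null variance is zero
    mean = sum(intervals) / len(intervals)
    observed = sum((x - mean) ** 2 for x in intervals) / (len(intervals) - 1)
    expected = (1.0 - p) / p ** 2  # variance of a geometric distribution
    return observed / expected
```

Under these assumptions, a word that recurs in bursts (as topic-bearing words tend to) gives a ratio well above 1, while a word spread evenly through the text gives a ratio at or below 1. Any threshold for labelling a word a keyword would have to come from the paper itself.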
