Finding keywords amongst noise: Automatic text classification without parsing

Abstract

The amount of text stored on the Internet, and in our libraries, continues to expand at an exponential rate. There is a great practical need to locate relevant content. This requires quick automated methods for classifying textual information, according to subject. We propose a quick statistical approach, which can distinguish between 'keywords' and 'noisewords', like 'the' and 'a', without the need to parse the text into its parts of speech. Our classification is based on an F-statistic, which compares the observed Word Recurrence Interval (WRI) with a simple null hypothesis. We also propose a model to account for the observed distribution of WRI statistics and we subject this model to a number of tests.
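The abstract does not give the exact form of the F-statistic, so the following is only a minimal sketch of the general idea, under stated assumptions. It collects the Word Recurrence Intervals (the gaps, in word positions, between successive occurrences of the same word) and compares their sample variance with the variance expected if the word were placed at random with the same mean gap (a geometric null). The function names `recurrence_intervals` and `dispersion_ratio`, the tokenizing regular expression, and the choice of a variance ratio are all assumptions made for illustration, not the authors' method.

```python
import re
from collections import defaultdict


def recurrence_intervals(text):
    """Map each word to the gaps (in word positions) between its successive occurrences."""
    # Assumption: a crude tokenizer that keeps lowercase letters and apostrophes only.
    words = re.findall(r"[a-z']+", text.lower())
    last_seen = {}
    intervals = defaultdict(list)
    for pos, word in enumerate(words):
        if word in last_seen:
            intervals[word].append(pos - last_seen[word])
        last_seen[word] = pos
    return dict(intervals)


def dispersion_ratio(gaps):
    """Sample variance of the gaps divided by the variance of a geometric null.

    Assumption: under the null, a word occurs independently at each position
    with probability p = 1/mean, so gaps are geometric on {1, 2, ...} with
    variance (1 - p) / p**2 = mean * (mean - 1).
    Returns None when the ratio is undefined (fewer than two gaps, or mean gap of 1).
    """
    n = len(gaps)
    if n < 2:
        return None
    mean = sum(gaps) / n
    sample_var = sum((g - mean) ** 2 for g in gaps) / (n - 1)
    null_var = mean * (mean - 1)
    if null_var == 0:
        return None
    return sample_var / null_var
```

Under this illustrative null, a ratio near 1 would be consistent with a word scattered at random through the text, while a ratio well above 1 would indicate occurrences that cluster together. Whether that matches the paper's own keyword criterion cannot be confirmed from the abstract alone.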

Bibliographic details

Source: Conference on Noise and Stochastics in Complex Systems and Finance, 2007, 12 pages
Authors: Andrew G. Allison; Charles E. M. Pearce; Derek Abbott
Conference organizers: SPIE-The International Society for Optical Engineering; EOS-European Optical Society; SIOF(IT)
Original format: PDF
Chinese Library Classification: Semiconductor technology
Keywords: keywords; word recurrence interval; finite mixture distributions; mixed Poisson process; maximum likelihood; Kolmogorov-Smirnov

Similar literature

1. Language-independent extractive automatic text summarization based on automatic keyword extraction [J]. Angel Hernandez-Castaneda, Rene Arnulfo Garcia-Hernandez, Yulia Ledeneva. Computer Speech and Language. 2022, Issue Jana.
2. Automatic Amharic Text Summarization using NLP Parser [J]. Getahun Tadesse Mekuria, Aniket S. Jagtap. International Journal of Engineering Trends and Technology. 2017, Issue 1.
3. Automatic extraction of keywords from scientific text: application to the knowledge domain of protein families [J]. Miguel A. Andrade... Bioinformatics. 1998, Issue 7.
4. Finding keywords amongst noise: Automatic text classification without parsing [C]. Andrew G. Allison, Charles E. M. Pearce, Derek Abbott. Noise and Stochastics in Complex Systems and Finance; Proceedings of SPIE-The International Society for Optical Engineering; vol. 6601. 2007.
5. Identifying the gist of conversational text: Automatic keyword extraction and summarization [D]. Liu, Fei. 2011.
6. Natural Language Processing and Automatic SNOMED-Encoding of Free Text: An Analysis of Free Text Data from a Routine Electronic Patient Record Application with a Parsing Tool Using the German SNOMED II [O]. Joerg H. Hohnloser, Matthias Holzer, Martin R.G. Fischer. 1996.
7. Finding keywords amongst noise: Automatic text classification without parsing [O]. Andrew G. Allison, Charles E. M. Pearce, Derek Abbott. 2016.
