Finding keywords amongst noise: Automatic text classification without parsing

Abstract

The amount of text stored on the Internet, and in our libraries, continues to expand at an exponential rate. There is a great practical need to locate relevant content. This requires quick automated methods for classifying textual information, according to subject. We propose a quick statistical approach, which can distinguish between 'keywords' and 'noisewords', like 'the' and 'a', without the need to parse the text into its parts of speech. Our classification is based on an F-statistic, which compares the observed Word Recurrence Interval (WRI) with a simple null hypothesis. We also propose a model to account for the observed distribution of WRI statistics and we subject this model to a number of tests.
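
The recurrence-interval idea in the abstract can be illustrated with a short Python sketch. The abstract does not give the exact form of the authors' F-statistic or of their null hypothesis, so everything below is an assumption made for illustration: the function names `recurrence_intervals` and `dispersion_ratio` are hypothetical, the null is taken to be a memoryless (geometric) placement of the word at a constant rate, and the statistic is a simple variance ratio standing in for the paper's F-statistic.

```python
from collections import defaultdict


def recurrence_intervals(tokens):
    """Map each word to the list of gaps between its successive occurrences."""
    last_seen = {}
    intervals = defaultdict(list)
    for position, word in enumerate(tokens):
        if word in last_seen:
            intervals[word].append(position - last_seen[word])
        last_seen[word] = position
    return dict(intervals)


def dispersion_ratio(gaps):
    """Observed gap variance divided by the variance predicted by an
    assumed geometric (memoryless) null with the same mean gap.

    This is an illustrative stand-in, not the F-statistic of the paper.
    Returns None when the ratio is undefined.
    """
    n = len(gaps)
    if n < 2:
        return None  # need at least two gaps for a sample variance
    mean = sum(gaps) / n
    if mean <= 1.0:
        return None  # word fills every position, so the null variance is zero
    observed_var = sum((g - mean) ** 2 for g in gaps) / (n - 1)
    # A geometric distribution on {1, 2, ...} with mean m has variance m * (m - 1).
    null_var = mean * (mean - 1.0)
    return observed_var / null_var


if __name__ == "__main__":
    tokens = "the cat the dog the cat the dog".split()
    for word, gaps in recurrence_intervals(tokens).items():
        print(word, gaps, dispersion_ratio(gaps))
```

Under these assumptions, a word whose occurrences cluster in bursts (short gaps inside a topical passage, long gaps elsewhere) gives a ratio well above 1, while an evenly spread word gives a ratio at or below 1. Where to place the threshold between 'keywords' and 'noisewords' is a matter for the paper itself and is not specified here.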

Bibliographic details

Source
Noise and Stochastics in Complex Systems and Finance; Proceedings of SPIE - The International Society for Optical Engineering, vol. 6601, 2007, pp. 660113.1-660113.12 (12 pages)
Conference location: Florence (IT)
Authors
Andrew G. Allison; Charles E. M. Pearce; Derek Abbott
Author affiliations

Centre for Biomedical Engineering (CBME) and School of Electrical and Electronic Engineering, The University of Adelaide, SA 5005, Australia;

School of Mathematical Sciences, The University of Adelaide, SA 5005, Australia;

Original format: PDF
Language: English
Chinese Library Classification: Semiconductor technology
Keywords
keywords; word recurrence interval; finite mixture distributions; mixed Poisson process; maximum likelihood; Kolmogorov-Smirnov;

Similar literature

1. Language-independent extractive automatic text summarization based on automatic keyword extraction [J]. Angel Hernandez-Castaneda, Rene Arnulfo Garcia-Hernandez, Yulia Ledeneva. Computer Speech and Language. 2022, Jan. issue
2. Automatic Amharic Text Summarization using NLP Parser [J]. Getahun Tadesse Mekuria, Aniket S. Jagtap. International Journal of Engineering Trends and Technology. 2017, No. 1
3. Automatic extraction of keywords from scientific text: application to the knowledge domain of protein families [J]. Miguel A. Andrade... Bioinformatics. 1998, No. 7
4. Finding keywords amongst noise: automatic text classification without parsing [C]. Andrew G. Allison, Charles E. M. Pearce, Derek Abbott. Conference on Noise and Stochastics in Complex Systems and Finance. 2007
5. Identifying the gist of conversational text: Automatic keyword extraction and summarization [D]. Liu, Fei. 2011
6. Natural Language Processing and Automatic SNOMED-Encoding of Free Text: An Analysis of Free Text Data from a Routine Electronic Patient Record Application with a Parsing Tool Using the German SNOMED II [O]. Joerg H. Hohnloser, Matthias Holzer, Martin R.G. Fischer. 1996
7. Finding keywords amongst noise: Automatic text classification without parsing [O]. Andrew G. Allison, Charles E. M. Pearce, Derek Abbott. 2016
