Corpus-based statistical screening for phrase identification.

Kim W; Wilbur WJ


Abstract

PURPOSE: The authors study the extraction of useful phrases from a natural language database by statistical methods. The aim is to leverage human effort by providing preprocessed phrase lists with a high percentage of useful material.

METHOD: The approach is to develop six different scoring methods, each based on a different aspect of phrase occurrence. The emphasis is not on lexical information or syntactic structure but on the statistical properties of word pairs and triples that can be obtained from a large database.

MEASUREMENTS: The Unified Medical Language System (UMLS) incorporates, as part of its structure, a large list of phrases in the medical field that humans have judged acceptable. The authors use this list as a gold standard for validating their methods. A good method is one that ranks the UMLS phrases high among all phrases studied. Measurements are 11-point average precision values and precision-recall curves based on the rankings.

RESULT: The authors find that each of the six scoring methods proves effective in identifying UMLS-quality phrases in a large subset of MEDLINE. The methods are applicable to both word pairs and word triples. All six methods are optimally combined to produce composite scoring methods that are more effective than any single method. The quality of the composite methods appears sufficient to support the automatic placement of hyperlinks in text at the site of highly ranked phrases.

CONCLUSION: Statistical scoring methods provide a promising approach to the extraction of useful phrases from a natural language database for the purpose of indexing or providing hyperlinks in text.
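The abstract names 11-point average precision as the evaluation measure but does not spell out the computation. The following is a minimal sketch of the standard interpolated 11-point average precision, assuming a ranked list of unique candidate phrases and a gold-standard set. The function name and the toy phrases are hypothetical illustrations, not the authors' code or data.

```python
from typing import Hashable, List, Sequence, Set


def eleven_point_average_precision(
    ranked: Sequence[Hashable], gold: Set[Hashable]
) -> float:
    """Interpolated 11-point average precision of a ranking against a gold set.

    Assumes `ranked` holds unique items, best-scored first (an assumption of
    this sketch, not something stated in the abstract).
    """
    total_relevant = len(gold)
    if total_relevant == 0:
        raise ValueError("gold standard must not be empty")

    # Precision and recall after each rank position.
    precisions: List[float] = []
    recalls: List[float] = []
    hits = 0
    for rank, phrase in enumerate(ranked, start=1):
        if phrase in gold:
            hits += 1
        precisions.append(hits / rank)
        recalls.append(hits / total_relevant)

    # Interpolated precision at recall level r is the highest precision seen
    # at any rank whose recall is at least r (0.0 if r is never reached).
    levels = [i / 10 for i in range(11)]
    interpolated: List[float] = []
    for level in levels:
        candidates = [p for p, r in zip(precisions, recalls) if r >= level]
        interpolated.append(max(candidates) if candidates else 0.0)

    return sum(interpolated) / len(levels)


if __name__ == "__main__":
    # Toy data: two gold phrases, four ranked candidates (all hypothetical).
    toy_gold = {"blood pressure", "heart failure"}
    toy_ranking = ["blood pressure", "of the", "heart failure", "in a"]
    print(eleven_point_average_precision(toy_ranking, toy_gold))
```

A scoring method that places every gold-standard phrase ahead of every other candidate scores 1.0 under this measure; pushing gold phrases further down the ranking lowers the value.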


Bibliographic details

Source: Journal of the American Medical Informatics Association, 2000, Issue 5, 13 pages
Authors: Kim W; Wilbur WJ
Original format: PDF
Language of text: eng
Chinese Library Classification: Medicine and Health
Keywords: Abstracting and Indexing; Hypermedia; Information Storage and Retrieval; Natural Language Processing; Unified Medical Language System


