Journal of the Boston Society of Medical Sciences (U.S. National Institutes of Health literature)

Strategies for searching medical natural language text. Distribution of words in the anatomic diagnoses of 7000 autopsy subjects.

Abstract

Computerized indexing and retrieval of medical records is increasingly important; but the use of natural language versus coded languages (SNOP, SNOMED) for this purpose remains controversial. In an effort to develop search strategies for natural language text, the authors examined the anatomic diagnosis reports by computer for 7000 consecutive autopsy subjects spanning a 13-year period at The Johns Hopkins Hospital. There were 923,657 words, 11,642 of them distinct. The authors observed an average of 1052 keystrokes, 28 lines, and 131 words per autopsy report, with an average 4.6 words per line and 7.0 letters per word. The entire text file represented 921 hours of secretarial effort. Words ranged in frequency from 33,959 occurrences of "and" to one occurrence for each of 3398 different words. Searches for rare diseases with unique names or for representative examples of common diseases were most readily performed with the use of computer-printed key word in context (KWIC) books. For uncommon diseases designated by commonly used terms (such as "cystic fibrosis"), needs were best served by a computerized search for logical combinations of key words. In an unbalanced word distribution, each conjunction (logical and) search should be performed in ascending order of word frequency; but each alternation (logical inclusive or) search should be performed in descending order of word frequency. Natural language text searches will assume a larger role in medical records analysis as the labor-intensive procedure of translation into a coded language becomes more costly, compared with the computer-intensive procedure of text searching.
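The ordering rule for conjunction and alternation searches can be illustrated with a minimal sketch. The abstract does not describe the authors' actual program, so everything here is an assumption made for illustration: the function names (`word_frequencies`, `search_all`, `search_any`), the whitespace tokenization, the sequential scan with early exit, and the four invented sample reports. The `tests` counter simply records how many word lookups each ordering needs.

```python
from collections import Counter

# Invented sample reports (not taken from the autopsy file).
reports = [
    "pneumonia and cystic fibrosis of pancreas",
    "pneumonia and fibrosis of lung",
    "renal cystic disease and pneumonia",
    "myocardial infarction and pneumonia",
]

def word_frequencies(texts):
    """Count occurrences of each word across all reports."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

def search_all(texts, terms, freq, rare_first=True):
    """Conjunction (logical AND) with early exit on the first missing term.

    With rare_first=True the terms are tested in ascending order of
    frequency, so most reports are rejected after a single lookup.
    Returns (indices of matching reports, number of lookups performed).
    """
    ordered = sorted(terms, key=lambda t: freq[t], reverse=not rare_first)
    hits, tests = [], 0
    for i, text in enumerate(texts):
        words = set(text.lower().split())
        matched = True
        for term in ordered:
            tests += 1
            if term not in words:
                matched = False
                break
        if matched:
            hits.append(i)
    return hits, tests

def search_any(texts, terms, freq, common_first=True):
    """Alternation (inclusive OR) with early exit on the first term found.

    With common_first=True the terms are tested in descending order of
    frequency, so most matching reports are accepted after a single lookup.
    Returns (indices of matching reports, number of lookups performed).
    """
    ordered = sorted(terms, key=lambda t: freq[t], reverse=common_first)
    hits, tests = [], 0
    for i, text in enumerate(texts):
        words = set(text.lower().split())
        for term in ordered:
            tests += 1
            if term in words:
                hits.append(i)
                break
    return hits, tests

freq = word_frequencies(reports)
terms = ["pneumonia", "pancreas"]  # one common word, one rare word

and_hits, and_tests = search_all(reports, terms, freq)           # rare word first
_, and_tests_reversed = search_all(reports, terms, freq, False)  # common word first
or_hits, or_tests = search_any(reports, terms, freq)             # common word first
_, or_tests_reversed = search_any(reports, terms, freq, False)   # rare word first

print(and_hits, and_tests, and_tests_reversed)  # [0] 5 8
print(or_hits, or_tests, or_tests_reversed)     # [0, 1, 2, 3] 4 7
```

On this toy data both orderings return the same reports, but the recommended ordering needs fewer lookups in each case. The gap would presumably widen with a word distribution as unbalanced as the one the abstract reports.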
