
An organizational memory system for capturing information from unstructured text: The Infoscan and Infoview Systems.


Abstract

Organizational memory refers to stored information from an organization's history that can be brought to bear on present decisions (Walsh and Ungson, 1991). Information of value to an organizational memory is often found in unstructured formats such as personal notes, memos, and messages, and information contained in such informal documents can be difficult and expensive to capture. A set of knowledge management tools is needed to facilitate the acquisition of useful information from informal documents.

This research addresses the problem of how useful information can be acquired from unstructured text and stored in an accessible form efficiently and economically.

In this research, a logical architecture for an organizational memory system (OMS) is proposed, and a prototype system is developed to demonstrate the feasibility of the architecture. The prototype, consisting of two programs called InfoScan and InfoView, is tested on a corpus of 10,000 e-mail messages. In the test, the system achieved 87% recall, 87% precision, and an overall performance of 87%.

InfoScan analyses a training set of documents to develop a template consisting of key words and phrases; the template is then used to locate documents related to the target subject. The second program, InfoView, is a database designed to give the user an effective means of viewing the messages selected by InfoScan.

The keyword selection process is based on the concept of cue validity, developed in cognitive psychology and used by Goldberg (1996) in text categorization. Cue validity provides “…a measure of the degree to which a particular feature distinguishes instances of a concept from instances of contrasting concepts” (Goldberg, 1996).

InfoScan parses the words in a training set of documents, cleans and spell-checks them, tags each word by part of speech, and calculates a cue value for each word. Words of certain parts of speech, and words with low cue values, are eliminated, producing a list of potential key words.
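The cue-value step can be sketched in Python. This is a minimal illustration, not InfoScan's implementation: it assumes the classic cognitive-psychology definition of cue validity as the conditional probability P(target | word), estimated here from document frequencies in a target training set versus a contrasting set. The helper names, the `min_cue` and `min_df` thresholds, and the small stopword list (a crude, hypothetical stand-in for the part-of-speech filter; no tagging or spell checking is done) are all assumptions.

```python
import re
from collections import Counter

# Hypothetical stand-in for InfoScan's part-of-speech filter: drop common function words.
DEFAULT_STOPWORDS = frozenset({"a", "an", "and", "is", "of", "on", "the", "to"})


def tokenize(text):
    """Lowercase text and split it into alphabetic tokens (crude cleaning only)."""
    return re.findall(r"[a-z]+", text.lower())


def document_frequencies(docs):
    """Count, for each word, how many documents contain it at least once."""
    return Counter(word for doc in docs for word in set(tokenize(doc)))


def cue_validities(target_docs, contrast_docs):
    """Estimate cue validity P(target | word) as the share of documents
    containing the word that belong to the target set."""
    in_target = document_frequencies(target_docs)
    in_contrast = document_frequencies(contrast_docs)
    vocab = set(in_target) | set(in_contrast)
    return {w: in_target[w] / (in_target[w] + in_contrast[w]) for w in vocab}


def candidate_keywords(target_docs, contrast_docs, min_cue=0.8, min_df=2,
                       stopwords=DEFAULT_STOPWORDS):
    """Return non-stopwords with cue validity >= min_cue that occur in at least
    min_df target documents, sorted by cue value (highest first), then alphabetically."""
    cues = cue_validities(target_docs, contrast_docs)
    in_target = document_frequencies(target_docs)
    kept = [w for w, cue in cues.items()
            if cue >= min_cue and in_target[w] >= min_df and w not in stopwords]
    return sorted(kept, key=lambda w: (-cues[w], w))


if __name__ == "__main__":
    target = ["Budget meeting moved to Friday", "The budget review is Friday", "Budget approved"]
    contrast = ["Lunch on Friday", "The meeting is cancelled", "New parking rules"]
    print(candidate_keywords(target, contrast))  # ['budget']
```

Requiring a word to appear in several target documents keeps one-off words, which trivially score 1.0, off the list. In the workflow the abstract describes, the surviving list would still go to a human operator for the final selection.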
A human operator then selects the words on the list that are most closely related to the target subject, and the resulting key words and phrases are used to evaluate the remaining documents in the corpus.
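The evaluation step can likewise be sketched, again only as an illustration under stated assumptions. The abstract does not say how the template is applied, so this sketch assumes a document is selected when it contains at least `min_hits` template terms, with multi-word phrases matched as whole-word sequences. Recall and precision follow their standard definitions; the abstract does not define its "overall performance" figure, so the F1 score (harmonic mean of precision and recall) appears here only as one common way of combining them. All function names and the sample corpus are hypothetical.

```python
import re


def normalize(text):
    """Lowercase text and join its alphabetic tokens with single spaces."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))


def matches_template(text, template, min_hits=1):
    """Select a document if it contains at least min_hits distinct template
    terms; padding with spaces makes phrases match only on word boundaries."""
    padded = f" {normalize(text)} "
    hits = sum(1 for term in template if f" {normalize(term)} " in padded)
    return hits >= min_hits


def precision_recall_f1(selected, relevant):
    """Compare the set of selected document ids with the truly relevant ids."""
    true_pos = len(selected & relevant)
    precision = true_pos / len(selected) if selected else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    corpus = {
        1: "Budget review moved to Monday",
        2: "Parking rules updated",
        3: "Q3 budget approved",
        4: "Lunch order for the budget team",
        5: "Quarterly forecast draft",
        6: "Spending plan for next year",
    }
    template = ["budget", "forecast"]
    relevant = {1, 3, 5, 6}  # hypothetical hand-labelled answer key
    selected = {doc_id for doc_id, text in corpus.items() if matches_template(text, template)}
    print(sorted(selected))                         # [1, 3, 4, 5]
    print(precision_recall_f1(selected, relevant))  # (0.75, 0.75, 0.75)
```

Raising `min_hits` generally trades recall for precision: fewer off-topic messages are selected, but more on-topic ones are missed.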

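Bibliographic details

  • Author: Petersen, Lawrence C.
  • Author affiliation: Texas A&M University
  • Degree-granting institution: Texas A&M University
  • Discipline: Information Science
  • Degree: Ph.D.
  • Year: 1999
  • Pages: 133
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Information and knowledge dissemination
  • Date added: 2022-08-17 11:48:17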
