A machine learning approach to automate classification of literature in a SAM research database.

Abstract

In the mid-eighties, researchers at the University of Miami confronted the problem of information overload while investigating information on worker performance. They required literature sources from various fields, including engineering, business, and psychology. To cope with this overload, they devised a research methodology that partitions information resources into category matrices in order to find patterns, trends, or voids. The approach was termed State-of-the-Art Matrix, or SAM, Analysis.

SAM Analysis is a manual process, which restricts the amount of information available for conveying category decisions. During the first phase of the manual process, researchers construct models or categories that best describe the research area. In the next phase, articles from the information sources are read and assigned to the pre-defined categories based on the judgment of assessors.

The manual approach presents major challenges to researchers who must identify and use the information hidden in a large corpus. The approach is practical only for a small number of articles, and categorization relies on the subjective judgment of assessors. A more scalable and flexible approach is therefore needed for categorizing information, such as using machine learning and data mining techniques to automate the categorization of articles in large volumes of data.

In this research, automation is approached through a machine learning technique known as a Learning Classifier System (LCS). The LCS performs the data mining task of categorizing articles using the SAM approach, drawing on training and testing datasets extracted from SAM EndNote bibliographic databases related to a specific area of research.

To evaluate the ability of the LCS to predict category membership, accuracy-based metrics borrowed from the field of medicine are applied. The metrics include sensitivity, specificity, positive predictive value, and negative predictive value.

After training, the evaluation results indicate that the predictive ability of the LCS is greater than 90%. The results are obtained using a five-trial experiment.*

*This dissertation is a compound document (it contains both a paper copy and a CD as part of the dissertation). The CD has the following system requirements: XML Editor; WinZip; Internet browser.
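The four metrics named in the abstract have standard definitions in terms of true and false positives and negatives. The following is a minimal Python sketch of those definitions, assuming one binary decision per article (in the category or not). The names `ConfusionCounts`, `count_outcomes`, and `diagnostic_metrics` are hypothetical, and the labels are made up for illustration; none of this is taken from the dissertation's implementation or results.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for one category (hypothetical helper type)."""
    tp: int  # article is in the category, predicted in
    fp: int  # article is not in the category, predicted in
    tn: int  # article is not in the category, predicted out
    fn: int  # article is in the category, predicted out


def count_outcomes(actual: Sequence[bool], predicted: Sequence[bool]) -> ConfusionCounts:
    """Tally outcomes from parallel lists of true and predicted membership."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = sum(a and p for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard definitions of the four metrics listed in the abstract."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # share of true members found
        "specificity": _ratio(c.tn, c.tn + c.fp),  # share of non-members rejected
        "ppv": _ratio(c.tp, c.tp + c.fp),          # share of "in" predictions that are right
        "npv": _ratio(c.tn, c.tn + c.fn),          # share of "out" predictions that are right
    }


# Illustrative labels only (not data from the dissertation).
actual = [True, True, False, False, True]
predicted = [True, False, False, True, True]
counts = count_outcomes(actual, predicted)
metrics = diagnostic_metrics(counts)
```

Reporting all four values, rather than a single accuracy figure, separates how well a classifier finds category members from how well it rejects non-members, which matters when a category holds only a small share of the articles.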

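Bibliographic details

  • Author

    Morris, Elizabeth P.

  • Author affiliation

    Texas Tech University

  • Degree-granting institution: Texas Tech University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2004
  • Pages: 318 p.
  • Total pages: 318
  • Original format: PDF
  • Language of text: English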