Journal of the American Society for Information Science and Technology

A Framework of Automatic Subject Term Assignment for Text Categorization: An Indexing Conception-Based Approach


Abstract

The purpose of this study is to examine whether an understanding of the subject-indexing processes conducted by human indexers has a positive impact on the effectiveness of automatic subject term assignment through text categorization (TC). More specifically, human indexers' subject-indexing approaches, or conceptions, in conjunction with semantic sources were explored in the context of a typical scientific journal article dataset. Based on the premise that subject-indexing approaches or conceptions with semantic sources are important for automatic subject term assignment through TC, this study proposed an indexing conception-based framework. Two research questions were explored: To what extent are semantic sources effective? To what extent are indexing conceptions effective? The experiments were conducted using a Support Vector Machine implementation in WEKA (I.H. Witten & E. Frank, 2000). Measured by F-measure, the experimental results showed that cited works, source title, and title were as effective as the full text, while keywords were found to be more effective than the full text. In addition, the findings showed that the indexing conception-based framework was more effective than the full text; the content-oriented and document-oriented indexing approaches in particular were found to be more effective than the full text. Among the three indexing conception-based approaches, the content-oriented approach and the document-oriented approach were more effective than the domain-oriented approach. In other words, in the context of a typical scientific journal article dataset, the objective contents and authors' intentions were more desirable for automatic subject term assignment via TC than the possible users' needs. The findings of this study support the view that incorporating human indexers' indexing approaches or conceptions in conjunction with semantic sources has a positive impact on the effectiveness of automatic subject term assignment.
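
The abstract reports effectiveness as F-measure but does not say which variant or averaging scheme was used. As a minimal sketch, assuming the common per-document, set-based definition (the harmonic combination of precision and recall over assigned versus indexer-supplied subject terms), the score could be computed as follows. The function name `f_measure`, the `beta` parameter, and the example terms are illustrative assumptions, not taken from the study.

```python
def f_measure(assigned, relevant, beta=1.0):
    """F-measure for one document (illustrative helper, not from the study).

    assigned: subject terms produced by the classifier
    relevant: subject terms chosen by a human indexer
    beta:     weight of recall relative to precision (1.0 gives F1)
    """
    assigned, relevant = set(assigned), set(relevant)
    true_positives = len(assigned & relevant)
    if true_positives == 0:
        # No overlap (or an empty set): precision or recall is zero or
        # undefined, so report 0.0 by convention.
        return 0.0
    precision = true_positives / len(assigned)
    recall = true_positives / len(relevant)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical example: one of two assigned terms matches the indexer's terms.
score = f_measure({"information retrieval", "indexing"},
                  {"indexing", "text categorization"})
```

With `beta=1.0` precision and recall are weighted equally; how per-document or per-category scores were aggregated in the reported experiments is not stated in the abstract.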

Bibliographic details

  • Author affiliations

    Ewha Womans University, Library and Information Science, 11-1 Seodaemun-Gu Daehyun-Dong, Seoul, Korea 120-750;

    University of North Texas, College of Information, Department of Library and Information Sciences, 1155 Union Circle 311068, Denton, TX 76203;

    University of South Carolina, School of Library and Information Science, 1501 Greene Street, Columbia, SC 29208;

  • Original format: PDF
  • Language: English
