
Annotating a corpus of biomedical research texts: Two models of rhetorical analysis.


Abstract

Recent advances in the biomedical sciences have led to an enormous increase in the amount of research literature being published, most of it in electronic form; researchers are finding it difficult to keep up to date on all of the new developments in their fields. As a result, there is a need to develop automated Text Mining tools to filter and organize data in a way that is useful to researchers. Human-annotated data are often used as the 'gold standard' to train such systems via machine learning methods.

This thesis reports on a project where three annotators applied two Models of rhetoric (argument) to a corpus of on-line biomedical research texts. How authors structure their argumentation and which rhetorical strategies they employ are key to how researchers present their experimental results; thus rhetorical analysis of a text could allow for the extraction of information which is pertinent for a particular researcher's purpose. The first Model stems from previous work in Computational Linguistics; it focuses on differentiating 'new' from 'old' information, and results from analysis of results. The second Model is based on Toulmin's argument structure (1958/2003); its main focus is to identify 'Claims' being made by the authors, but it also differentiates between internal and external evidence, as well as categories of explanation and implications of the current experiment.

In order to properly train automated systems, and as a gauge of the shared understanding of the argument scheme being applied, inter-annotator agreement should be relatively high. The results of this study show complete (three-way) inter-annotator agreement on an average of 60.5% of the 400 sentences in the final corpus under Model 1, and 39.3% under Model 2. Analyses of the inter-annotator variation are done in order to examine in detail all of the factors involved; these include particular Model categories, individual annotator preferences, errors, and the corpus data itself. In order to reduce this inter-annotator variation, revisions to both Models are suggested; it is also recommended that in the future biomedical domain experts, possibly in tandem with experts in rhetoric, be used as annotators.

KEY WORDS: annotation, argument, biomedical text, computational linguistics, information extraction, rhetoric, text mining
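The complete (three-way) agreement figure quoted in the abstract is the share of sentences on which all annotators assigned the same category. A minimal sketch of that calculation follows; the function name, the category labels, and the toy data are hypothetical illustrations and are not taken from the thesis, whose exact procedure (for example, how the reported averages were formed) is not described in this record.

```python
from typing import Sequence


def complete_agreement_rate(annotations: Sequence[Sequence[str]]) -> float:
    """Return the fraction of sentences on which every annotator chose the same label.

    Each element of `annotations` holds the labels that the annotators
    assigned to one sentence, e.g. ("Claim", "Claim", "Claim").
    """
    if not annotations:
        raise ValueError("need at least one annotated sentence")
    # A sentence counts as agreed only if all of its labels are identical.
    agreed = sum(1 for labels in annotations if len(set(labels)) == 1)
    return agreed / len(annotations)


# Hypothetical toy data: four sentences, three annotators each.
# The label names are placeholders, not the thesis's actual category sets.
toy = [
    ("Claim", "Claim", "Claim"),
    ("Claim", "Evidence", "Claim"),
    ("Evidence", "Evidence", "Evidence"),
    ("Explanation", "Claim", "Evidence"),
]

rate = complete_agreement_rate(toy)
print(f"{rate:.1%}")  # 2 of the 4 toy sentences agree, so this prints 50.0%
```

Raw complete agreement does not correct for agreement expected by chance; chance-corrected statistics such as Fleiss' kappa are a common complement, though this record does not say whether the thesis used them.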

Bibliographic record

  • Author: White, Barbara Ellen
  • Author affiliation: The University of Western Ontario (Canada)
  • Degree-granting institution: The University of Western Ontario (Canada)
  • Subjects: Language, Rhetoric and Composition; Information Science; Biology, Bioinformatics
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 268
  • Original format: PDF
  • Language of text: English
