
Distinctive words in academic writing: a comparison of three statistical tests for keyword extraction


Abstract

Most studies that make use of keyword analysis rely on log-likelihood ratio or chi-square tests to extract words that are particularly characteristic of a corpus (e.g. Scott & Tribble 2006). These measures are computed on the basis of absolute frequencies and cannot account for the fact that "corpora are inherently variable internally" (Gries 2007). To overcome this limitation, measures of dispersion are sometimes used in combination with keyness values (e.g. Rayson 2003; Oakes & Farrow 2007). Some scholars have also suggested using other statistical measures (e.g. Wilcoxon-Mann-Whitney test) but these techniques have not gained corpus linguists' favour (yet?). One possible explanation for this lack of enthusiasm is that statistical tests for keyword extraction have rarely been compared. In this article, we make use of the log-likelihood ratio, the t-test and the Wilcoxon-Mann-Whitney test in turn to compare the academic and the fiction sub-corpora of the British National Corpus and extract words that are typical of academic discourse. We compare the three lists of academic keywords on a number of criteria (e.g. number of keywords extracted by each measure, percentage of keywords that are shared in the three lists, frequency and distribution of academic keywords in the two corpora) and explore the specificities of the three statistical measures. We also assess the advantages and disadvantages of these measures for the extraction of general academic words.
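As a rough illustration of why the three measures in the abstract can disagree, the sketch below scores two invented words. It is not the authors' implementation: the counts, the text size and all function names are hypothetical, the log-likelihood ratio follows the common two-corpus G2 formulation on pooled frequencies, and Welch's variant of the t-test is an assumption, since the abstract does not say which variant was used. The t-test and the Wilcoxon-Mann-Whitney U statistic are computed on per-text relative frequencies, so they react to how evenly a word is spread across texts.

```python
import math
from statistics import mean, variance


def log_likelihood(a, b, c, d):
    """G2 keyness from pooled counts.

    a, b: frequency of the word in corpus 1 and corpus 2
    c, d: size of corpus 1 and corpus 2 in tokens
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # treat 0 * ln(0) as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def welch_t(xs, ys):
    """Welch's t statistic on per-text relative frequencies (assumed variant)."""
    se = math.sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return (mean(xs) - mean(ys)) / se


def mann_whitney_u(xs, ys):
    """U statistic for xs: number of pairs with x > y, ties counted as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


# Hypothetical toy data: four texts per corpus, 1000 tokens each.
TEXT_SIZE = 1000
fiction_counts = [1, 0, 2, 1]
even_word_counts = [4, 5, 6, 5]     # spread evenly over the academic texts
bursty_word_counts = [20, 0, 0, 0]  # same total, concentrated in one text


def scores(academic_counts, other_counts):
    """Return the three statistics for one word."""
    n_acad = TEXT_SIZE * len(academic_counts)
    n_other = TEXT_SIZE * len(other_counts)
    rel_acad = [n / TEXT_SIZE for n in academic_counts]
    rel_other = [n / TEXT_SIZE for n in other_counts]
    return {
        "ll": log_likelihood(sum(academic_counts), sum(other_counts), n_acad, n_other),
        "t": welch_t(rel_acad, rel_other),
        "u": mann_whitney_u(rel_acad, rel_other),
    }


even = scores(even_word_counts, fiction_counts)
bursty = scores(bursty_word_counts, fiction_counts)
print("even word:  ", even)
print("bursty word:", bursty)
```

Both toy words have the same pooled frequency, so the log-likelihood ratio cannot tell them apart, whereas the two text-level statistics score the evenly distributed word much higher. Turning any of these statistics into a keyword list would additionally require a significance threshold, which this sketch leaves out.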

Bibliographic details

  • Authors

    Paquot Magali; Bestgen Yves;

  • Affiliation
  • Year 2009
  • Total pages
  • Original format PDF
  • Language of text eng
  • CLC classification
