Journal: Behavior Research Methods

Dealing with zero word frequencies: A review of the existing rules of thumb and a suggestion for an evidence-based choice



Abstract

In a critical review of the heuristics used to deal with zero word frequencies, we show that four are suboptimal, one is good, and one may be acceptable. The four suboptimal strategies are discarding words with zero frequencies, giving words with zero frequencies a very low frequency, adding 1 to the frequency per million, and making use of the Good–Turing algorithm. The good algorithm is the Laplace transformation, which consists of adding 1 to each frequency count and increasing the total corpus size by the number of word types observed. A strategy that may be acceptable is to guess the frequency of absent words on the basis of other corpora and then to increase the total corpus size by the estimated summed frequency of the missing words. A comparison with the lexical decision times of the English Lexicon Project and the British Lexicon Project suggests that the Laplace transformation gives the most useful estimates (in addition to being easy to calculate). Therefore, we recommend it to researchers.
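
The Laplace transformation described in the abstract can be illustrated with a minimal Python sketch. It is one reading of the abstract's wording, not the authors' own code: the function name `laplace_per_million`, the toy corpus, and the choice to report the result per million words are assumptions made for illustration.

```python
from collections import Counter


def laplace_per_million(counts, word):
    """Laplace-adjusted frequency per million words (illustrative helper).

    Adds 1 to the word's raw count and enlarges the corpus size by the
    number of word types observed, as the abstract describes.
    """
    n_tokens = sum(counts.values())  # total corpus size in tokens
    n_types = len(counts)            # number of distinct word types observed
    adjusted_count = counts.get(word, 0) + 1
    return adjusted_count / (n_tokens + n_types) * 1_000_000


# Toy corpus (hypothetical): 6 tokens, 5 types.
counts = Counter("the cat sat on the mat".split())

seen = laplace_per_million(counts, "the")    # raw count 2, adjusted to 3
unseen = laplace_per_million(counts, "dog")  # raw count 0, adjusted to 1
print(seen, unseen)
```

Under this reading, a word that is absent from the corpus still receives a small positive frequency, so a log transform stays defined and the word does not have to be discarded.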
