
An evaluation of English stemming data in full-text retrieval

Abstract

Various studies have focused on the effect of stemming on IR tasks. Experiments using a test collection like TREC have shown that the overall improvement from stemming is not significant, because its effects on independent queries are so inconsistent that the damage to some queries may cancel out the benefits to others. When stemming indexing terms, we should avoid the risk of ill effects from overstemming. To understand the extent to which we should stem indexing terms, we conducted a set of experiments using the TREC-7 and TREC-8 ad hoc tasks. Targets for stemming are set in the following four steps:

1. Conflation of inflectionally related forms
2. Conflation of derivationally related forms excluding their minimal stem
3. Conflation of derivationally related forms including their minimal stem
4. Conflation of spelling variants

The results show that most of the ill effects are caused by conflating derivational variants including their ultimate stems (step 3). They also show that the other steps damage only a few queries and produce fairly consistent improvements.
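
To make the four conflation steps concrete, the following is a minimal Python sketch, not the authors' stemming data or method. The lookup tables, the word choices, and the function name `conflate` are all hypothetical, and treating "organ" as the minimal stem of "organization" is an illustrative assumption. It shows how enabling step 3 can merge terms that the other steps keep apart.

```python
# Toy conflation tables, one per step. These entries are illustrative
# assumptions, not the stemming data evaluated in the paper.
INFLECTIONAL = {            # step 1: inflectionally related forms
    "organizations": "organization",
    "organs": "organ",
    "colours": "colour",
}
DERIVATIONAL_NO_MINIMAL = { # step 2: derivational forms, minimal stem excluded
    "organizational": "organization",
}
DERIVATIONAL_MINIMAL = {    # step 3: derivational forms, minimal stem included
    "organization": "organ",
}
SPELLING = {                # step 4: spelling variants
    "organisation": "organization",
    "colour": "color",
}

STEPS = [INFLECTIONAL, DERIVATIONAL_NO_MINIMAL, DERIVATIONAL_MINIMAL, SPELLING]


def conflate(term, enabled_steps):
    """Map a term to its index form using only the enabled steps (1 to 4)."""
    term = term.lower()
    changed = True
    # Re-apply the enabled tables until no table rewrites the term.
    # The toy tables contain no cycles, so this loop terminates.
    while changed:
        changed = False
        for step in sorted(enabled_steps):
            table = STEPS[step - 1]
            if term in table:
                term = table[term]
                changed = True
    return term


if __name__ == "__main__":
    for steps in ({1, 2, 4}, {1, 2, 3, 4}):
        forms = {w: conflate(w, steps) for w in ("organizations", "organs")}
        print(sorted(steps), forms)
```

With steps 1, 2 and 4 enabled, "organizations" and "organs" stay in separate index classes; adding step 3 collapses both to "organ", which is the kind of overstemming the abstract attributes to step 3.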