Various studies have examined the effect of stemming on information retrieval (IR) tasks. Experiments on test collections such as TREC have shown that the overall improvement from stemming is not significant: its effects on individual queries are so inconsistent that the damage to some queries may cancel out the benefits to others. When stemming index terms, we should therefore avoid the ill effects of overstemming. To understand how far index terms should be stemmed, we conducted a set of experiments on the TREC-7 and TREC-8 ad hoc tasks. Targets for stemming are set in the following four steps:

1. Conflation of inflectionally related forms
2. Conflation of derivationally related forms, excluding their minimal stem
3. Conflation of derivationally related forms, including their minimal stem
4. Conflation of spelling variants

The results show that most of the ill effects are caused by conflating derivational variants with their minimal stems (step 3). They also show that the other steps damage only a few queries and produce fairly consistent improvements.
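
To make the four conflation steps concrete, the sketch below applies them to a single word family. It is an illustration only, not the stemmer used in the experiments: the lookup tables, the function name `conflate`, and the choice of "organ" as the minimal stem of "organize" are all assumptions introduced here, and a real system would derive these mappings from a lexicon or rule set.

```python
# Toy lookup tables (hypothetical, hand-built for one word family).
SPELLING = {            # step 4: spelling variants
    "organisation": "organization",
    "organisations": "organizations",
    "organise": "organize",
}
INFLECTION = {          # step 1: inflectionally related forms
    "organizations": "organization",
    "organisations": "organisation",
    "organized": "organize",
    "organs": "organ",
}
DERIVATION_PARTIAL = {  # step 2: derivational forms, minimal stem kept apart
    "organization": "organize",
    "organizational": "organize",
}
DERIVATION_FULL = {     # step 3: derivational forms merged with the minimal stem
    "organize": "organ",
}

# Tables are applied in a fixed order so that the result does not depend
# on the order in which steps are enabled.
_ORDER = [
    (4, SPELLING),
    (1, INFLECTION),
    (2, DERIVATION_PARTIAL),
    (3, DERIVATION_FULL),
]


def conflate(term, steps):
    """Map a term to its index form using only the enabled steps (1-4)."""
    term = term.lower()
    for step, table in _ORDER:
        if step in steps:
            term = table.get(term, term)
    return term


if __name__ == "__main__":
    for enabled in ({1}, {1, 2}, {1, 2, 3}, {1, 2, 4}):
        forms = {w: conflate(w, enabled)
                 for w in ("organs", "organizational", "organisations")}
        print(sorted(enabled), forms)
```

With steps 1 and 2 enabled, "organizational" and "organs" stay in separate index classes; enabling step 3 merges them under "organ". Under the assumptions of this sketch, that merge is the kind of overstemming the abstract identifies as the main source of damaged queries.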