A Supervised Learning Approach to Search of Definitions

Jun Xu; Yun-Bo Cao; Hang Li; Min Zhao; Ya-Lou Huang

Abstract

This paper addresses the problem of searching for definitions. Specifically, given a term, we find its definition candidates and rank them according to their likelihood of being good definitions. This contrasts with traditional methods, which either generate a single combined definition or output all retrieved definitions. Definition ranking is essential for this task. A specification for judging the goodness of a definition is given. In the specification, a definition is assigned to one of three levels: good definition, indifferent definition, or bad definition. Methods for performing definition ranking are also proposed, which formalize the problem as either classification or ordinal regression. We employ SVM (Support Vector Machines) as the classification model and Ranking SVM as the ordinal regression model; both rank definition candidates according to their likelihood of being good definitions. Features for constructing the SVM and Ranking SVM models are defined, which represent the characteristics of the term, the definition candidate, and their relationship. Experimental results indicate that SVM and Ranking SVM significantly outperform baseline methods such as heuristic rules, conventional information retrieval (Okapi), and SVM regression. This holds both when the answers are paragraphs and when they are sentences. Experimental results also show that SVM or Ranking SVM models trained in one domain can be adapted to another domain, indicating that generic models for definition ranking can be constructed.
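To make the ordinal regression formulation concrete, the following is a minimal sketch of the pairwise idea behind Ranking SVM, not the authors' implementation. It assumes three hypothetical binary features and three invented candidates for the term "Linux", and it replaces a real SVM solver with plain subgradient descent on a regularized pairwise hinge loss. All names (`pairwise_differences`, `train_linear_ranker`, `score`) are illustrative.

```python
from itertools import combinations

# Hypothetical toy data: (candidate text, feature vector, grade).
# Assumed features: [matches "<term> is a/an ..." pattern,
#                    term occurs in candidate, contains opinion words]
# Grades follow the three levels: 2 = good, 1 = indifferent, 0 = bad.
candidates = [
    ("Linux is an open source operating system.", [1.0, 1.0, 0.0], 2),
    ("Linux is widely used on servers.",          [0.0, 1.0, 0.0], 1),
    ("I think Linux is the best.",                [0.0, 1.0, 1.0], 0),
]


def pairwise_differences(items):
    """Build x_hi - x_lo for every pair whose grades differ."""
    diffs = []
    for (_, xa, ya), (_, xb, yb) in combinations(items, 2):
        if ya == yb:
            continue  # equal grades carry no ordering constraint
        hi, lo = (xa, xb) if ya > yb else (xb, xa)
        diffs.append([h - l for h, l in zip(hi, lo)])
    return diffs


def train_linear_ranker(diffs, dim, epochs=200, lr=0.1, reg=0.01):
    """Subgradient descent on L2-regularized hinge loss over pair differences."""
    w = [0.0] * dim
    for _ in range(epochs):
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            violated = margin < 1.0  # pair not yet separated by margin 1
            for k in range(dim):
                grad = reg * w[k] - (d[k] if violated else 0.0)
                w[k] -= lr * grad
    return w


def score(w, x):
    """Linear ranking score; higher means more likely a good definition."""
    return sum(wi * xi for wi, xi in zip(w, x))


weights = train_linear_ranker(pairwise_differences(candidates), dim=3)
ranked = sorted(candidates, key=lambda c: score(weights, c[1]), reverse=True)
for text, features, grade in ranked:
    print(grade, round(score(weights, features), 3), text)
```

The classification variant described in the abstract would instead train a binary SVM on good versus bad candidates and rank by the classifier's output value. The pairwise form shown here uses all three grades as ordering constraints.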

