Conference: International workshop on computational terminology

A Study on the Interplay Between the Corpus Size and Parameters of a Distributional Model for Term Classification



Abstract

We propose and evaluate a method for identifying co-hyponym lexical units in a terminological resource. The principles of term recognition and distributional semantics are combined to extract terms from a similar category of concept. Given a set of candidate terms, random projections are employed to represent them as low-dimensional vectors. These vectors are derived automatically from the frequency of the co-occurrences of the candidate terms and words that appear within windows of text in their proximity (context-windows). In a k-nearest neighbours framework, these vectors are classified using a small set of manually annotated terms which exemplify concept categories. We then investigate the interplay between the size of the corpus that is used for collecting the co-occurrences and a number of factors that play roles in the performance of the proposed method: the configuration of context-windows for collecting co-occurrences, the selection of neighbourhood size (k), and the choice of similarity metric.
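The pipeline described in the abstract (context-window co-occurrences, random projection to low-dimensional vectors, k-nearest-neighbour classification against a few annotated terms) can be illustrated with a minimal sketch. This is not the authors' implementation: the random-indexing style of projection, the single-word candidate terms, the cosine metric, the toy corpus, and every name and default here (DIM, NONZERO, index_vector, term_vectors, knn_label, window=2) are assumptions chosen only for illustration.

```python
import math
import random
from collections import Counter

DIM = 200      # size of the reduced vector space (assumed value)
NONZERO = 6    # non-zero entries per random index vector (assumed value)

def index_vector(word, dim=DIM, nonzero=NONZERO):
    """Sparse random +1/-1 vector for a context word, seeded by the word itself."""
    rng = random.Random(word)  # string seed keeps the vector reproducible
    positions = rng.sample(range(dim), nonzero)
    return {p: rng.choice((-1, 1)) for p in positions}

def term_vectors(sentences, candidates, window=2, dim=DIM):
    """Sum the index vectors of words within +/- `window` tokens of each candidate."""
    vectors = {t: [0.0] * dim for t in candidates}
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok not in vectors:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                for pos, sign in index_vector(tokens[j], dim).items():
                    vectors[tok][pos] += sign
    return vectors

def cosine(u, v):
    """Cosine similarity; one possible choice of similarity metric."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_label(term, vectors, labelled, k=3):
    """Majority vote among the k annotated terms most similar to `term`."""
    ranked = sorted(labelled,
                    key=lambda t: cosine(vectors[term], vectors[t]),
                    reverse=True)
    votes = Counter(labelled[t] for t in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy corpus (invented for this sketch, not data from the paper).
sentences = [
    "the parser analyses each sentence quickly".split(),
    "the tagger analyses each sentence quickly".split(),
    "the corpus contains many annotated documents".split(),
    "the treebank contains many annotated documents".split(),
]
candidates = ["parser", "tagger", "corpus", "treebank"]
vectors = term_vectors(sentences, candidates, window=2)

# A small set of manually annotated terms exemplifying two concept categories.
labelled = {"parser": "tool", "corpus": "resource"}
for term in ("tagger", "treebank"):
    print(term, knn_label(term, vectors, labelled, k=1))
```

In this sketch the factors that the study varies against corpus size correspond to the window argument (context-window configuration), the k argument (neighbourhood size), and the cosine function (similarity metric), so each can be swapped independently.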
