Published in: International workshop on computational terminology

A Study on the Interplay Between the Corpus Size and Parameters of a Distributional Model for Term Classification

Abstract

We propose and evaluate a method for identifying co-hyponym lexical units in a terminological resource. Principles of term recognition and distributional semantics are combined to extract terms that belong to the same concept category. Given a set of candidate terms, random projections are employed to represent them as low-dimensional vectors. These vectors are derived automatically from the frequencies with which the candidate terms co-occur with the words that appear near them, within windows of text (context windows). In a k-nearest neighbours framework, these vectors are classified using a small set of manually annotated terms that exemplify the concept categories. We then investigate the interplay between the size of the corpus used to collect the co-occurrences and several factors that affect the performance of the proposed method: the configuration of the context windows used to collect co-occurrences, the choice of neighbourhood size (k), and the choice of similarity metric.
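The abstract does not give implementation details. The following is a minimal sketch of the general pipeline it describes, not the authors' code. It makes several assumptions. It uses a random-indexing style projection, where each context word gets a sparse random vector of +1/-1 entries. It uses a symmetric context window of fixed size, and it expects multiword terms to be pre-joined into single tokens. Classification is by majority vote among the k most similar annotated terms. All function names, parameter defaults, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def index_vector(dim, nonzeros, rng):
    """Sparse random ternary vector: `nonzeros` positions set to +1 or -1."""
    vec = [0.0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((1.0, -1.0))
    return vec


def build_term_vectors(sentences, terms, window=2, dim=512, nonzeros=8, seed=0):
    """Build low-dimensional term vectors by random projection (hypothetical).

    Every context word receives a fixed random index vector. A term's vector
    is the sum of the index vectors of the words found within `window` tokens
    on either side of each occurrence. This equals multiplying the term's
    co-occurrence count vector by a random projection matrix.
    """
    rng = random.Random(seed)
    index = {}  # context word -> random index vector
    vectors = {t: [0.0] * dim for t in terms}
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok not in vectors:
                continue  # only candidate terms get vectors
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                ctx = tokens[j]
                if ctx not in index:
                    index[ctx] = index_vector(dim, nonzeros, rng)
                term_vec = vectors[tok]
                for pos, val in enumerate(index[ctx]):
                    if val:
                        term_vec[pos] += val
    return vectors


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def neg_manhattan(u, v):
    """Alternative metric: negated L1 distance (larger means more similar)."""
    return -sum(abs(a - b) for a, b in zip(u, v))


def knn_classify(vec, labelled, k=3, sim=cosine):
    """Majority vote over the k annotated terms most similar to `vec`.

    `labelled` is a list of (vector, category) pairs.
    """
    ranked = sorted(labelled, key=lambda item: sim(vec, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy usage (hypothetical data): each term sits between two category cues.
corpus = [s.split() for s in [
    "drug aspirin dose", "drug ibuprofen dose", "drug paracetamol dose",
    "gene brca1 mutation", "gene tp53 mutation", "gene egfr mutation",
]]
terms = ["aspirin", "ibuprofen", "paracetamol", "brca1", "tp53", "egfr"]
vecs = build_term_vectors(corpus, terms, window=1)
seeds = [("aspirin", "drug"), ("ibuprofen", "drug"),
         ("brca1", "gene"), ("tp53", "gene")]
labelled = [(vecs[t], cat) for t, cat in seeds]
print(knn_classify(vecs["paracetamol"], labelled, k=3))
print(knn_classify(vecs["egfr"], labelled, k=3, sim=neg_manhattan))
```

With `window=1`, each toy term sees only its immediate neighbours, so terms from the same category receive identical vectors. In this sketch, `window`, `k`, and `sim` play the role of the context-window configuration, neighbourhood size, and similarity metric that the study varies. Corpus size corresponds to how many sentences are passed to `build_term_vectors`.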