A Hierarchical Collocation Extraction Tool

Abstract

We design a hierarchical collocation extraction tool based on the three-layered linguistic properties of collocation. Following the structured definitions of collocation, extraction proceeds in three phases: i) extracting peripheral collocations in the frequency layer from dependency triples; ii) extracting semi-peripheral collocations in the syntactic layer using association measures (AMs); and iii) extracting core collocations in the semantic layer using a similar-word thesaurus. The thesaurus is built by taking all the collocations of a word as its features and computing the similarity between every pair of words. Experiments on our test corpus of China English, with the Oxford Collocations Dictionary as the gold standard, show that the integrated measure we propose (LMP) outperforms the other three AMs. The syntactic constraints in Phase II filter out much of the noise from surface co-occurrences. The semantic constraints in Phase III are effective in identifying the truly "core" collocations. The keyness of the words in the test set is a significant factor when a published collocation dictionary is taken as the gold standard. The tool can serve as a convenient aid for linguists as well as for language teachers and learners.
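The three phases summarized above can be sketched roughly as follows. This is a hypothetical Python illustration, not the authors' tool, and it rests on several assumptions:

- Dependency triples of the form (head, relation, dependent) are already available from some parser.
- Phase I uses a raw frequency cut-off. The `min_freq` parameter is our choice.
- Pointwise mutual information (PMI) stands in for the association measures. The proposed LMP measure is not defined in the abstract, so it is not reproduced here.
- The similar-word thesaurus is built with cosine similarity over collocation-count vectors. The abstract does not name a similarity function.

The abstract also does not say how the thesaurus is then used to select core collocations, so that step is left out. All function names are illustrative.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical representation: a dependency triple (head, relation, dependent),
# e.g. ("make", "dobj", "decision"), as produced by some dependency parser.
Triple = tuple[str, str, str]


def frequency_layer(counts: Counter, min_freq: int = 2) -> set[Triple]:
    """Phase I (sketch): keep triples occurring at least `min_freq` times."""
    return {t for t, c in counts.items() if c >= min_freq}


def pmi(triple: Triple, counts: Counter) -> float:
    """PMI of head and dependent within one relation.

    Stand-in association measure only; this is NOT the paper's LMP measure.
    """
    head, rel, dep = triple
    rel_n = sum(c for (_, r, _), c in counts.items() if r == rel)
    head_n = sum(c for (h, r, _), c in counts.items() if h == head and r == rel)
    dep_n = sum(c for (_, r, d), c in counts.items() if d == dep and r == rel)
    return math.log2(counts[triple] * rel_n / (head_n * dep_n))


def syntactic_layer(candidates: set[Triple], counts: Counter,
                    min_pmi: float = 0.0) -> set[Triple]:
    """Phase II (sketch): keep candidates whose PMI exceeds `min_pmi`."""
    return {t for t in candidates if pmi(t, counts) > min_pmi}


def collocation_features(collocations: set[Triple],
                         counts: Counter) -> dict[str, Counter]:
    """Describe each word by its collocations: (relation, role, collocate) -> count."""
    feats: dict[str, Counter] = defaultdict(Counter)
    for head, rel, dep in collocations:
        n = counts[(head, rel, dep)]
        feats[head][(rel, "dep", dep)] += n   # head word sees its dependent
        feats[dep][(rel, "head", head)] += n  # dependent sees its head
    return feats


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_thesaurus(feats: dict[str, Counter]) -> dict[tuple[str, str], float]:
    """Similarity for every word pair that shares at least one collocation feature."""
    return {
        (w1, w2): sim
        for w1, w2 in combinations(sorted(feats), 2)
        if (sim := cosine(feats[w1], feats[w2])) > 0
    }


if __name__ == "__main__":
    toy = ([("make", "dobj", "decision")] * 4 + [("take", "dobj", "decision")] * 2
           + [("make", "dobj", "mistake")] * 3 + [("have", "dobj", "time")] * 5
           + [("have", "dobj", "thing")] * 3 + [("make", "dobj", "thing")] * 2
           + [("take", "dobj", "time")])
    counts = Counter(toy)
    peripheral = frequency_layer(counts)                   # drops the singleton
    semi_peripheral = syntactic_layer(peripheral, counts)  # drops low-PMI pairs
    thesaurus = build_thesaurus(collocation_features(semi_peripheral, counts))
    for pair, sim in sorted(thesaurus.items()):
        print(pair, round(sim, 3))
```

Here is how the toy data behaves:

- ("take", "dobj", "time") occurs only once, so it is dropped in Phase I.
- ("make", "dobj", "thing") passes the frequency cut-off but has negative PMI, so it is dropped in Phase II.
- The resulting thesaurus pairs make/take, decision/mistake and thing/time.

Computing PMI within each relation mirrors the idea that syntactic constraints remove noise that frequency alone lets through.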
