Taxonomy Building by Divide-and-Conquer Method

Abstract

This paper describes a new thesaurus construction method in which small, class-based thesauruses are constructed separately and then merged into a whole according to a domain classification system. The method has three advantages: 1) the complexity of taxonomy construction is reduced, 2) each class-based thesaurus can be reused in thesauruses for other domains, and 3) the distribution of terms across classes in the target domain is easily identified. The method consists of three steps: term extraction, term classification, and taxonomy construction. Each step balances automatic processing with manual verification. We constructed a Korean IT-domain thesaurus using the proposed method. Because the terms were extracted from Korean newspaper and patent corpora in the IT domain, the thesaurus includes many neologisms coined in Korea. The thesaurus consists of 81 upper-level classes and over 1,000 IT terms.
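
The divide-and-merge idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the child-to-parent dictionary representation, the single root node "IT", and the toy English terms are all assumptions made for illustration, and the manual verification that accompanies each step is omitted. Terms are first grouped by class, a small taxonomy is built inside each class, and the class taxonomies are then attached under the domain root.

```python
from collections import defaultdict


def classify_terms(terms, classifier):
    """Group extracted terms by class label (the 'divide' step).

    `classifier` is a hypothetical callable mapping a term to a class name.
    """
    by_class = defaultdict(list)
    for term in terms:
        by_class[classifier(term)].append(term)
    return dict(by_class)


def build_class_taxonomy(class_name, terms, is_a):
    """Build a small taxonomy for one class as a child -> parent map.

    `is_a` holds hypothetical broader-term relations. A term whose broader
    term is unknown, or lies outside the class, attaches to the class node.
    """
    members = set(terms)
    taxonomy = {}
    for term in terms:
        parent = is_a.get(term)
        taxonomy[term] = parent if parent in members else class_name
    return taxonomy


def merge_taxonomies(domain_root, class_taxonomies):
    """Attach each class node to the domain root and combine all maps.

    Simplification: a term appearing in two classes would be overwritten.
    """
    merged = {}
    for class_name, taxonomy in class_taxonomies.items():
        merged[class_name] = domain_root
        merged.update(taxonomy)
    return merged


# Toy data, invented for illustration only.
terms = ["router", "wireless router", "SSD", "NVMe SSD"]
term_class = {
    "router": "Network",
    "wireless router": "Network",
    "SSD": "Storage",
    "NVMe SSD": "Storage",
}
is_a = {"wireless router": "router", "NVMe SSD": "SSD"}

by_class = classify_terms(terms, term_class.get)
class_taxonomies = {
    name: build_class_taxonomy(name, members, is_a)
    for name, members in by_class.items()
}
merged = merge_taxonomies("IT", class_taxonomies)

# Term distribution per class falls out of the grouping step.
distribution = {name: len(members) for name, members in by_class.items()}
```

Because each class taxonomy is an independent dictionary in this sketch, one of them could be passed to `merge_taxonomies` with a different root, which mirrors the reuse advantage claimed in the abstract.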
