Conference: International Conference on Information and Communication Technology

Co-Occurrence Technique and Dictionary Based Method for Indonesian Thesaurus Construction


Abstract

A thesaurus, used as a controlled vocabulary, can be an important tool in Natural Language Processing (NLP). However, constructing a thesaurus manually by experts is time-consuming, and the subjectivity of each expert can affect its structure. Many methods have been implemented to build thesauri automatically for languages that are rich in language resources. For resource-poor languages such as Indonesian, research in this field is still limited. This paper proposes a framework for constructing an Indonesian thesaurus using a monolingual corpus. The method uses an Indonesian dictionary and a large monolingual corpus of news articles. Candidate related terms are extracted from each resource, and the two candidate sets are then combined to produce the final thesaurus. The thesaurus is evaluated by using it as a query expansion (QE) resource in an information retrieval (IR) system. The experimental results show that, with the automatic thesaurus, the system achieves a precision of 54.00% and a recall of 85.42%.
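The pipeline outlined in the abstract (candidates from a corpus, candidates from a dictionary, a merge step, and a precision/recall evaluation) can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual algorithms, so everything here is an assumption made for illustration: sentence-level co-occurrence counting with a `min_count` threshold, definition words as dictionary candidates, a plain set union as the merge rule, and the function names themselves. The Indonesian sentences and the gloss in the usage note are toy data, not taken from the paper or from any real dictionary.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_candidates(sentences, min_count=2):
    """Map each term to the terms it shares at least `min_count` sentences with.

    Assumption: co-occurrence is counted per sentence on whitespace tokens.
    """
    pair_counts = Counter()
    for sentence in sentences:
        # Sort the unique terms so each pair is always counted under one key.
        terms = sorted(set(sentence.lower().split()))
        pair_counts.update(combinations(terms, 2))

    related = {}
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            related.setdefault(a, set()).add(b)
            related.setdefault(b, set()).add(a)
    return related


def dictionary_candidates(definitions):
    """Map each headword to the words of its definition, excluding itself.

    Assumption: every definition word is treated as a candidate related term.
    """
    return {
        head: set(gloss.lower().split()) - {head}
        for head, gloss in definitions.items()
    }


def merge_candidates(corpus_related, dict_related):
    """Combine both candidate sets per term.

    Assumption: a simple union; the paper may use a different rule.
    """
    terms = set(corpus_related) | set(dict_related)
    return {
        term: corpus_related.get(term, set()) | dict_related.get(term, set())
        for term in terms
    }


def precision_recall(retrieved, relevant):
    """Standard set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

As a usage example, calling `cooccurrence_candidates(["harga beras naik", "harga beras turun", "harga minyak naik"])` relates `harga` to `beras` and `naik`, since each of those pairs appears in two sentences. Merging that result with `dictionary_candidates({"beras": "padi terkupas"})` gives `beras` the related terms `harga`, `padi`, and `terkupas`. In a query expansion setting, those related terms would be appended to a query containing `beras` before retrieval, and `precision_recall` would then score the retrieved documents against the relevant ones.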
