Proceeding of the First international conference on multimodal interfaces (ICMI'96)

Automatic Indexing System with Compound Noun Decomposition and Correction of Word Boundary Error in Real Korean Text


Abstract

For effective automatic indexing, proper keywords must be extracted from the documents in which they occur. In Korean text, keyword extraction involves three key issues: noun extraction, compound noun decomposition, and word-boundary error handling. In most previous research, each issue has been handled independently with a different method, which makes it difficult to solve all three effectively in a single integrated system. In this paper we propose an integrated method that handles the three issues together, in which the correction of word boundary errors plays an essential role because it makes the other problems easier. The proposed indexing method is based on the CYK algorithm for segmentation and the Viterbi algorithm for word boundary checking. Instead of corpus-based statistical information, it relies only on heuristics to achieve high accuracy on real text. The experimental evaluation showed promising results, with indexing recall and precision of 94.44% and 92.83%, respectively.
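To make the segmentation step concrete, the following is a minimal sketch of CYK-style chart decomposition of a compound noun, not the authors' implementation. The function names `decompose` and `pick_fewest`, the toy noun dictionary, and the "fewest components" preference are all assumptions made for illustration; the abstract does not state which heuristics the system uses, and the Viterbi-based word boundary check is not covered here.

```python
from typing import List, Set


def decompose(compound: str, nouns: Set[str]) -> List[List[str]]:
    """Return every split of `compound` into dictionary nouns.

    chart[i][j] stores all segmentations of compound[i:j]. Cells are
    filled bottom-up by span length, as in a CYK chart.
    """
    n = len(compound)
    chart = [[[] for _ in range(n + 1)] for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # Case 1: the whole span is itself a known noun.
            if compound[i:j] in nouns:
                chart[i][j].append([compound[i:j]])
            # Case 2: a known noun on the left, followed by any
            # segmentation of the remainder (already in the chart).
            for k in range(i + 1, j):
                left = compound[i:k]
                if left in nouns:
                    for right in chart[k][j]:
                        chart[i][j].append([left] + right)
    return chart[0][n]


def pick_fewest(candidates: List[List[str]]) -> List[str]:
    """Hypothetical heuristic: prefer the split with the fewest parts."""
    return min(candidates, key=len) if candidates else []


# Toy dictionary (assumed): "information", "retrieval", "system",
# and the lexicalized compound "information retrieval".
NOUNS = {"정보", "검색", "시스템", "정보검색"}
candidates = decompose("정보검색시스템", NOUNS)
best = pick_fewest(candidates)
```

With this toy dictionary, `candidates` holds two decompositions of 정보검색시스템, and the assumed heuristic selects the one that keeps 정보검색 as a single unit. An indexing system could emit either the components, the whole compound, or both as index terms; which choice the paper makes is not stated in the abstract.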
