Collection-based compound noun segmentation for Korean information retrieval

In-Su Kang; Seung-Hoon Na; Jong-Hyeok Lee

Abstract

Compound noun segmentation is a key first step in language processing for Korean. Thus far, most approaches have required some form of human supervision, such as pre-existing dictionaries, segmented compound nouns, or heuristic rules. As a result, they suffer from the unknown word problem, which unsupervised approaches can overcome. However, previous unsupervised methods normally do not consider all possible segmentation candidates, and/or they rely on character-based segmentation clues such as bi-grams or all-length n-grams. Consequently, they are prone to settling on a locally optimal solution. To overcome this problem, this paper proposes an unsupervised segmentation algorithm that searches for the most likely segmentation among all possible segmentation candidates using a word-based segmentation context. To supply the word-based segmentation clues, a dictionary is automatically generated from a corpus. Experiments using three test collections show that the segmentation algorithm can be successfully applied to Korean information retrieval, improving on a dictionary-based longest-matching algorithm.
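The contrast between searching all segmentation candidates and greedy longest matching can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify their scoring model, so the sketch assumes a simple unigram score (the product of relative word frequencies) over a hypothetical corpus-derived dictionary `word_counts`. The function names and the toy strings are likewise illustrative assumptions; the Latin letters stand in for Korean syllables.

```python
import math
from functools import lru_cache


def segment_best(compound, word_counts):
    """Return the highest-scoring split of `compound` into dictionary words.

    Assumption: the score is a unigram log-probability, which is NOT
    necessarily the model used in the paper. Returns None if no split exists.
    """
    total = sum(word_counts.values())

    @lru_cache(maxsize=None)
    def best(start):
        # Best (log-probability, words) covering compound[start:], or None.
        if start == len(compound):
            return 0.0, ()
        top = None
        for end in range(start + 1, len(compound) + 1):
            word = compound[start:end]
            count = word_counts.get(word, 0)
            if count == 0:
                continue  # not a dictionary word
            rest = best(end)
            if rest is None:
                continue  # the remainder cannot be segmented
            score = math.log(count / total) + rest[0]
            if top is None or score > top[0]:
                top = (score, (word,) + rest[1])
        return top

    result = best(0)
    return list(result[1]) if result else None


def segment_longest_match(compound, word_counts):
    """Greedy left-to-right longest matching, the baseline named in the abstract."""
    words, start = [], 0
    while start < len(compound):
        for end in range(len(compound), start, -1):
            if compound[start:end] in word_counts:
                words.append(compound[start:end])
                start = end
                break
        else:
            return None  # no dictionary word starts at this position
    return words


# Hypothetical counts, as if collected from a document collection.
TOY_COUNTS = {"abc": 3, "ab": 5, "cd": 4}
```

With these toy counts, `segment_longest_match("abcd", TOY_COUNTS)` commits to `"abc"`, cannot match the remaining `"d"`, and returns `None`, whereas `segment_best("abcd", TOY_COUNTS)` examines every candidate and returns `["ab", "cd"]`. Memoizing on the start position keeps the exhaustive search at a quadratic number of substring lookups instead of enumerating an exponential number of splits.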

Bibliographic details

Source
Information Retrieval, 2006, Issue 5, pp. 613-631 (19 pages)
Authors
In-Su Kang; Seung-Hoon Na; Jong-Hyeok Lee
Author affiliation

Division of Electrical and Computer Engineering, Pohang University of Science and Technology (POSTECH), Advanced Information Technology Research Center (AITrc), PIRL 323, San 31, Hyoja-dong, Nam-gu, Pohang, 790-784, Republic of Korea;

Original format: PDF
Language: English
Chinese Library Classification (CLC): Library science and librarianship
Keywords
compound noun segmentation; unsupervised method; Korean information retrieval
