An incremental construction method of a large-scale thesaurus using co-occurrence information

Kazuhiro Morita; Hiroya Kitagawa; Masao Fuketa; Jun-ichi Aoe

Abstract

A thesaurus is an important knowledge resource in natural language processing and is generally built by hand. As a thesaurus grows, however, frequent updates become difficult because manual maintenance is very time-consuming. This paper aims to construct a large-scale hierarchical thesaurus with a clustering scheme based on co-occurrence information among words. The proposed clustering algorithm uses the Kullback-Leibler divergence as a similarity measure to judge superordinate and subordinate relations. In addition, the thesaurus tree can be updated incrementally at each node to accommodate small changes such as the addition of unknown words. To evaluate the method, a thesaurus of about 60,000 words was built from about 16 million co-occurrence relationships extracted from the Google N-gram. On data randomly sampled from the thesaurus, the proposed method achieves a precision of 0.826.
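The abstract names the Kullback-Leibler divergence as the similarity measure but does not give the algorithm itself, so the following is only a minimal sketch of how the divergence between two words' co-occurrence distributions might be computed. The toy counts, the additive smoothing constant `alpha`, and the function names are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def distribution(counts, vocab, alpha=1.0):
    """Turn raw co-occurrence counts into an additively smoothed
    probability distribution over a shared context vocabulary.
    Smoothing (alpha) is an assumption; it keeps every probability
    non-zero so the divergence below is always finite."""
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)), in nats."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0)


# Hypothetical co-occurrence counts (word -> context word -> count).
cooc = {
    "animal": Counter({"eat": 5, "run": 5, "bark": 3, "meow": 3, "fly": 4}),
    "dog": Counter({"eat": 4, "run": 4, "bark": 8}),
}

# Both distributions must be defined over the same set of context words.
vocab = sorted({c for counts in cooc.values() for c in counts})
p_animal = distribution(cooc["animal"], vocab)
p_dog = distribution(cooc["dog"], vocab)

# The divergence is asymmetric, so each direction is computed separately.
d_dog_animal = kl_divergence(p_dog, p_animal)
d_animal_dog = kl_divergence(p_animal, p_dog)
print(f"D(dog || animal) = {d_dog_animal:.4f}")
print(f"D(animal || dog) = {d_animal_dog:.4f}")
```

Because D(P || Q) and D(Q || P) generally differ, the measure carries directional information, which is plausibly what makes it usable for distinguishing a superordinate word from a subordinate one. The exact decision rule and any thresholds used by the authors are not stated in the abstract and are not reproduced here.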

Bibliographic details

Source
International Journal of Computer Applications in Technology | 2013, Issue 2 | 10 pages
Authors
Kazuhiro Morita; Hiroya Kitagawa; Masao Fuketa; Jun-ichi Aoe
Original format: PDF
Language: English (eng)
Chinese Library Classification: Computer applications
Keywords
Thesaurus; Co-occurrence relationships; Clustering; Similarity measurement
