AUTOMATIC THESAURUS CONSTRUCTION USING WORD CLUSTERING

Abstract

In this paper, we propose a new clustering algorithm for large-scale document collections that constructs a thesaurus automatically to aid summarization. Existing word-clustering systems use various similarity measures and clustering algorithms developed in the context of information retrieval. When clustering is based on a term-document matrix, the distribution of an index word represents how frequently the word appears in the content of each document. Therefore, the semantic relations between words in the same document are often weak, and as a result, words that appear frequently in the same content tend to be gathered into a single cluster. To construct clusters that capture the semantic relations between words, we present a word-clustering method that automatically uses pairs of words in a co-occurrence relation. We further show that our clustering is more effective for word sense disambiguation than clustering based on a term-document matrix.
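The abstract contrasts two ways of representing a word before clustering: as a row of a term-document matrix, or through the words it co-occurs with. The sketch below illustrates that contrast on a toy corpus. It is not the authors' method: the abstract does not specify their similarity measure or clustering algorithm, so cosine similarity, a fixed co-occurrence window, single-link agglomerative clustering with a similarity threshold, and all function names (`term_document_vectors`, `cooccurrence_vectors`, `cosine`, `cluster`) are hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def term_document_vectors(docs):
    """Map each word to its row of a term-document matrix (raw counts per document)."""
    vocab = sorted({w for doc in docs for w in doc})
    return {w: [doc.count(w) for doc in docs] for w in vocab}


def cooccurrence_vectors(docs, window=2):
    """Map each word to counts of the words seen within `window` positions of it.

    The window size is a hypothetical choice; the abstract does not state one.
    """
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = {w: [0] * len(vocab) for w in vocab}
    for doc in docs:
        for i, w in enumerate(doc):
            for j in range(max(0, i - window), min(len(doc), i + window + 1)):
                if j != i:
                    vecs[w][index[doc[j]]] += 1
    return vecs


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vecs, threshold):
    """Single-link agglomerative clustering: keep merging two clusters while
    some cross-cluster word pair has similarity >= threshold."""
    clusters = [{w} for w in sorted(vecs)]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(range(len(clusters)), 2):
            link = max(cosine(vecs[x], vecs[y])
                       for x in clusters[a] for y in clusters[b])
            if link >= threshold:
                clusters[a] |= clusters[b]
                del clusters[b]  # b > a, so index a stays valid
                merged = True
                break  # restart the scan over the updated cluster list
    return clusters


if __name__ == "__main__":
    # Toy corpus: each document is a list of tokens (hypothetical data).
    docs = [
        ["bank", "river", "water", "bank"],
        ["bank", "money", "loan"],
        ["river", "water", "flow"],
    ]
    print(cluster(term_document_vectors(docs), threshold=0.8))
    print(cluster(cooccurrence_vectors(docs), threshold=0.8))
```

Under the term-document representation, two words look similar when they occur in the same documents at similar rates; under the co-occurrence representation, they look similar when they share neighboring words, which is the kind of relation the abstract argues a term-document matrix misses.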

Bibliographic details

Source
Pacific Association for Computational Linguistics Conference (PACLING'03); August 22-25, 2003; Halifax (CA) | 2003 | pp. 55-62 | 8 pages
Conference location: Halifax (CA)
Authors
MINORU SASAKI; HIROYUKI SHINNOU
Affiliation

Department of Computer and Information Sciences, Faculty of Engineering, Ibaraki University, 4-12-1 Nakanarusawa, Hitachi, Ibaraki, 316-8511, JAPAN;

Original format: PDF
Text language: English
CLC classification: Programming languages, algorithmic languages
Keywords
word sense disambiguation; word clustering; thesaurus; vector space model; latent semantic indexing;

Date added: 2022-08-26 14:15:03

Similar documents

1. Joining automatic query expansion based on thesaurus and word sense disambiguation using WordNet [J]. Francisco Joao Pinto, Antonio Farina Martinez, Carme Fernandez Perez-Sanjulian. International Journal of Computer Applications in Technology. 2008, no. 4.

2. Automatic lexeme acquisition for a multilingual medical subword thesaurus [J]. Kornel Marko, Stefan Schulz, Udo Hahn. International Journal of Medical Informatics. 2007, no. 2-3.

3. Word association testing and thesaurus construction [J]. Louise F. Spiteri. Libres: Library and Information Science Research Electronic Journal. 2014, no. 2.

4. AUTOMATIC THESAURUS CONSTRUCTION USING WORD CLUSTERING [C]. MINORU SASAKI, HIROYUKI SHINNOU. Pacific Association for Computational Linguistics Conference. 2003.

5. Automatic Supervised Thesauri Construction with "Roget's Thesaurus" [D]. Kennedy, Alistair. 2012.

6. Chronological corpora curve clustering: From scientific corpora construction to knowledge dynamics discovery through word life-cycles clustering [O]. Matilde Trevisani, Arjuna Tuzzi. 2018.

7. Hierarchical word clustering - automatic thesaurus generation [O]. Hodge V J, Austin J. 2002.
