AUTOMATIC THESAURUS CONSTRUCTION USING WORD CLUSTERING



Abstract

In this paper, we propose a new clustering algorithm for large-scale document collections that constructs a thesaurus automatically to aid summarization. Existing word-clustering systems use various similarity measures and clustering algorithms developed in the context of information retrieval. When clustering is based on a term-document matrix, the distribution of an index word represents how frequently the word appears in the contents of each document. The semantic relation between words captured this way is therefore not very strong, and words that appear frequently in the same contents tend to be gathered into one cluster. To construct a cluster set that captures the semantic relations between words, we present a word-clustering method that automatically uses pairs of words in a co-occurrence relation. We further show that our clustering is effective for word sense disambiguation compared with clustering based on a term-document matrix.
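
The contrast the abstract draws can be illustrated with a minimal sketch. It is not the authors' algorithm: the toy corpus, the sentence-level co-occurrence window, the cosine measure, and the function names `term_document_vectors`, `cooccurrence_vectors`, and `cosine` are all assumptions made for illustration. It only shows why words that are frequent in the same documents look identical under a term-document representation, while word-pair co-occurrence counts can separate them.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Toy corpus (invented for illustration): each document is a list of
# sentences, and each sentence is a list of tokens.
docs = [
    [["bank", "lends", "money"], ["river", "floods", "valley"]],
    [["bank", "holds", "money"], ["river", "crosses", "valley"]],
]

def term_document_vectors(docs):
    """Map each word to a Counter of {document index: frequency}."""
    vectors = {}
    for d, doc in enumerate(docs):
        for sentence in doc:
            for word in sentence:
                vectors.setdefault(word, Counter())[d] += 1
    return vectors

def cooccurrence_vectors(docs):
    """Map each word to a Counter of the words sharing a sentence with it.

    Using the sentence as the co-occurrence window is an assumption.
    """
    vectors = {}
    for doc in docs:
        for sentence in doc:
            for a, b in combinations(sorted(set(sentence)), 2):
                vectors.setdefault(a, Counter())[b] += 1
                vectors.setdefault(b, Counter())[a] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values()) * sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

td = term_document_vectors(docs)
co = cooccurrence_vectors(docs)

# Term-document view: "bank" and "river" occur in the same documents,
# so they are indistinguishable.
print(cosine(td["bank"], td["river"]))
# Co-occurrence view: "bank" shares neighbours with "money", not "river".
print(cosine(co["bank"], co["money"]), cosine(co["bank"], co["river"]))
```

Any clustering step could then be run on the co-occurrence vectors instead of the term-document vectors; which clustering procedure the paper uses is not stated in this abstract.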


Bibliographic details

Source
Pacific Association for Computational Linguistics Conference | 2003 | 8 pages
Authors
MINORU SASAKI; HIROYUKI SHINNOU; Pacific Association for Computational Linguistics
Original format: PDF
CLC classification: Programming languages, algorithmic languages
Keywords
word sense disambiguation; word clustering; thesaurus; vector space model; latent semantic indexing


Similar literature


1. Joining automatic query expansion based on thesaurus and word sense disambiguation using WordNet [J]. Francisco Joao Pinto, Antonio Farina Martinez, Carme Fernandez Perez-Sanjulian. International Journal of Computer Applications in Technology. 2008, No. 4
2. Automatic lexeme acquisition for a multilingual medical subword thesaurus [J]. Kornel Marko, Stefan Schulz, Udo Hahn. International Journal of Medical Informatics. 2007, No. 2a3
3. Word association testing and thesaurus construction [J]. Louise F. Spiteri. Libres: Library and Information Science Research Electronic Journal. 2014, No. 2
4. AUTOMATIC THESAURUS CONSTRUCTION USING WORD CLUSTERING [C]. MINORU SASAKI, HIROYUKI SHINNOU. Pacific Association for Computational Linguistics Conference (PACLING'03); August 22-25, 2003; Halifax (CA). 2003
5. Automatic Supervised Thesauri Construction with "Roget's Thesaurus" [D]. Kennedy, Alistair. 2012
6. Chronological corpora curve clustering: From scientific corpora construction to knowledge dynamics discovery through word life-cycles clustering [O]. Matilde Trevisani, Arjuna Tuzzi. 2018
7. Hierarchical word clustering - automatic thesaurus generation [O]. Hodge V J, Austin J. 2002
