Enriching Document Representation with the Deviations of Word Co-occurrence Frequencies


Abstract

Recent strategies reveal the semantic relatedness between documents by enriching each document with the relatedness of every word in the document collection to that document. When this relatedness is restricted to the expected frequency with which each word occurs in the document, the traditional weighted sum of word vectors is proved to give upper bounds on the expected frequencies. Summing the word vectors usually introduces duplicate counts, which weaken the discriminative power of the enriched document vectors. A strategy that gives lower bounds on the expected frequencies is also obtained by keeping the maximum value of the word vectors on each dimension. Using the lower bounds together with the deviations of word co-occurrence frequencies, a novel method is proposed to remove the duplicate counts in the upper bounds. As a result, the proposed method smooths the generated document vectors better than the weighted-sum strategy. Extensive experiments verify that document clustering with the proposed method achieves a significant performance improvement over existing strategies.
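The two bounding strategies named in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the toy vectors, the term counts, and the function names `weighted_sum` and `dimension_max` are hypothetical. The sketch also assumes non-negative vector entries and assumes each vector is scaled by its term count before the maximum is taken. The duplicate-removal step based on co-occurrence deviations is left out, because the abstract does not state its formula.

```python
# Toy sketch only: the vectors, counts, and function names are hypothetical,
# not taken from the paper.

def weighted_sum(doc_counts, word_vectors):
    """Weighted sum of word vectors (the upper-bound strategy in the abstract)."""
    dims = len(next(iter(word_vectors.values())))
    enriched = [0.0] * dims
    for word, count in doc_counts.items():
        for j, value in enumerate(word_vectors[word]):
            enriched[j] += count * value
    return enriched


def dimension_max(doc_counts, word_vectors):
    """Per-dimension maximum of the word vectors (the lower-bound strategy).

    Assumptions: entries are non-negative, and each vector is scaled by its
    term count before the maximum is taken; the abstract does not say
    whether the paper does this.
    """
    dims = len(next(iter(word_vectors.values())))
    enriched = [0.0] * dims
    for word, count in doc_counts.items():
        for j, value in enumerate(word_vectors[word]):
            enriched[j] = max(enriched[j], count * value)
    return enriched


# Hypothetical relatedness vectors over a three-word vocabulary.
word_vectors = {
    "cat": [1.0, 0.5, 0.0],
    "dog": [0.5, 1.0, 0.25],
}
doc_counts = {"cat": 2, "dog": 1}  # term frequencies in one document

upper = weighted_sum(doc_counts, word_vectors)
lower = dimension_max(doc_counts, word_vectors)
print(upper)  # [2.5, 2.0, 0.25]
print(lower)  # [2.0, 1.0, 0.25]
```

With non-negative entries, every dimension of `lower` is at most the matching dimension of `upper`. The gap between the two is where duplicate counts from overlapping word vectors can accumulate, which is the part the proposed method is said to correct.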

Bibliographic record

Source: International Conference on Algorithms and Architectures for Parallel Processing, 2015, 14 pages
Authors: Yang Wei; Jinmao Wei; Zhenglu Yang
Original format: PDF
Chinese Library Classification: TP301.6-53
Keywords: Document representation; Document enrichment; Word co-occurrences; Word relatedness; Document clustering
