Conference: Web technologies and applications

Boosting Explicit Semantic Analysis by Clustering Paragraph Vectors of Wikipedia Articles

Abstract

Explicit Semantic Analysis (ESA) is an effective method that uses Wikipedia entries (articles) to represent text and compute semantic relatedness (SR) between text pairs. As with ordinary web search, ESA suffers from redundancy as the number of Wikipedia entries keeps growing. Entry redundancy can bias the representation toward the semantics shared by a large number of similar entries. In addition, the original ESA approach to SR has a weakness: it does not consider the correlations or similarities among the Wikipedia articles that make up the text representations. To tackle these problems, we develop a novel method that clusters redundant or similar entries using a similarity measure based on Paragraph Vector (PV), a neural network language model. Experiments on four datasets show that our framework can achieve higher relatedness accuracy than ESA.
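To make the idea concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes a tiny hypothetical article collection (`ARTICLES`) and hand-written vectors (`PARAGRAPH_VECTORS`) standing in for a trained Paragraph Vector model. ESA vectors are built from TF-IDF weights and compared by cosine similarity. Articles whose paragraph vectors reach a hypothetical similarity threshold of 0.95 are grouped by single-link clustering, and each cluster's weight is taken as the mean of its members' weights. The grouping rule, the threshold, and the averaging step are all assumptions; the abstract does not specify them.

```python
import math
from collections import Counter

# Hypothetical toy "Wikipedia": article title -> article text.
ARTICLES = {
    "Cat": "cat feline pet animal whiskers",
    "Domestic cat": "cat pet animal house feline",
    "Dog": "dog canine pet animal bark",
    "Stock market": "stock market shares trading finance",
}

# Hypothetical paragraph vectors; in practice these would come from a
# Paragraph Vector (doc2vec) model trained on the same articles.
PARAGRAPH_VECTORS = {
    "Cat": [0.9, 0.1, 0.0],
    "Domestic cat": [0.88, 0.15, 0.0],
    "Dog": [0.6, 0.5, 0.0],
    "Stock market": [0.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def tfidf_index(articles):
    """ESA inverted index: word -> {article title: tf-idf weight}."""
    n = len(articles)
    counts = {title: Counter(text.split()) for title, text in articles.items()}
    df = Counter(w for c in counts.values() for w in c)
    index = {}
    for title, c in counts.items():
        for word, tf in c.items():
            index.setdefault(word, {})[title] = tf * math.log(n / df[word])
    return index


def esa_vector(text, index, titles):
    """ESA interpretation vector: one weight per article, in `titles` order."""
    weights = dict.fromkeys(titles, 0.0)
    for word, tf in Counter(text.lower().split()).items():
        for title, w in index.get(word, {}).items():
            weights[title] += tf * w
    return [weights[t] for t in titles]


def cluster_articles(titles, vectors, threshold):
    """Single-link grouping via union-find: articles whose paragraph vectors
    have cosine similarity >= threshold end up in the same cluster."""
    parent = {t: t for t in titles}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for i, a in enumerate(titles):
        for b in titles[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for t in titles:
        groups.setdefault(find(t), []).append(t)
    return list(groups.values())


def clustered_vector(vec, titles, clusters):
    """Collapse article dimensions into cluster dimensions by averaging."""
    pos = {t: i for i, t in enumerate(titles)}
    return [sum(vec[pos[t]] for t in c) / len(c) for c in clusters]


def relatedness(a, b, threshold=0.95):
    """Return (plain ESA score, clustered ESA score) for two texts."""
    titles = list(ARTICLES)
    index = tfidf_index(ARTICLES)
    va, vb = esa_vector(a, index, titles), esa_vector(b, index, titles)
    clusters = cluster_articles(titles, PARAGRAPH_VECTORS, threshold)
    boosted = cosine(clustered_vector(va, titles, clusters),
                     clustered_vector(vb, titles, clusters))
    return cosine(va, vb), boosted
```

For example, `relatedness("whiskers", "house")` gives a plain ESA score of 0.0, because each word activates a different article ("Cat" vs. "Domestic cat"), while the clustered score is about 1.0 because those two near-duplicate articles fall into one cluster. Averaging rather than summing within a cluster keeps a group of similar articles from outweighing a single distinct one.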