Journal: Computational Intelligence

Automatic Detection of Words Associations in Texts Based on Joint Distribution of Words Occurrences


Abstract

In this article, we propose a novel approach for measuring word association based on the joint distribution of word occurrences in a text. Our approach computes the sum of distances between neighboring occurrences of a given word pair and compares it with a vector of randomly generated occurrences. The underlying assumption is that if the distribution of co-occurrences is close to random, or if the words appear together less frequently than expected by chance, the words are not semantically related. We devise a distance function S that evaluates the word association rate. Using S, we build a concept tree, which provides a visual and comprehensive representation of keyword associations in a text. To illustrate the effectiveness of our algorithm, we apply it to three different texts and show that the results are consistent and significant with respect to the semantics of the documents. Finally, we compare the results of our algorithm with those obtained by human experts and by the co-occurrence correlation method. We show that our method is consistent with the experts' evaluation and outperforms the co-occurrence correlation method.
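
The abstract does not define the function S, so the following is only a minimal sketch of the general idea it describes: sum the distances between neighboring occurrences of a word pair, then compare that sum with the same quantity for randomly placed occurrences. The names `positions`, `nearest_distance_sum`, and `association_ratio`, the nearest-neighbor distance, the uniform random baseline, and the ratio itself are all assumptions made for illustration; they are not the authors' definitions.

```python
import bisect
import random


def positions(tokens, word):
    """Return the sorted token indices at which `word` occurs."""
    return [i for i, t in enumerate(tokens) if t == word]


def nearest_distance_sum(pos_a, pos_b):
    """Sum, over occurrences of A, the distance to the nearest occurrence of B.

    Both lists must be sorted and pos_b must be non-empty.
    """
    total = 0
    for p in pos_a:
        k = bisect.bisect_left(pos_b, p)
        candidates = []
        if k < len(pos_b):          # nearest B at or after p
            candidates.append(pos_b[k] - p)
        if k > 0:                   # nearest B before p
            candidates.append(p - pos_b[k - 1])
        total += min(candidates)
    return total


def association_ratio(tokens, a, b, trials=200, seed=0):
    """Observed distance sum divided by the mean sum for random placements.

    Hypothetical score (not the paper's S): values well below 1 suggest the
    two words occur closer together than chance placement would predict.
    """
    if a == b:
        raise ValueError("the two words must differ")
    pos_a, pos_b = positions(tokens, a), positions(tokens, b)
    if not pos_a or not pos_b:
        raise ValueError("both words must occur in the text")

    observed = nearest_distance_sum(pos_a, pos_b)

    rng = random.Random(seed)
    n_a, n_b = len(pos_a), len(pos_b)
    baseline = 0.0
    for _ in range(trials):
        # Draw distinct random positions, keeping each word's frequency.
        sample = rng.sample(range(len(tokens)), n_a + n_b)
        rand_a = sorted(sample[:n_a])
        rand_b = sorted(sample[n_a:])
        baseline += nearest_distance_sum(rand_a, rand_b)
    baseline /= trials  # always > 0: distinct positions are at least 1 apart

    return observed / baseline
```

Calling `association_ratio(tokens, "cat", "dog")` on a tokenized text returns a single number; under the assumptions above, a value close to or above 1 corresponds to the abstract's case of co-occurrences that are close to random or rarer than chance.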
