Workshop on noisy user-generated text

No, you're not alone: A better way to find people with similar experiences on Reddit


Abstract

We present a probabilistic clustering algorithm that helps Reddit users find posts discussing experiences similar to their own. The model is built on the BERT Next Sentence Prediction model and reduces the time complexity of clustering all posts in a corpus from O(n^2) to O(n) in the number of posts. We demonstrate that this probabilistic clustering outperforms baseline clustering methods based on Latent Dirichlet Allocation (Blei et al., 2003) and Word2Vec (Mikolov et al., 2013). Furthermore, its results show a high degree of coherence with those of the exhaustive O(n^2) algorithm, which computes the similarity between every pair of posts. Because each BERT computation carries a high runtime overhead, this reduction makes the BERT Next Sentence Prediction model more practical for unsupervised clustering tasks.
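The abstract does not say how the linear-time clustering works, so the following is only an illustrative sketch of one general way to avoid all-pairs comparison, not the authors' algorithm. It compares each post against a bounded number of cluster representatives, which keeps the number of similarity calls at most `max_clusters * n`. The names `cluster_against_representatives`, `jaccard`, `threshold`, and `max_clusters` are hypothetical, and the word-overlap score stands in for the expensive BERT Next Sentence Prediction score.

```python
from typing import Callable, List, Sequence, Tuple


def jaccard(a: str, b: str) -> float:
    """Toy stand-in for an expensive pairwise scorer (e.g. a BERT NSP probability)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def cluster_against_representatives(
    posts: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.5,
    max_clusters: int = 8,
) -> Tuple[List[List[int]], int]:
    """Assign each post to the closest cluster representative.

    Assumption for illustration: the first post of a cluster acts as its
    representative. Each post is scored against at most `max_clusters`
    representatives, so the total number of similarity calls is bounded by
    max_clusters * len(posts), instead of the n * (n - 1) / 2 calls that an
    exhaustive pairwise comparison needs.
    """
    clusters: List[List[int]] = []  # each cluster is a list of post indices
    comparisons = 0
    for i, post in enumerate(posts):
        best_score, best_cluster = -1.0, None
        for cluster in clusters:
            comparisons += 1
            score = similarity(posts[cluster[0]], post)
            if score > best_score:
                best_score, best_cluster = score, cluster
        cap_reached = len(clusters) >= max_clusters
        if best_cluster is not None and (best_score >= threshold or cap_reached):
            best_cluster.append(i)
        else:
            clusters.append([i])  # this post becomes a new representative
    return clusters, comparisons
```

Calling `cluster_against_representatives(posts, jaccard)` returns the clusters as lists of post indices together with the number of similarity calls made, which can be compared against the exhaustive pair count to see the saving. Replacing `jaccard` with a model-based scorer leaves the comparison budget unchanged.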
