
Simultaneously modeling semantics and structure of threaded discussions


Abstract

The vast amount of knowledge in web communities has motivated research interest in threaded discussions, whose dynamic nature poses many challenging problems for computer scientists. Although semantic models and structural models have proven useful in a number of areas, they fall short in understanding threaded discussions for three reasons: (I) because most users read existing messages before posting, posts in a discussion thread are temporally dependent on previous ones, which couples semantics and structure in threaded discussions; (II) online discussion threads contain many junk posts that are useless and may disturb content analysis; and (III) it is very hard to judge the quality of a post. In this paper, we propose a sparse coding-based model named SMSS to Simultaneously Model Semantics and Structure of threaded discussions. The model projects each post into a topic space and approximates each post by a linear combination of previous posts in the same discussion thread. The model also imposes two sparsity constraints, forcing a sparse reconstruction of each post in the topic space and a sparse approximation of each post from previous posts. These sparsity properties effectively capture the characteristics of threaded discussions. To address the three problems above, we demonstrate the competency of our model in three applications: reconstructing the reply structure of threaded discussions, identifying junk posts, and finding experts in a given board/sub-board in web communities. Experimental results show encouraging performance of the proposed SMSS model in all of these applications.
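The idea of approximating a post by a sparse linear combination of earlier posts in the same thread can be illustrated with a small sketch. This is not the SMSS model from the paper: the abstract gives no objective function or optimizer, so the code below assumes a plain L1-penalized least-squares fit (Lasso) solved by coordinate descent, assumes the posts have already been mapped to topic vectors, and uses hypothetical names (`sparse_reply_weights`, `predict_parent`) and made-up toy vectors.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sparse_reply_weights(post, previous, lam=0.1, n_iter=100):
    """Assumed objective (not taken from the paper):
    minimise 0.5 * ||post - sum_j b[j] * previous[j]||^2 + lam * ||b||_1
    using cyclic coordinate descent."""
    b = [0.0] * len(previous)
    residual = list(post)  # equals post - sum_j b[j] * previous[j]
    for _ in range(n_iter):
        for j, p in enumerate(previous):
            norm_sq = dot(p, p)
            if norm_sq == 0.0:
                continue
            # Correlation of p with the residual, with post j's own
            # contribution added back in.
            rho = dot(p, residual) + b[j] * norm_sq
            new_bj = soft_threshold(rho, lam) / norm_sq
            delta = new_bj - b[j]
            if delta != 0.0:
                residual = [r - delta * pi for r, pi in zip(residual, p)]
                b[j] = new_bj
    return b


def predict_parent(weights):
    """Hypothetical heuristic: the earlier post with the largest absolute
    weight is taken as the reply target; None if all weights are zero."""
    if not weights or all(w == 0.0 for w in weights):
        return None
    return max(range(len(weights)), key=lambda j: abs(weights[j]))


# Toy topic vectors (invented for illustration), three topics.
previous_posts = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.1, 1.0],
]
new_post = [0.05, 0.9, 0.1]  # mostly about the second topic

weights = sparse_reply_weights(new_post, previous_posts)
parent = predict_parent(weights)
print(parent, weights)
```

In this toy thread the L1 penalty drives the weight on the unrelated first post to exactly zero, and the largest weight falls on the second post, which is the kind of sparse dependence on earlier posts that the abstract describes. How SMSS itself couples this step with the topic-space projection is not specified in the abstract.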
