Online forum thread retrieval using data fusion

Abstract

Online forums empower people to seek and share information via discussion threads. However, finding threads that satisfy a user's information need is a daunting task due to information overload. In addition, traditional retrieval techniques do not suit the unique structure of threads, because thread retrieval returns threads whereas traditional retrieval techniques return text messages. A few thread representations have been proposed to address this problem, and in some of them aggregating evidence of query relevance is an essential step. This thesis proposes several data fusion techniques to aggregate evidence of relevance within and across thread representations, and makes three contributions. First, it adapts the Voting Model from the expert finding task to thread retrieval. The adapted Voting Model approaches thread retrieval as a voting process: it ranks a list of messages, groups the messages by parent thread, and treats each ranked message as a vote supporting the relevance of its parent thread. To rank parent threads, a data fusion technique aggregates evidence from each thread's ranked messages. Second, it proposes two extensions of the Voting Model: the Top K and Balanced Top K voting models. The Top K model aggregates evidence from only the top K ranked messages of each thread. The Balanced Top K model adds artificial ranked messages to make up the difference when a thread has fewer than K ranked messages (a padding step). Experiments with these voting models and thirteen data fusion methods reveal that summing the relevance scores of the top K ranked messages of each thread, with the padding step, outperforms the state of the art on all measures on two datasets. The third contribution of this thesis is multi-representation thread retrieval using data fusion techniques.
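The voting process described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function name `vote_threads`, the tuple layout of the ranked messages, and the `pad_score` parameter are hypothetical. The abstract does not say what score the artificial padding messages receive, so the sketch assumes they take the lowest score in the ranked list unless told otherwise. Summation is used as the fusion method because the abstract reports it as the best performer.

```python
from collections import defaultdict


def vote_threads(ranked_messages, k=None, balanced=False, pad_score=None):
    """Rank threads by summing the scores of their ranked messages.

    ranked_messages: iterable of (message_id, thread_id, score) tuples.
    k:         if set, only the k best-scored messages of each thread vote
               (Top K model).
    balanced:  if True, threads with fewer than k votes are padded with
               artificial votes (Balanced Top K model).
    pad_score: score of each artificial vote. ASSUMPTION: defaults to the
               lowest score in the ranked list; the abstract does not
               specify this value.
    """
    ranked_messages = list(ranked_messages)
    if balanced and k is None:
        raise ValueError("balanced padding requires k")

    # Group message scores (votes) by parent thread.
    votes = defaultdict(list)
    for _message_id, thread_id, score in ranked_messages:
        votes[thread_id].append(score)

    if balanced and pad_score is None:
        pad_score = min(score for _, _, score in ranked_messages)

    fused = {}
    for thread_id, scores in votes.items():
        scores = sorted(scores, reverse=True)
        if k is not None:
            scores = scores[:k]
            if balanced:
                # Padding step: fill up to k votes with artificial ones.
                scores = scores + [pad_score] * (k - len(scores))
        fused[thread_id] = sum(scores)

    # Highest fused score first; ties broken by thread id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Example with log-probability-like (negative) scores.
messages = [
    ("m1", "t1", -1.0),
    ("m2", "t1", -2.0),
    ("m3", "t1", -3.0),
    ("m4", "t2", -1.5),
]
print(vote_threads(messages))                       # all messages vote
print(vote_threads(messages, k=2))                  # Top K
print(vote_threads(messages, k=2, balanced=True))   # Balanced Top K
```

With negative scores such as log-probabilities, plain summation penalizes threads that have many ranked messages. Padding offsets this by giving every thread exactly K votes, which is one plausible reading of why the padding step helps.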
In contrast to the Voting Model, the multi-representation approach uses data fusion methods to fuse several ranked lists of threads instead of a single ranked list of messages. The thread lists are generated by five retrieval methods based on various thread representations; the Voting Model is one of them. The first three methods take a message as the unit of indexing, while the latter two take the thread title and the concatenation of the thread's message texts, respectively, as the unit of indexing. The performance of data fusion techniques in fusing various combinations of thread representations was evaluated thoroughly. The experimental results show that multi-representation thread retrieval built on the sum of relevance scores, or on the sum of relevance scores multiplied by the number of retrieving methods, improves performance and outperforms all individual representations.
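The two fusion rules named in the results (the sum of relevance scores, and that sum multiplied by the number of methods that retrieved the thread) are commonly called CombSUM and CombMNZ in the data fusion literature. The sketch below shows how they could combine several ranked thread lists. The function name `fuse_thread_lists` and the dictionary input format are hypothetical. The sketch also assumes the scores from the different retrieval methods have already been normalized to a comparable range, a step the abstract does not describe.

```python
from collections import defaultdict


def fuse_thread_lists(ranked_lists, method="combsum"):
    """Fuse several ranked thread lists into a single ranking.

    ranked_lists: list of dicts mapping thread_id -> relevance score, one
                  dict per retrieval method. ASSUMPTION: scores are already
                  normalized so that they are comparable across methods.
    method:       "combsum" (sum of scores) or "combmnz" (sum of scores
                  multiplied by the number of methods retrieving the thread).
    """
    totals = defaultdict(float)  # sum of scores per thread
    hits = defaultdict(int)      # number of lists that retrieved the thread
    for ranked in ranked_lists:
        for thread_id, score in ranked.items():
            totals[thread_id] += score
            hits[thread_id] += 1

    if method == "combsum":
        fused = dict(totals)
    elif method == "combmnz":
        fused = {t: totals[t] * hits[t] for t in totals}
    else:
        raise ValueError(f"unknown fusion method: {method}")

    # Highest fused score first; ties broken by thread id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Example: two representations (say, a title index and a message index).
title_run = {"t1": 0.5, "t2": 0.25}
message_run = {"t1": 0.25, "t3": 1.0}
print(fuse_thread_lists([title_run, message_run], method="combsum"))
print(fuse_thread_lists([title_run, message_run], method="combmnz"))
```

In this toy example CombMNZ moves `t1` above `t3` because both representations retrieved `t1`. That reward for agreement between methods is the effect the multiplication is meant to produce.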
