Conference: International Conference on Data Management Technologies and Applications

A Novel Method for Unsupervised and Supervised Conversational Message Thread Detection

Abstract

Efficiently detecting conversation threads in a pool of messages, such as social network chats, emails, comments on posts, and news, is relevant to various applications, including Web Marketing, Information Retrieval and Digital Forensics. Existing approaches focus on text similarity, using keywords as features that are strongly dependent on the dataset. Dealing with new corpora therefore requires further costly analyses by experts to identify new relevant features. This paper introduces a novel method for detecting threads in any type of conversational text, removing the need to determine specific features for each dataset in advance. To determine the relevant features of messages automatically, we map each message into a three-dimensional representation based on its semantic content, its social interactions in terms of sender and recipients, and its timestamp; clustering is then used to detect conversation threads. In addition, we propose a supervised approach that builds a classification model combining the extracted features to predict whether a pair of messages belongs to the same thread. Our model uses the distance of a message from a cluster representing a thread to capture the probability that the message is part of that thread. We present experimental results on seven datasets covering different types of messages and demonstrate the effectiveness of our method in detecting conversation threads: it clearly outperforms the state of the art, with an improvement of up to 19%.
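The three-dimensional representation and clustering step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which similarity measures or clustering algorithm the authors use, so everything here is an assumption made for illustration: bag-of-words cosine similarity stands in for semantic content, Jaccard overlap of participants for social interaction, exponential decay of the time gap for the timestamp, and a single-link grouping with a fixed threshold for the clustering. The names `Message`, `pair_features`, `combined_similarity` and `detect_threads`, the equal weights, and the one-hour time scale are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    text: str
    participants: frozenset  # sender plus recipients
    timestamp: float         # seconds since an arbitrary epoch


def content_similarity(a, b):
    """Cosine similarity of word counts (assumed stand-in for a semantic measure)."""
    ca, cb = Counter(a.text.lower().split()), Counter(b.text.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def social_similarity(a, b):
    """Jaccard overlap of the two participant sets."""
    union = a.participants | b.participants
    return len(a.participants & b.participants) / len(union) if union else 0.0


def time_similarity(a, b, scale=3600.0):
    """Exponential decay of the time gap; `scale` (seconds) is an assumed constant."""
    return math.exp(-abs(a.timestamp - b.timestamp) / scale)


def pair_features(a, b):
    """Three per-pair features: content, social, time."""
    return (content_similarity(a, b), social_similarity(a, b), time_similarity(a, b))


def combined_similarity(a, b, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three features; equal weights are an assumption."""
    return sum(w * f for w, f in zip(weights, pair_features(a, b)))


def detect_threads(messages, threshold=0.5):
    """Single-link grouping: pairs at or above `threshold` end up in one thread."""
    parent = list(range(len(messages)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(messages)):
        for j in range(i + 1, len(messages)):
            if combined_similarity(messages[i], messages[j]) >= threshold:
                parent[find(i)] = find(j)
    # One label per message; equal labels mean the same detected thread.
    return [find(i) for i in range(len(messages))]
```

In this sketch, `pair_features` also yields the kind of per-pair feature vector that a supervised same-thread classifier could consume, although the features and classifier used in the paper may differ. Calling `detect_threads` on a list of `Message` objects returns one label per message, and messages sharing a label are treated as one thread.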