Published in: IEEE International Conference on Data Mining Workshops

Activity Detection from Email Meta-Data Clustering

Abstract

Information workers in large enterprises often handle large volumes of e-mail every day. In this setting, automatically detecting the activities they are involved in has many potential uses; even presenting users with a summary of their current activities was found to be valuable in itself. In this paper, we describe the problem of automatically detecting user activities from e-mails using only e-mail meta-data, i.e., without processing e-mail contents. We present a novel two-stage algorithm for automatic activity detection from users' e-mails. In the first stage, we represent the e-mail dataset as a rectangular matrix using features such as other e-mails, people involved, and the names of documents attached to the e-mails. We then represent the e-mails in a latent feature space using SVD, followed by further dimensionality reduction using t-Distributed Stochastic Neighbor Embedding (t-SNE), and cluster the e-mails in t-SNE space using a density-based clustering algorithm. In the second stage, we merge these clusters based on group properties and a community detection algorithm on the graph of clusters, to yield our set of automatically detected activities. We analyse public e-mail datasets, present benchmarks of our approach on real-life datasets collected from our target users, and compare our algorithm with alternative approaches as well as those published in recent literature.
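To make the first step of the first stage concrete, the following is a minimal sketch of how e-mail meta-data might be turned into a rectangular matrix. It is not the authors' implementation: the `EmailMeta` record, its field names, the `build_feature_matrix` helper, the binary encoding, and the reading of "other e-mails" as the message being replied to are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class EmailMeta:
    """Hypothetical meta-data record; no e-mail body is stored or read."""
    message_id: str
    people: List[str]                       # sender and recipients
    attachments: List[str] = field(default_factory=list)
    in_reply_to: Optional[str] = None       # assumed reading of "other e-mails"


def build_feature_matrix(emails: List[EmailMeta]) -> Tuple[List[List[int]], List[str]]:
    """Return (matrix, feature_names): one row per e-mail, one binary column per feature."""
    per_email = []
    for e in emails:
        # Prefix each feature with its kind so that a person and an
        # attachment with the same string never share a column.
        feats = {f"person:{p.lower()}" for p in e.people}
        feats |= {f"attachment:{a.lower()}" for a in e.attachments}
        if e.in_reply_to is not None:
            feats.add(f"email:{e.in_reply_to}")
        per_email.append(feats)

    # Sorted column order keeps the matrix layout deterministic.
    feature_names = sorted(set().union(*per_email)) if per_email else []
    column = {name: j for j, name in enumerate(feature_names)}

    matrix = [[0] * len(feature_names) for _ in emails]
    for i, feats in enumerate(per_email):
        for name in feats:
            matrix[i][column[name]] = 1
    return matrix, feature_names


# Illustrative data only; these messages are invented for the example.
emails = [
    EmailMeta("m1", ["alice@example.com", "bob@example.com"], ["budget.xlsx"]),
    EmailMeta("m2", ["bob@example.com", "alice@example.com"], ["budget.xlsx"], in_reply_to="m1"),
    EmailMeta("m3", ["carol@example.com", "dave@example.com"], ["offsite_agenda.docx"]),
]
matrix, feature_names = build_feature_matrix(emails)
```

In this toy example the first two rows share most of their non-zero columns, while the third shares none with them. That overlap is the kind of structure the later steps (SVD, t-SNE, and density-based clustering) would be expected to exploit. The abstract does not say which implementations or parameters are used for those steps, so they are left out of the sketch.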
