Conference: IEEE International Conference on Data Mining Workshops

Activity Detection from Email Meta-data Clustering



Abstract

Information workers in a large enterprise often deal with large volumes of e-mail traffic every day. In such a scenario, automatic detection of the activities they are involved in has many potential uses, and even presenting users with a summary of their current set of activities was found to be of value in itself. In this paper, we describe the problem of automatically detecting user activities from e-mails while using only the meta-data of the e-mails, i.e., we do not process e-mail contents. We present a novel two-stage algorithm for automatic activity detection from users' e-mails. In the first stage, we represent the e-mail dataset as a rectangular matrix using features such as other e-mails, the people involved, and the names of the documents attached to the e-mails. We next represent the e-mails in a latent feature space using SVD, followed by further dimensionality reduction using t-distributed stochastic neighbor embedding (t-SNE). We then cluster the e-mails using a density-based clustering algorithm in t-SNE space. In the second stage, we merge these clusters based on group properties and a community detection algorithm on the graph of clusters, to yield our set of automatically detected activities. We analyse public e-mail datasets and present benchmarks of our approach on real-life datasets collected from our target users, and also compare our algorithm with alternative approaches as well as those published in recent literature.
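As a rough illustration of the first step only (turning e-mail meta-data into a rectangular matrix), the following is a minimal sketch and not the authors' implementation. The record fields (`id`, `people`, `attachments`, `in_reply_to`), the sample messages, the function name `build_feature_matrix`, and the choice of binary indicator columns are all assumptions made for illustration; in particular, reading the "other e-mails" feature as a reply link is a guess, since the abstract does not define it.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical meta-data records; field names and values are illustrative assumptions.
emails: List[dict] = [
    {"id": "m1", "people": ["ann@example.com", "bob@example.com"],
     "attachments": ["budget.xlsx"], "in_reply_to": None},
    {"id": "m2", "people": ["bob@example.com", "ann@example.com"],
     "attachments": [], "in_reply_to": "m1"},
    {"id": "m3", "people": ["cat@example.com"],
     "attachments": ["roadmap.pptx"], "in_reply_to": None},
]


def build_feature_matrix(records: List[dict]) -> Tuple[List[List[int]], List[str]]:
    """Return (matrix, column_names): one row per e-mail, one binary column per feature."""
    column_index: Dict[str, int] = {}
    row_features: List[Set[str]] = []
    for mail in records:
        feats: Set[str] = {f"person:{p}" for p in mail["people"]}
        feats |= {f"attachment:{a}" for a in mail["attachments"]}
        # Assumed meaning of the "other e-mails" feature: a message is linked
        # to itself and to the message it replies to.
        feats.add(f"email:{mail['id']}")
        parent: Optional[str] = mail["in_reply_to"]
        if parent:
            feats.add(f"email:{parent}")
        # Assign a column to every feature not seen before.
        for name in sorted(feats):
            column_index.setdefault(name, len(column_index))
        row_features.append(feats)
    names = sorted(column_index, key=column_index.get)
    matrix = [[1 if name in feats else 0 for name in names] for feats in row_features]
    return matrix, names


matrix, names = build_feature_matrix(emails)
```

Under these assumptions, the rows for `m1` and `m2` share the two participant columns and the `email:m1` column, while `m3` shares nothing with either. A matrix of this kind would then be passed to the SVD, t-SNE, and density-based clustering steps that the abstract describes, which are not reproduced here.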
