Published in: 计算机科学 (Computer Science)

Extracting Email Users' Aliases from Email Bodies

     

Abstract

Mining the identity information of email users is an active research topic in data mining. Most existing work extracts users' aliases only from email headers and therefore misses the aliases in email bodies, which often represent the identities of the two correspondents better. This paper addresses the extraction of user aliases from the bodies of plain-text emails. First, a salutation and signature block locating algorithm based on statistics and rule-based filtering is proposed; it efficiently and accurately extracts the salutation and signature blocks, the text fragments of the body that contain user aliases. Second, an alias extraction method is proposed that uses alias boundary word templates, built from the characteristics of the words adjacent to aliases, to verify and amend the aliases identified by named entity recognition or part-of-speech tagging tools, improving accuracy over using those tools alone. Experimental results on the Enron corpus indicate that the proposed methods can effectively and automatically extract users' aliases from email bodies.
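The two-stage idea in the abstract (locate the salutation and signature blocks, then use the words bordering a name to confirm it) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the cue-word lists, the `window` parameter, and the function names are all hypothetical, and a simple capitalized-word pattern stands in for the named entity recognition or part-of-speech tagging step, whose details the abstract does not give.

```python
import re

# Hypothetical left-boundary cue words; the paper's real templates are not
# stated in the abstract.
SALUTATION_RE = re.compile(r"^(dear|hi|hello|hey)\b[\s,]*", re.IGNORECASE)
SIGNOFF_RE = re.compile(
    r"^(best regards|regards|best|thanks|thank you|sincerely|cheers)\b[\s,.!]*",
    re.IGNORECASE,
)
# Stand-in for an NER/POS tagger: one or more capitalized words.
NAME_RE = re.compile(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")
PUNCT = " ,.:;!-"


def _name_after(line, boundary_re):
    """Strip the boundary cue and punctuation; return the rest if it looks like a name."""
    candidate = boundary_re.sub("", line, count=1).strip(PUNCT)
    return candidate if NAME_RE.fullmatch(candidate) else None


def extract_aliases(body, window=3):
    """Return (recipient_alias, sender_alias); either may be None."""
    lines = [ln.strip() for ln in body.strip().splitlines() if ln.strip()]
    recipient = sender = None

    # Salutation block: only the first `window` non-empty lines are considered.
    for line in lines[:window]:
        if SALUTATION_RE.match(line):
            recipient = _name_after(line, SALUTATION_RE)
            break

    # Signature block: from a sign-off cue in the last `window` lines to the end.
    for i in range(max(len(lines) - window, 0), len(lines)):
        if not SIGNOFF_RE.match(lines[i]):
            continue
        same_line = _name_after(lines[i], SIGNOFF_RE)
        following = [ln.strip(PUNCT) for ln in lines[i + 1:]]
        names = [same_line] + [ln for ln in following if NAME_RE.fullmatch(ln)]
        sender = next((n for n in names if n), None)
        if sender:
            break
    return recipient, sender


if __name__ == "__main__":
    sample = "Dear Alice,\n\nPlease review the report.\n\nBest regards,\nBob Smith\n"
    print(extract_aliases(sample))
```

Restricting the search to a few lines at the top and bottom of the body is a crude stand-in for the block-locating step; a statistical approach like the one the abstract describes would presumably learn where these blocks tend to occur rather than rely on a fixed window.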
