Analysis of emails using a hidden Markov model to recognize sections of the email, e.g. header, body, signature block and disclaimer

Abstract

An automated parser for e-mail messages identifies component parts such as the header, body, signature block, and disclaimer. The parser uses a hidden Markov model (HMM) in which the lines making up an e-mail are treated as a sequence of observations of a system that evolves according to a Markov chain whose states correspond to the component parts. The HMM is trained on a manually annotated set of e-mail messages and then applied to parse other e-mail messages. HMM-based parsing can be further refined or extended with heuristic post-processing techniques that exploit the redundancy of some component parts (e.g., signatures and disclaimers) across a corpus of e-mail messages, for example by clustering e-mails according to the similarity of their signature blocks and comparing similarity within each cluster to find a representative signature for that cluster.
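
The line-as-observation idea in the abstract can be illustrated with a small sketch. It is not the patented implementation. The abstract does not say which line features are used, so the `featurize` function, the `SYMBOLS` list, the add-one smoothing, and the toy training data are all assumptions made for illustration. The sketch estimates HMM parameters by counting over annotated lines and then labels a new message with Viterbi decoding.

```python
import math
from collections import Counter

STATES = ["header", "body", "signature", "disclaimer"]
# Hypothetical observation alphabet; the source does not specify line features.
SYMBOLS = ["BLANK", "HEADER_FIELD", "LEGAL", "SIGNOFF", "SHORT", "TEXT"]


def featurize(line):
    """Map a raw line to one coarse observation symbol (illustrative rules only)."""
    s = line.strip()
    low = s.lower()
    if not s:
        return "BLANK"
    if low.startswith(("from:", "to:", "cc:", "subject:", "date:")):
        return "HEADER_FIELD"
    if "confidential" in low or "intended recipient" in low:
        return "LEGAL"
    if s.startswith("--") or low.startswith(("regards", "thanks", "best")):
        return "SIGNOFF"
    return "SHORT" if len(s.split()) <= 3 else "TEXT"


def _log_dist(counts, outcomes):
    """Add-one smoothed log-probabilities over a fixed outcome list (an assumption)."""
    total = sum(counts[o] for o in outcomes) + len(outcomes)
    return {o: math.log((counts[o] + 1) / total) for o in outcomes}


def train(annotated):
    """Estimate start, transition, and emission tables from (line, state) pairs."""
    start = Counter()
    trans = {s: Counter() for s in STATES}
    emit = {s: Counter() for s in STATES}
    for email in annotated:
        prev = None
        for line, state in email:
            emit[state][featurize(line)] += 1
            if prev is None:
                start[state] += 1
            else:
                trans[prev][state] += 1
            prev = state
    return {
        "start": _log_dist(start, STATES),
        "trans": {s: _log_dist(trans[s], STATES) for s in STATES},
        "emit": {s: _log_dist(emit[s], SYMBOLS) for s in STATES},
    }


def viterbi(model, lines):
    """Return the most likely state label for each line."""
    if not lines:
        return []
    obs = [featurize(line) for line in lines]
    score = {s: model["start"][s] + model["emit"][s][obs[0]] for s in STATES}
    back = []
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: score[p] + model["trans"][p][s])
            new_score[s] = score[best] + model["trans"][best][s] + model["emit"][s][o]
            pointers[s] = best
        score = new_score
        back.append(pointers)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy annotated corpus (invented for this sketch).
TRAINING = [
    [("From: alice@example.com", "header"), ("Subject: Q3 report", "header"),
     ("", "body"),
     ("Please find the quarterly report attached for your review.", "body"),
     ("Regards,", "signature"), ("Alice Smith", "signature"),
     ("This message is confidential.", "disclaimer")],
    [("From: bob@example.com", "header"), ("To: carol@example.com", "header"),
     ("Can we move the meeting to Thursday afternoon instead?", "body"),
     ("Thanks,", "signature"), ("Bob", "signature"),
     ("This e-mail is intended only for the intended recipient.", "disclaimer")],
]

model = train(TRAINING)
new_email = [
    "From: dave@example.com",
    "Subject: Contract draft",
    "Here is the latest draft of the contract we discussed.",
    "Best,",
    "Dave",
    "This message is confidential and for the intended recipient only.",
]
labels = viterbi(model, new_email)
for label, line in zip(labels, new_email):
    print(f"{label:10s} | {line}")
```

Working in log space avoids numeric underflow on long messages, and smoothing keeps transitions that never occur in the annotated set from being ruled out entirely. A realistic parser would need a much larger annotated corpus and richer line features than this toy alphabet.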

Bibliographic data

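  • Publication number: GB2496120A
  • Patent type: not stated in the record
  • Publication date: 2013-05-08
  • Original format: PDF
  • Applicant/patentee: STRATIFY INC.
  • Application number: GB20110018726
  • Inventors: VAMSI SALAKA; JOY THOMAS
  • Filing date: 2011-10-31
  • Classification: G06Q10/10; G06K9/62; H04L12/58
  • Country: GB
  • Added to database: 2022-08-21 16:20:22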
