Journal: International Journal of Information Technology and Computer Science

A Domain Specific Key Phrase Extraction Framework for Email Corpuses

Abstract

Despite the growth of Internet communication via short messages, messaging services, and chat, email remains the most preferred communication method. Thousands of emails are exchanged every day across different service providers. Because email is such an effective communication method, it also attracts a lot of spam and irrelevant information. Spam emails are annoying and take a lot of time to filter. Needless to say, spam also consumes allocated inbox space and generates heavy network traffic. Current filtering methods are far from perfect: most filters depend on standard rules and therefore mark valid emails as spam. The first step of any email filtering should be to extract the key phrases from the emails, and filters should then be activated based on those key phrases or the most frequently used phrases. A number of parallel research efforts have demonstrated key phrase extraction policies. However, those methods focus on domain-specific corpuses and have not addressed email corpuses. This work therefore demonstrates a key phrase extraction process specifically for email corpuses. The extracted key phrases reflect the frequency of the words used in an email. This analysis can simplify further analysis such as sentiment analysis or spam detection, and it can also serve the needs of text summarization. The proposed component-based framework demonstrates nearly 95% accuracy.
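The abstract does not describe the framework's components, so the following is only a minimal sketch of the general idea it mentions: ranking candidate phrases in an email by how often they occur. The function name `extract_key_phrases`, the tiny `STOP_WORDS` list, and the parameters `max_words` and `top_k` are all assumptions made for illustration and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (an assumption for this sketch);
# a practical system would use a much fuller list.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "for",
    "on", "this", "that", "with", "your", "you", "we", "it", "be", "at", "by",
}


def extract_key_phrases(email_body, max_words=2, top_k=5):
    """Return the top_k most frequent phrases of 1..max_words words.

    Phrases containing any stop word are skipped. This is plain frequency
    counting, not the paper's component-based framework.
    """
    # Lowercase the text and keep only alphabetic tokens (and apostrophes).
    tokens = re.findall(r"[a-z']+", email_body.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        # Slide a window of n tokens across the email.
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if any(tok in STOP_WORDS for tok in gram):
                continue
            counts[" ".join(gram)] += 1
    return counts.most_common(top_k)


if __name__ == "__main__":
    sample = "Claim your free prize now. Free prize offer ends today. Free prize!"
    for phrase, freq in extract_key_phrases(sample, top_k=3):
        print(phrase, freq)
```

A downstream spam filter or summarizer could take the returned (phrase, frequency) pairs as input features; how the paper's framework actually does this is not stated in the abstract.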
