Conference: International Symposium on Intelligent Systems and Applications

An automated domain specific stop word generation method for natural language text classification

Abstract

In this paper we propose an automated method for generating domain-specific stop words to improve the classification of natural language content. To test the generated stop words, we also implemented a Bayesian natural language classifier for web pages, based on maximum a posteriori (MAP) probability estimation of keyword distributions under a bag-of-words model. We investigated the distribution of the stop-word lists generated by our model and compared their contents against a generic stop-word list for the English language. We also show that the document coverage rank and topic coverage rank of words in natural language corpora follow Zipf's law, just as word frequency rank is known to do.
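The abstract does not spell out the generation procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a word becomes a domain-specific stop word when its document coverage and topic coverage both exceed a threshold, and that the classifier is a multinomial naive Bayes model with add-one smoothing choosing the MAP topic. The function names, the thresholds, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def coverage(docs):
    """Return (document coverage, topic coverage) per word, each in [0, 1].

    docs is a list of (topic, tokens) pairs.
    """
    doc_hits = Counter()
    topics_of = defaultdict(set)
    for topic, tokens in docs:
        for word in set(tokens):          # count each word once per document
            doc_hits[word] += 1
            topics_of[word].add(topic)
    n_docs = len(docs)
    n_topics = len({topic for topic, _ in docs})
    doc_cov = {w: c / n_docs for w, c in doc_hits.items()}
    topic_cov = {w: len(ts) / n_topics for w, ts in topics_of.items()}
    return doc_cov, topic_cov


def domain_stop_words(docs, min_doc_cov=0.5, min_topic_cov=1.0):
    """Assumed rule: words spread across most documents and all topics
    carry little class information, so treat them as stop words."""
    doc_cov, topic_cov = coverage(docs)
    return {w for w in doc_cov
            if doc_cov[w] >= min_doc_cov and topic_cov[w] >= min_topic_cov}


def train(docs, stop_words):
    """Collect per-topic document counts and bag-of-words counts."""
    topic_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for topic, tokens in docs:
        topic_docs[topic] += 1
        for word in tokens:
            if word not in stop_words:
                word_counts[topic][word] += 1
                vocab.add(word)
    return topic_docs, word_counts, vocab


def classify(tokens, model, stop_words):
    """Return the MAP topic: argmax of log prior + sum of log likelihoods."""
    topic_docs, word_counts, vocab = model
    total_docs = sum(topic_docs.values())
    best_topic, best_score = None, -math.inf
    for topic, n in topic_docs.items():
        score = math.log(n / total_docs)
        denom = sum(word_counts[topic].values()) + len(vocab)
        for word in tokens:
            if word in stop_words or word not in vocab:
                continue                   # ignore stop words and unseen words
            score += math.log((word_counts[topic][word] + 1) / denom)
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ("sports", "the team won the match".split()),
    ("sports", "the player scored a goal".split()),
    ("finance", "the bank raised the rate".split()),
    ("finance", "the market fell on rate news".split()),
]
stop_words = domain_stop_words(docs)
model = train(docs, stop_words)
print(sorted(stop_words))                                      # ['the']
print(classify("the team scored".split(), model, stop_words))  # sports
```

In this toy corpus only "the" occurs in every document and every topic, so it is the only word filtered out. A real corpus would need tuned thresholds, and possibly a rank-based cutoff motivated by the Zipf-like coverage distributions the abstract reports.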
