Asia Information Retrieval Symposium (AIRS 2005); 13-15 October 2005; Jeju Island (KR)

Filtering Contents with Bigrams and Named Entities to Improve Text Classification


Abstract

We present a new method for the classification of "noisy" documents, based on filtering contents with bigrams and named entities. The method is applied to call-for-tender documents, but we claim it would be useful for many other Web collections that also contain non-topical content. Different variations of the method are discussed. We obtain the best results by filtering out a window around the least relevant bigrams. We find a significant increase in the micro-F1 measure on our collection of calls for tenders, as well as on the "4-Universities" collection. Another approach, which rejects sentences based on the presence of some named entities, also shows a moderate increase. Finally, we try combining the two approaches, but have not obtained conclusive results so far.
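
The window-filtering idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how bigram relevance is scored or how wide the window is, so the set `noisy_bigrams`, the `window` parameter, and the function name `filter_window` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Set, Tuple

Bigram = Tuple[str, str]


def filter_window(tokens: List[str],
                  noisy_bigrams: Set[Bigram],
                  window: int = 2) -> List[str]:
    """Drop each flagged bigram plus `window` tokens on either side of it.

    `noisy_bigrams` stands in for the "least relevant" bigrams; how they
    are selected is left open here (assumption: given as a precomputed set).
    """
    drop = [False] * len(tokens)
    for i in range(len(tokens) - 1):
        if (tokens[i], tokens[i + 1]) in noisy_bigrams:
            start = max(0, i - window)
            end = min(len(tokens), i + 2 + window)  # bigram covers i and i+1
            for j in range(start, end):
                drop[j] = True
    return [tok for tok, dropped in zip(tokens, drop) if not dropped]


# Hypothetical example: "contact us" is treated as a non-topical bigram.
tokens = "supply of office furniture please contact us by email before friday".split()
filtered = filter_window(tokens, {("contact", "us")}, window=1)
print(" ".join(filtered))
```

The filtered token list would then be passed to whatever text classifier is in use. Overlapping windows merge naturally because tokens are only marked, never removed, during the scan.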
