Filtering Contents with Bigrams and Named Entities to Improve Text Classification

Abstract

We present a new method for classifying "noisy" documents, based on filtering contents with bigrams and named entities. The method is applied to call-for-tender documents, but we claim it would be useful for many other Web collections that also contain non-topical contents. Different variations of the method are discussed. We obtain the best results by filtering out a window around the least relevant bigrams. We find a significant increase in the micro-F1 measure on our collection of calls for tenders, as well as on the "4-Universities" collection. Another approach, rejecting sentences based on the presence of some named entities, also shows a moderate increase. Finally, we try combining the two approaches, but have not obtained conclusive results so far.
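
The best-performing variant in the abstract, filtering out a window around the least relevant bigrams, can be illustrated with a minimal Python sketch. The abstract does not say how bigram relevance is scored, how wide the window is, or where the cutoff lies, so the function name `filter_low_relevance_windows`, the `relevance` score table, the `threshold`, and the `window` parameter are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

Bigram = Tuple[str, str]


def filter_low_relevance_windows(
    tokens: List[str],
    relevance: Dict[Bigram, float],
    threshold: float,
    window: int = 2,
) -> List[str]:
    """Drop every token within `window` positions of a low-relevance bigram.

    `relevance` maps a bigram to a topical-relevance score; how such scores
    are obtained is NOT specified in the abstract (hypothetical input).
    """
    drop: Set[int] = set()
    for i in range(len(tokens) - 1):
        bigram = (tokens[i], tokens[i + 1])
        score = relevance.get(bigram)
        # Assumption: bigrams without a score are treated as neutral and kept.
        if score is not None and score < threshold:
            start = max(0, i - window)
            end = min(len(tokens), i + 2 + window)  # bigram covers i and i + 1
            drop.update(range(start, end))
    return [tok for idx, tok in enumerate(tokens) if idx not in drop]


# Toy call-for-tender sentence with made-up relevance scores.
tokens = "supply of laptops please contact us by email before friday".split()
scores = {("contact", "us"): 0.05, ("of", "laptops"): 0.9}
filtered = filter_low_relevance_windows(tokens, scores, threshold=0.1, window=1)
print(" ".join(filtered))
```

In this toy example the bigram "contact us" falls below the threshold, so it is removed together with one token on each side, and the remaining tokens would then be passed to the classifier.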

Bibliographic details

Source: Asia Information Retrieval Symposium (AIRS 2005); 13-15 October 2005; Jeju Island (KR) | 2005 | pp. 135-146 | 12 pages
Conference location: Jeju Island (KR)
Authors: Francois Paradis; Jian-Yun Nie
Author affiliation: Universite de Montreal, Canada
Original format: PDF
Language of text: English
Chinese Library Classification: Data backup and recovery
Date added to database: 2022-08-26 13:56:29
