Improving imbalanced scientific text classification using sampling strategies and dictionaries

L. Borrajo; R. Romero; E. L. Iglesias; C. M. Redondo Marey

Abstract

Many real applications suffer from the imbalanced class distribution problem, in which one class is represented by very few cases compared with the other classes. Systems for retrieving and classifying scientific documentation are among those affected. Sampling strategies such as oversampling and subsampling are popular ways of tackling class imbalance. In this work, we study their effects on three types of classifiers (kNN, SVM and Naive Bayes) when they are applied to searches on the PubMed scientific database. A second aim of this paper is to study the use of dictionaries in the classification of biomedical texts. Experiments are conducted with three different dictionaries (BioCreative, NLPBA, and an ad hoc subset of the UniProt database named Protein) using the aforementioned classifiers and sampling strategies. The best results were obtained with the NLPBA and Protein dictionaries and the SVM classifier using the subsampling balancing technique. These results were compared with those obtained by other authors using the TREC Genomics 2005 public corpus.
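
The two balancing strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `subsample` and `oversample`, the toy documents and the 2-versus-8 class split are all assumptions made for illustration, and random selection is only one simple way of performing each strategy.

```python
import random
from collections import Counter


def _group_by_class(docs, labels):
    """Collect documents into a dict keyed by class label."""
    by_class = {}
    for doc, label in zip(docs, labels):
        by_class.setdefault(label, []).append(doc)
    return by_class


def subsample(docs, labels, seed=0):
    """Randomly discard documents until every class matches the smallest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(docs, labels)
    target = min(len(items) for items in by_class.values())
    balanced = [
        (doc, label)
        for label, items in by_class.items()
        for doc in rng.sample(items, target)
    ]
    rng.shuffle(balanced)
    return balanced


def oversample(docs, labels, seed=0):
    """Randomly duplicate documents until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(docs, labels)
    target = max(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((doc, label) for doc in items + extra)
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy corpus: 2 relevant abstracts against 8 irrelevant ones.
docs = [f"abstract {i}" for i in range(10)]
labels = ["relevant"] * 2 + ["irrelevant"] * 8

sub_counts = Counter(label for _, label in subsample(docs, labels))
over_counts = Counter(label for _, label in oversample(docs, labels))
print(sub_counts, over_counts)
```

Subsampling shrinks the training set to the size of the minority class, while oversampling grows it by repeating minority documents; either balanced set could then be passed to a classifier such as kNN, SVM or Naive Bayes.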

Bibliographic details

Source
Journal of Integrative Bioinformatics | 2011, Issue 3 | 15 pages
Authors
L. Borrajo; R. Romero; E. L. Iglesias; C. M. Redondo Marey

Similar literature

1. Improving imbalanced scientific text classification using sampling strategies and dictionaries [J]. Lourdes Borrajo, Rubén Romero, Eva Lorenzo Iglesias. Journal of Integrative Bioinformatics. 2011, Issue 3
2. Sample cutting method for imbalanced text sentiment classification based on BRC [J]. Suge Wang, Deyu Li, Lidong Zhao. Knowledge-Based Systems. 2013, Issue JANa
3. On strategies for imbalanced text classification using SVM: A comparative study [J]. Aixin Sun, Ee-Peng Lim, Ying Liu. Decision Support Systems. 2009, Issue 1
4. A bi-directional sampling based on K-means method for imbalance text classification [C]. Jia Song, Xianglin Huang, Sijun Qin. IEEE/ACIS International Conference on Computer and Information Science. 2016
5. Alleviating class imbalance using data sampling: Examining the effects on classification algorithms [D]. Napolitano, Amri E. 2006
6. An improved survivability prognosis of breast cancer by using sampling and feature selection technique to solve imbalanced patient classification data [O]. Kung-Jeng Wang, Bunjira Makond, Kung-Min Wang. 2013
7. Improving imbalanced scientific text classification using sampling strategies and dictionaries [O]. Borrajo Lourdes, Romero Rubén, Lorenzo Iglesias Eva. 2011