A sub-document ensemble learning (SEL) method is proposed for SMS spam filtering. The method uses the SEL framework to decompose the online binary classification of short texts into several sub-classification problems, and produces the final category prediction as a linear combination of the sub-results. In addition, an effective weak classifier is implemented on top of a string-frequency-index-based text classification algorithm. Experimental results show that the SEL framework improves the performance of existing text classification algorithms, and that the string-frequency-index-based weak classifier achieves the best filtering performance within the SEL framework.
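The decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not say how sub-documents are formed, how the combination weights are chosen, or how the string-frequency index is built, so everything below is an assumption made for illustration: `split_into_subdocuments` uses overlapping character windows, `StringFrequencyClassifier` is a hypothetical stand-in that counts character bigrams per class, and `sel_score` uses uniform weights for the linear combination.

```python
from collections import defaultdict
from typing import List, Optional


def split_into_subdocuments(text: str, size: int = 8, step: int = 4) -> List[str]:
    """Assumed splitter: overlapping character windows over the message."""
    windows = [text[i:i + size] for i in range(0, len(text), step)]
    return windows or [text]


class StringFrequencyClassifier:
    """Hypothetical weak classifier backed by per-class substring counts."""

    def __init__(self, n: int = 2) -> None:
        self.n = n
        self.counts = {"spam": defaultdict(int), "ham": defaultdict(int)}

    def _grams(self, text: str) -> List[str]:
        # Always yields at least one gram, even for very short input.
        return [text[i:i + self.n] for i in range(max(1, len(text) - self.n + 1))]

    def update(self, text: str, label: str) -> None:
        """Online update: add one labelled message to the index."""
        for gram in self._grams(text):
            self.counts[label][gram] += 1

    def score(self, text: str) -> float:
        """Score in (-1, 1); positive values lean towards spam."""
        grams = self._grams(text)
        total = 0.0
        for gram in grams:
            spam = self.counts["spam"].get(gram, 0)
            ham = self.counts["ham"].get(gram, 0)
            total += (spam - ham) / (spam + ham + 1)
        return total / len(grams)


def sel_score(text: str, clf: StringFrequencyClassifier,
              weights: Optional[List[float]] = None) -> float:
    """Linear combination of the weak classifier's sub-document scores."""
    subdocs = split_into_subdocuments(text)
    if weights is None:
        # Uniform weights are an assumption; the abstract does not specify them.
        weights = [1.0 / len(subdocs)] * len(subdocs)
    return sum(w * clf.score(s) for w, s in zip(weights, subdocs))


def is_spam(text: str, clf: StringFrequencyClassifier) -> bool:
    return sel_score(text, clf) > 0.0
```

In an online setting, one would call `is_spam` on each incoming message and then `update` once its true label becomes known. The paper's actual splitting rule and weighting scheme may well differ from this sketch.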