Recognition of Pornographic Web Pages by Classifying Texts and Images

IEEE Transactions on Pattern Analysis and Machine Intelligence

Abstract

With the rapid development of the World Wide Web, people benefit increasingly from shared information. However, Web pages with obscene, harmful, or illegal content are also easily accessible, so it is important to recognize such unsuitable, offensive, or pornographic pages. This paper describes a novel framework for recognizing pornographic Web pages. A C4.5 decision tree divides Web pages, according to their content representations, into continuous text pages, discrete text pages, and image pages. These three categories are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results of the image classifier and the discrete text classifier. The continuous text classifier uses statistical and semantic features to recognize pornographic texts. The discrete text classifier uses the naive Bayes rule to calculate the probability that a discrete text is pornographic. The image classifier extracts contour-based features of objects to recognize pornographic images. The text and image fusion algorithm uses Bayesian theory to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, that the contour-based image classifier outperforms the traditional skin-region-based image classifier, that the results of the fusion algorithm outperform those of either individual classifier, and that the framework can be adapted to different categories of Web pages.
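The abstract does not give the paper's exact formulas, so the following is only a minimal Python sketch, under stated assumptions, of the two Bayesian steps it names. The discrete-text step is shown as a two-class multinomial naive Bayes model over bag-of-words tokens with Laplace smoothing (the tokenization and smoothing are assumptions). The fusion step applies Bayes' rule to a text posterior and an image posterior, assuming the two sources are conditionally independent given the class and share the same prior. The function names `train_naive_bayes`, `prob_pornographic`, and `fuse_posteriors`, and the toy data, are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a two-class multinomial naive Bayes model with Laplace smoothing.

    docs:   list of token lists (hypothetical bag-of-words input)
    labels: list of ints, 1 = pornographic, 0 = normal (both must occur)
    """
    vocab = {tok for doc in docs for tok in doc}
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(doc)
    log_priors = {y: math.log(labels.count(y) / len(labels)) for y in (0, 1)}
    totals = {y: sum(counts[y].values()) for y in (0, 1)}
    return log_priors, counts, totals, len(vocab), alpha


def prob_pornographic(model, doc):
    """Return P(pornographic | doc), treating tokens as independent given the class."""
    log_priors, counts, totals, vocab_size, alpha = model
    scores = {}
    for y in (0, 1):
        score = log_priors[y]
        for tok in doc:
            # Smoothed per-token likelihood; unseen tokens get count 0 from Counter.
            score += math.log((counts[y][tok] + alpha) / (totals[y] + alpha * vocab_size))
        scores[y] = score
    # Turn the two log scores into a normalized probability (log-sum-exp trick).
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return math.exp(scores[1] - m) / z


def fuse_posteriors(p_text, p_image, prior=0.5):
    """Combine text and image posteriors with Bayes' rule.

    Assumes the text and image evidence are conditionally independent given
    the class, and that both posteriors were produced under the same prior:
        P(c | t, i) is proportional to P(c | t) * P(c | i) / P(c)
    """
    porn = p_text * p_image / prior
    normal = (1 - p_text) * (1 - p_image) / (1 - prior)
    return porn / (porn + normal)


if __name__ == "__main__":
    docs = [["explicit", "xxx", "video"], ["xxx", "adult"],
            ["news", "weather"], ["sports", "news", "video"]]
    labels = [1, 1, 0, 0]
    model = train_naive_bayes(docs, labels)
    p_text = prob_pornographic(model, ["xxx", "video"])  # 0.75 on this toy data
    print(p_text, fuse_posteriors(p_text, 0.7))
```

Under a 0.5 prior, a text posterior of 0.8 and an image posterior of 0.7 fuse to about 0.90, because two agreeing, independent sources reinforce each other. If the independence assumption fails (for example, image captions that repeat the page text), this product rule becomes overconfident, and the paper's actual fusion rule may differ.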