Conference: IEEE International Conference on Parallel and Distributed Systems

A Heuristic Approach for Website Classification with Mixed Feature Extractors

Abstract

We propose an intelligent website classification schema based on deep neural networks that uses mixed feature extractors. The model is trained iteratively under supervised learning, using the gradient descent algorithm. It is composed of four components: a Website Encoder, a Text CNN Feature Extractor, a Bidirectional GRU Feature Extractor, and a Fully Connected Classifier. Together, the extractors capture multiple features of a website at different granularities. The features they produce are concatenated, and the model uses this mixed representation to select a suitable website class. We conduct extensive experiments on a real-world website dataset, collected using domains extracted from the DNS records of a telecom operator. Compared with multiple widely used machine learning models, the proposed classification schema achieves better precision, recall, F1, and accuracy. These results can benefit various web applications, such as malicious website detection and online advertising.
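
The four-component pipeline described in the abstract can be sketched as a forward pass in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give layer sizes, kernel widths, tokenization, or the number of classes, so every dimension, the class names (WebsiteEncoder, TextCNNExtractor, GRUCell, WebsiteClassifier), and the example token ids below are hypothetical. Weights are random and untrained, biases are omitted for brevity, and the gradient descent training step is not shown.

```python
import math
import random

random.seed(0)  # fixed seed so the illustrative weights are reproducible

def rand_matrix(rows, cols):
    """Small random weight matrix (illustrative initialisation only)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class WebsiteEncoder:
    """Maps token ids to embedding vectors (vocabulary size is an assumption)."""
    def __init__(self, vocab_size, dim):
        self.table = rand_matrix(vocab_size, dim)

    def __call__(self, token_ids):
        return [self.table[t] for t in token_ids]

class TextCNNExtractor:
    """1-D convolution over token windows, ReLU, then max-over-time pooling."""
    def __init__(self, dim, kernel_size, n_filters):
        self.k = kernel_size
        self.filters = rand_matrix(n_filters, kernel_size * dim)

    def __call__(self, embedded):
        # Flatten each window of k consecutive embeddings into one vector.
        windows = [sum(embedded[i:i + self.k], [])
                   for i in range(len(embedded) - self.k + 1)]
        feats = [matvec(self.filters, w) for w in windows]
        # ReLU is monotone, so max(0, max(col)) equals max over ReLU outputs.
        return [max(0.0, max(col)) for col in zip(*feats)]

class GRUCell:
    """Standard GRU update without bias terms; gates act on [x, h]."""
    def __init__(self, dim, hidden):
        self.hidden = hidden
        self.wz = rand_matrix(hidden, dim + hidden)  # update gate
        self.wr = rand_matrix(hidden, dim + hidden)  # reset gate
        self.wn = rand_matrix(hidden, dim + hidden)  # candidate state

    def step(self, x, h):
        z = [sigmoid(a) for a in matvec(self.wz, x + h)]
        r = [sigmoid(a) for a in matvec(self.wr, x + h)]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a) for a in matvec(self.wn, x + rh)]
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def run(self, seq):
        h = [0.0] * self.hidden
        for x in seq:
            h = self.step(x, h)
        return h  # final hidden state

class WebsiteClassifier:
    """Encoder -> (Text CNN, bidirectional GRU) -> concatenation -> FC + softmax."""
    def __init__(self, vocab_size=50, dim=8, n_filters=6, hidden=5, n_classes=4):
        self.encoder = WebsiteEncoder(vocab_size, dim)
        self.cnn = TextCNNExtractor(dim, kernel_size=3, n_filters=n_filters)
        self.gru_fwd = GRUCell(dim, hidden)
        self.gru_bwd = GRUCell(dim, hidden)
        self.fc = rand_matrix(n_classes, n_filters + 2 * hidden)

    def features(self, token_ids):
        emb = self.encoder(token_ids)
        # Concatenate CNN features with forward and backward GRU states.
        return self.cnn(emb) + self.gru_fwd.run(emb) + self.gru_bwd.run(emb[::-1])

    def predict_proba(self, token_ids):
        logits = matvec(self.fc, self.features(token_ids))
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]

model = WebsiteClassifier()
probs = model.predict_proba([3, 17, 42, 8, 8, 25])  # hypothetical token ids
predicted_class = probs.index(max(probs))
```

The convolutional branch pools over short token windows while the recurrent branch summarizes the whole sequence in both directions, which is one plausible reading of "features at different granularities". In practice such a model would be built and trained with a deep learning framework rather than hand-written list arithmetic.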
