Published in: International Conference on Neural Information Processing

The Automatic Classification of Web Pages Based on Neural Network

Abstract

Web page classification is an important task. It may also call for a technique that extracts field information as common knowledge, and compound word processing is one of the important factors in extracting keywords from web pages. In this method, the tour fields are first defined systematically, and the information related to each field is then extracted. A new feature extraction method is considered that incorporates three kinds of information: text, HTML tags, and hyperlinks. Accordingly, this paper presents a neural network algorithm, the self-organizing feature map, for the automatic classification of web pages. The proposed approach combines a new set of features with a self-organizing neural network classifier. The feature set corresponds to the page contents, is selected by a statistical reduction procedure, and provides text keyword, hyperlink, and HTML tag information. The final feature set is then used as the input vector to a suitable neural network to achieve the classification goal, and web pages are assigned to different classes. A series of experiments was conducted to evaluate the performance of the approach, and the results show that it is promising.
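
The pipeline outlined in the abstract (features drawn from text, HTML tags, and hyperlinks, then fed to a self-organizing map) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the hand-picked `keywords` and `tag_names` lists stand in for the statistical reduction step, the map is one-dimensional with a Gaussian neighbourhood, and the learning rate and radius decay linearly. The names `PageFeatures`, `to_vector`, `best_unit`, and `train_som` are hypothetical.

```python
import math
import random
from html.parser import HTMLParser


class PageFeatures(HTMLParser):
    """Collects words, tag names, and the hyperlink count of one HTML page."""

    def __init__(self):
        super().__init__()
        self.words, self.tags, self.links = [], [], 0

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.links += 1

    def handle_data(self, data):
        self.words.extend(w.lower() for w in data.split() if w.isalpha())


def to_vector(html, keywords, tag_names):
    """Map a page to [keyword frequencies..., tag frequencies..., link ratio].

    Assumption: keywords and tag_names are chosen by hand here; the paper
    selects its features with a statistical reduction procedure instead.
    """
    parser = PageFeatures()
    parser.feed(html)
    n_words = max(len(parser.words), 1)
    n_tags = max(len(parser.tags), 1)
    vec = [parser.words.count(k) / n_words for k in keywords]
    vec += [parser.tags.count(t) / n_tags for t in tag_names]
    vec.append(parser.links / n_tags)
    return vec


def best_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def train_som(data, n_units=4, epochs=200, lr0=0.5, radius0=1.0, seed=0):
    """Train a 1-D self-organizing map and return its unit weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs  # linear decay (an assumed schedule)
        lr, radius = lr0 * decay, max(radius0 * decay, 1e-3)
        for x in rng.sample(data, len(data)):  # visit pages in random order
            bmu = best_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood on the 1-D grid of units
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


# Usage: vectorize two toy pages, train the map, and look up their units.
pages = [
    "<html><body><h1>Hotel</h1><p>cheap hotel rooms</p></body></html>",
    "<html><body><table><tr><td>flight times</td></tr></table></body></html>",
]
vectors = [to_vector(p, ["hotel", "flight"], ["h1", "table"]) for p in pages]
som = train_som(vectors, n_units=2)
labels = [best_unit(som, v) for v in vectors]
```

After training, each page is assigned the index of its best-matching unit, which plays the role of a class label. A two-dimensional grid of units is the more common layout for a self-organizing map; the one-dimensional grid is used here only to keep the sketch short.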
