Conference: Asia Pacific Web and Web-Age Information Management

Extracting Various Types of Informative Web Content via Fuzzy Sequential Pattern Mining

Abstract

In this paper, we present a web content extraction method that extracts different types of informative web content from news web pages. A fuzzy sequential pattern mining method, namely FSP, is developed to gradually discover fuzzy sequential patterns for various types of informative web content. Because the usage of HTML tags may change as web technology develops, the fuzzy sequential patterns are mined using a stable feature: the number of tokens in each line of source code. We have conducted extensive experiments and observed good clustering properties for the discovered sequential patterns. Experimental results demonstrate that the FSP method is effective compared with state-of-the-art content extraction methods. Besides the main articles of web pages, it can also effectively find other types of interesting web content, such as article recommendations and article titles.
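The feature named in the abstract, the number of tokens in each line of source code, can be illustrated with a minimal Python sketch. This is not the FSP method itself. The abstract does not define the tokenization, so the sketch assumes that HTML tags are stripped and that tokens are whitespace-separated words; the function name `token_counts` and the sample page are hypothetical.

```python
import re

# Assumption: a token is a whitespace-separated word left after removing tags.
# The paper may define tokens differently.
TAG_RE = re.compile(r"<[^>]+>")


def token_counts(html_source: str) -> list[int]:
    """Return the number of text tokens on each line of HTML source."""
    counts = []
    for line in html_source.splitlines():
        text = TAG_RE.sub(" ", line)  # replace each tag with a space
        counts.append(len(text.split()))
    return counts


# Hypothetical news page: navigation, title, article body, footer.
sample = """<div class="nav"><a href="/">Home</a></div>
<h1>Example headline for a news story</h1>
<p>The first paragraph of the article body usually contains many more tokens than navigation lines.</p>
<div class="footer">Contact</div>"""

print(token_counts(sample))
```

In this toy page, the article line yields a much larger count than the navigation and footer lines. A sequence of per-line counts like this is the kind of input on which sequential patterns could be mined, and it does not depend on which HTML tags the page uses.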
