Journal: Expert Systems with Applications

Automatic generation of semantically enriched web pages by a text mining approach

Abstract

Most Web pages today contain little structure or supporting information that can reveal their semantics. To enable automated processing of Web pages, semantic information such as metadata and tags should be added to each page. Several authoring tools have been developed to help users with this task. However, manual or semi-automatic authoring is impractical when a large number of Web pages must be annotated. In this work, we propose a method to automatically generate descriptive metadata and tags for a Web page. The idea is to apply the self-organizing map algorithm to cluster the Web pages and discover the relationships between the clusters. At the same time, the themes of each cluster are identified. We then use these relationships and themes to tag the Web pages and generate metadata for them. Experimental results show that our method can generate semantically relevant metadata and tags for Web pages.
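The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: train a small self-organizing map on page term vectors, label each map cell with its highest-weighted terms, and tag each page with the labels of its best-matching cell. The toy pages, the binary term vectors, the 2x2 grid, the linear decay schedule, and all function names are assumptions made for illustration, not the authors' method.

```python
import math
import random

def tokenize(text):
    """Lowercase and split on whitespace (a real system would also drop stop words)."""
    return text.lower().split()

def vectorize(pages):
    """Turn each page into a binary term vector over the shared vocabulary."""
    vocab = sorted({w for p in pages for w in tokenize(p)})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for p in pages:
        v = [0.0] * len(vocab)
        for w in tokenize(p):
            v[index[w]] = 1.0
        vectors.append(v)
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_matching_unit(weights, v):
    """Return the grid coordinate of the neuron whose weights are closest to v."""
    return min(weights, key=lambda pos: sq_dist(weights[pos], v))

def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Train a rows x cols map; learning rate and radius decay linearly (assumed schedule)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.01)
        for v in vectors:
            bmu = best_matching_unit(weights, v)
            for pos, w in weights.items():
                # Gaussian neighborhood: cells near the winner move more.
                d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights

def cluster_themes(weights, vocab, top_k=3):
    """Label each map cell with its top_k highest-weighted terms."""
    themes = {}
    for pos, w in weights.items():
        ranked = sorted(range(len(vocab)), key=lambda i: w[i], reverse=True)
        themes[pos] = [vocab[i] for i in ranked[:top_k]]
    return themes

def tag_pages(pages, top_k=3):
    """Return (cell, tags) for each page, where tags are the themes of its cell."""
    vocab, vectors = vectorize(pages)
    weights = train_som(vectors)
    themes = cluster_themes(weights, vocab, top_k)
    result = []
    for v in vectors:
        cell = best_matching_unit(weights, v)
        result.append((cell, themes[cell]))
    return result

# Hypothetical toy corpus standing in for real Web pages.
PAGES = [
    "neural network training tutorial",
    "deep neural network models",
    "pasta recipe with tomato sauce",
    "tomato soup recipe",
]

RESULT = tag_pages(PAGES)
for page, (cell, tags) in zip(PAGES, RESULT):
    print(cell, tags, "<-", page)
```

In a sketch like this, adjacency of cells on the map grid could serve as a rough proxy for the relationships between clusters that the abstract mentions, though how the paper actually derives those relationships is not stated here.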