Signature Word Extracting Research Based On Web Metadata

Abstract

Extracting the signature words of a text is a useful technique for summarizing the text of a Web page, and it also provides technical support for text classification, information extraction, and other related tasks. This paper attempts to partition a document into a hierarchical structure by analyzing the semantic distance between each pair of adjacent paragraphs in the Web page content. On the basis of this hierarchical structure, we use the metadata and special tags of the HTML to design a weighting function that takes into account the frequency, length, and location of each word. Finally, the contributions of the various location factors to the system are compared and analyzed.
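The abstract does not specify the distance measure or the exact weighting function, so the following is only a minimal Python sketch of the two steps under stated assumptions. It approximates the semantic distance between adjacent paragraphs with the cosine distance of their bag-of-words vectors (an assumption; the paper may use a different measure) and starts a new segment whenever that distance exceeds a hypothetical `threshold`, which yields one level of the hierarchy. It then scores each word by summing, over its occurrences, a location weight from a hypothetical `LOCATION_WEIGHTS` table keyed by HTML tag or metadata field, scaled by a `log(1 + len)` length factor. The table values, the `STOPWORDS` set, and the length factor are illustrative choices, not the authors' function.

```python
import math
import re
from collections import Counter

# Hypothetical weights per HTML location (tag or metadata field); values are illustrative only.
LOCATION_WEIGHTS = {"title": 3.0, "meta_keywords": 2.5, "h1": 2.0, "h2": 1.5, "b": 1.2, "body": 1.0}
# Hypothetical minimal stop-word list, applied only when scoring words.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "have", "for"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two bag-of-words Counters (1.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def segment(paragraphs, threshold=0.8):
    """Group adjacent paragraphs, opening a new segment when their distance exceeds threshold."""
    segments, current, prev = [], [], None
    for para in paragraphs:
        bag = Counter(tokenize(para))
        # Compare each paragraph with the one immediately before it.
        if prev is not None and cosine_distance(prev, bag) > threshold:
            segments.append(current)
            current = []
        current.append(para)
        prev = bag
    if current:
        segments.append(current)
    return segments


def word_weights(tagged_text):
    """Score words from (location, text) pairs: frequency x location weight x length factor."""
    scores = Counter()
    for location, text in tagged_text:
        loc_weight = LOCATION_WEIGHTS.get(location, 1.0)
        for word in tokenize(text):
            if word in STOPWORDS:
                continue
            # Each occurrence adds its location weight (frequency), scaled by log(1 + length).
            scores[word] += loc_weight * math.log(1 + len(word))
    return scores


def signature_words(tagged_text, k=5):
    """Return the k highest-scoring words."""
    return [word for word, _ in word_weights(tagged_text).most_common(k)]


if __name__ == "__main__":
    paragraphs = [
        "cats purr and cats sleep",
        "cats sleep all day",
        "stock markets fell sharply",
        "markets fell again",
    ]
    print(segment(paragraphs))  # two segments: the cat paragraphs, then the market paragraphs
    page = [
        ("title", "Signature word extraction"),
        ("body", "Signature words summarize web text. Web pages have tags."),
    ]
    print(signature_words(page, k=3))  # ['signature', 'extraction', 'word']
```

Calling `segment` again on the paragraphs of each segment with a smaller `threshold` would split it further, which is one simple way to obtain a deeper hierarchy; how the paper actually builds its levels is not stated in the abstract.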