
WEBPAGE TEXT EXTRACTION METHOD AND DEVICE, AND WEBPAGE ADVERTISEMENT HANDLING METHOD AND DEVICE


Abstract

Disclosed are a webpage text extraction method and device, and a webpage advertisement handling method and device. The webpage text extraction method comprises: reading webpage data, determining the interference data contained in the webpage data, and replacing the interference data with null characters; recording the line number of each line of the webpage and the number of words in that line; determining the webpage text from the line numbers and the corresponding per-line word counts; and extracting the webpage text. Compared with the prior art, the invention does not depend on a browser environment or page structure, and has good extensibility.
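The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say what counts as interference data or how the text is determined from the line numbers and word counts. The sketch assumes that interference data means scripts, styles, comments and tags, and that the text is the run of consecutive word-dense lines with the largest total word count. The names `strip_interference`, `line_word_counts`, `find_body_range` and `extract_text`, and the parameters `min_words` and `max_gap`, are hypothetical. Words are counted by splitting on whitespace, which suits English pages; a Chinese page would more likely need character counts.

```python
import re

# Assumed definition of "interference data"; the abstract does not list it.
INTERFERENCE = [
    re.compile(r"<script\b.*?</script\s*>", re.S | re.I),
    re.compile(r"<style\b.*?</style\s*>", re.S | re.I),
    re.compile(r"<!--.*?-->", re.S),
    re.compile(r"<[^>]+>"),
]


def strip_interference(html):
    """Replace interference data with nothing, preserving line breaks."""
    def blank(match):
        # Keep the newlines so line numbers still match the original page.
        return "\n" * match.group(0).count("\n")

    for pattern in INTERFERENCE:
        html = pattern.sub(blank, html)
    return html


def line_word_counts(text):
    """Record (line_number, word_count) for every line, 1-indexed."""
    return [(i, len(line.split())) for i, line in enumerate(text.splitlines(), 1)]


def find_body_range(counts, min_words=5, max_gap=1):
    """Return (start, end) of the densest run of lines, or (None, None).

    A line is dense if it has at least min_words words. A run tolerates
    up to max_gap consecutive sparse lines. This heuristic is an assumption.
    """
    best_total, best_start, best_end = 0, None, None
    start = end = None
    total = gap = 0
    for line_no, words in counts:
        if words >= min_words:
            if start is None:
                start, total = line_no, 0
            total += words
            end, gap = line_no, 0
        elif start is not None:
            gap += 1
            if gap > max_gap:
                if total > best_total:
                    best_total, best_start, best_end = total, start, end
                start = None
    if start is not None and total > best_total:
        best_start, best_end = start, end
    return best_start, best_end


def extract_text(html, min_words=5, max_gap=1):
    """Run the steps in order: clean, count, locate the text, extract it."""
    cleaned = strip_interference(html)
    start, end = find_body_range(line_word_counts(cleaned), min_words, max_gap)
    if start is None:
        return ""
    lines = cleaned.splitlines()[start - 1:end]
    return "\n".join(line.strip() for line in lines if line.strip())
```

Because the sketch works on the raw page source with regular expressions and per-line counts, it needs neither a browser nor a DOM tree, which matches the independence from browser environment and page structure claimed in the abstract.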
