SYSTEMS AND METHODS FOR CONTENT EXTRACTION FROM A MARK-UP LANGUAGE TEXT ACCESSIBLE AT AN INTERNET DOMAIN

Abstract

Systems and methods are presented for content extraction from markup language text. The content extraction process may parse markup language text into a hierarchical data model and then apply one or more filters. Output filters may be used to make the process more versatile. The operation of the content extraction process and the one or more filters may be controlled by one or more settings set by a user, or automatically by a classifier. The classifier may automatically enter settings by classifying markup language text and entering settings based on this classification. Automatic classification may be performed by clustering unclassified markup language texts with previously classified markup language texts.
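The pipeline described in the abstract (parse into a hierarchical model, apply filters, then an output filter, all driven by settings) can be illustrated with a minimal sketch. This is not the patented implementation: the names `Node`, `TreeBuilder`, `drop_tags`, `drop_short_text`, `to_text`, and `extract`, and the setting keys `drop_tags`, `min_chars`, and `separator`, are hypothetical and chosen only for illustration. The settings here are supplied by hand; the classifier that would choose them automatically is not shown.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


@dataclass
class Node:
    """One element of the hierarchical data model (hypothetical structure)."""
    tag: str
    children: list = field(default_factory=list)  # Node objects or text strings


class TreeBuilder(HTMLParser):
    """Builds a Node tree from markup using the standard-library parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray closes.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1].children.append(text)


def drop_tags(node, settings):
    """Filter: remove whole subtrees whose tag is listed in settings."""
    node.children = [
        c for c in node.children
        if isinstance(c, str) or c.tag not in settings["drop_tags"]
    ]
    for child in node.children:
        if isinstance(child, Node):
            drop_tags(child, settings)
    return node


def drop_short_text(node, settings):
    """Filter: remove text fragments shorter than settings['min_chars']."""
    node.children = [
        c for c in node.children
        if isinstance(c, Node) or len(c) >= settings["min_chars"]
    ]
    for child in node.children:
        if isinstance(child, Node):
            drop_short_text(child, settings)
    return node


def to_text(node, settings):
    """Output filter: flatten the remaining tree into plain text."""
    parts = []

    def walk(n):
        for c in n.children:
            if isinstance(c, str):
                parts.append(c)
            else:
                walk(c)

    walk(node)
    return settings["separator"].join(parts)


def extract(markup, settings, filters=(drop_tags, drop_short_text)):
    """Parse markup, apply each filter in order, then the output filter."""
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    tree = builder.root
    for apply_filter in filters:
        tree = apply_filter(tree, settings)
    return to_text(tree, settings)


if __name__ == "__main__":
    page = ("<html><body><nav>Home</nav><p>Hello <b>world</b></p>"
            "<script>var x=1;</script></body></html>")
    settings = {"drop_tags": {"nav", "script", "style"},
                "min_chars": 1, "separator": " "}
    print(extract(page, settings))
```

Keeping the filters as plain functions that share one settings dictionary means a user, or an automatic classifier as the abstract suggests, only has to produce that dictionary; the extraction code itself does not change between page types.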