WEBPAGE ENTITY EXTRACTION THROUGH JOINT UNDERSTANDING OF PAGE STRUCTURES AND SENTENCES

Abstract

Described is a technology for understanding entities of a webpage, e.g., to label the entities on the webpage. An iterative and bidirectional framework processes a webpage, including a text understanding component (e.g., extended Semi-CRF model) that provides text segmentation features to a structure understanding component (e.g., extended HCRF model). The structure understanding component uses the text segmentation features and visual layout features of the webpage to identify a structure (e.g., labeled block). The text understanding component in turn uses the labeled block to further understand the text. The process continues iteratively until a similarity criterion is met, at which time the entities may be labeled. Also described is the use of multiple mentions of a set of text in the webpage to help in labeling an entity.
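The iterative, bidirectional loop described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the extended Semi-CRF and HCRF models are replaced by toy rule-based stand-ins (toy_text_fn and toy_struct_fn, both hypothetical), and the similarity criterion is assumed to be the fraction of blocks whose labels and segmentations are unchanged between successive iterations. The names joint_label, agreement, threshold, and max_iters are likewise illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[str, str]                       # (text span, entity label)
TextFn = Callable[[str, str], List[Segment]]    # (block text, block label) -> segments
StructFn = Callable[[str, List[Segment]], str]  # (block text, segments) -> block label


def agreement(a: Sequence, b: Sequence) -> float:
    """Fraction of positions at which two equal-length sequences match."""
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def joint_label(blocks: Sequence[str], text_fn: TextFn, struct_fn: StructFn,
                threshold: float = 1.0, max_iters: int = 10):
    """Alternate structure and text understanding until the results stabilize."""
    labels = ["UNKNOWN"] * len(blocks)
    segments = [text_fn(b, l) for b, l in zip(blocks, labels)]
    for _ in range(max_iters):
        # Structure step: label each block using the current text segmentation.
        new_labels = [struct_fn(b, s) for b, s in zip(blocks, segments)]
        # Text step: re-segment each block using its newly assigned block label.
        new_segments = [text_fn(b, l) for b, l in zip(blocks, new_labels)]
        # Assumed similarity criterion: both outputs agree with the previous pass.
        converged = (agreement(labels, new_labels) >= threshold
                     and agreement(segments, new_segments) >= threshold)
        labels, segments = new_labels, new_segments
        if converged:
            break
    return labels, segments


def toy_text_fn(text: str, block_label: str) -> List[Segment]:
    """Hypothetical stand-in for the text understanding component."""
    segs = []
    for tok in text.split():
        if "@" in tok:
            segs.append((tok, "EMAIL"))
        elif block_label == "CONTACT" and any(c.isdigit() for c in tok):
            # Digits are read as a phone number only inside a CONTACT block.
            segs.append((tok, "PHONE"))
        else:
            segs.append((tok, "O"))
    return segs


def toy_struct_fn(text: str, segs: List[Segment]) -> str:
    """Hypothetical stand-in for the structure understanding component."""
    return "CONTACT" if any(lab == "EMAIL" for _, lab in segs) else "OTHER"


if __name__ == "__main__":
    page = ["mail a@b.com 555-0100", "founded 1999"]
    print(joint_label(page, toy_text_fn, toy_struct_fn))
```

In this toy run the e-mail address first lets the structure step label the block as CONTACT, and that block label then lets the text step reinterpret the digit string as a phone number. The digits in the second block stay unlabeled because that block never becomes CONTACT.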

Bibliographic details

Publication number: US2011078554A1

Patent type:
Publication date: 2011-03-31

Original format: PDF
Applicant/patentee: ZAIQING NIE; YONG CAO; JI-RONG WEN; CHUNYU YANG

Application number: US20090569912
Inventors: ZAIQING NIE; YONG CAO; JI-RONG WEN; CHUNYU YANG

Filing date: 2009-09-30
Classification: G06F17/21
Country: US
Added to database: 2022-08-21 18:10:44
