WEBPAGE ENTITY EXTRACTION THROUGH JOINT UNDERSTANDING OF PAGE STRUCTURES AND SENTENCES
Abstract
Described is a technology for understanding the entities on a webpage, e.g., in order to label them. An iterative, bidirectional framework processes the webpage using two components: a text understanding component (e.g., an extended Semi-CRF model) provides text segmentation features to a structure understanding component (e.g., an extended HCRF model). The structure understanding component uses these text segmentation features, together with the visual layout features of the webpage, to identify a structure (e.g., a labeled block). The text understanding component in turn uses the labeled block to refine its understanding of the text. The process repeats until a similarity criterion is met, at which point the entities may be labeled. Also described is the use of multiple mentions of a given piece of text within the webpage to help label an entity.
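The iterative, bidirectional loop summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the extended Semi-CRF and HCRF models are replaced by hypothetical toy callables (`toy_text_model`, `toy_structure_model`), a page is reduced to a list of `(block_text, font_size)` pairs standing in for text plus visual layout, the similarity criterion is assumed to be label agreement between consecutive iterations, and `label_by_mentions` assumes a simple majority vote over repeated mentions of the same text.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical toy types: a page is a list of (block_text, font_size) pairs.
Page = List[Tuple[str, int]]
Segmentation = Tuple[str, ...]  # one text-level label per block (stand-in for Semi-CRF output)
BlockLabels = Tuple[str, ...]   # one structure label per block (stand-in for HCRF output)


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of positions whose labels agree (an assumed similarity criterion)."""
    if len(a) != len(b):
        return 0.0
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def joint_understanding(
    page: Page,
    text_model: Callable[[Page, Optional[BlockLabels]], Segmentation],
    structure_model: Callable[[Page, Segmentation], BlockLabels],
    threshold: float = 1.0,
    max_iters: int = 10,
) -> Tuple[Segmentation, BlockLabels, int]:
    """Alternate text and structure understanding until both outputs stabilize."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    prev_seg: Optional[Segmentation] = None
    prev_blocks: Optional[BlockLabels] = None
    for iteration in range(1, max_iters + 1):
        # Text understanding: may use block labels from the previous pass.
        seg = text_model(page, prev_blocks)
        # Structure understanding: uses text segmentation plus layout features.
        blocks = structure_model(page, seg)
        if (
            prev_seg is not None
            and prev_blocks is not None
            and similarity(seg, prev_seg) >= threshold
            and similarity(blocks, prev_blocks) >= threshold
        ):
            return seg, blocks, iteration
        prev_seg, prev_blocks = seg, blocks
    return seg, blocks, max_iters


def toy_text_model(page: Page, blocks: Optional[BlockLabels]) -> Segmentation:
    """Hypothetical text model: '$' prefix -> PRICE; block labeled title -> NAME."""
    labels = []
    for i, (text, _size) in enumerate(page):
        if text.startswith("$"):
            labels.append("PRICE")
        elif blocks is not None and blocks[i] == "title":
            labels.append("NAME")
        else:
            labels.append("OTHER")
    return tuple(labels)


def toy_structure_model(page: Page, seg: Segmentation) -> BlockLabels:
    """Hypothetical structure model: large font -> title; PRICE text -> attribute."""
    return tuple(
        "title" if size >= 20 else "attribute" if seg[i] == "PRICE" else "other"
        for i, (_text, size) in enumerate(page)
    )


def label_by_mentions(mentions: List[Tuple[str, str]]) -> Dict[str, str]:
    """Assumed majority vote: each repeated text gets its most frequent label."""
    votes: Dict[str, Counter] = {}
    for text, label in mentions:
        votes.setdefault(text.strip().lower(), Counter())[label] += 1
    return {text: counts.most_common(1)[0][0] for text, counts in votes.items()}


if __name__ == "__main__":
    page = [("Acme Phone X", 24), ("$199", 12), ("Free shipping", 12)]
    print(joint_understanding(page, toy_text_model, toy_structure_model))
```

On this toy page the first pass cannot yet tag the large-font block as a name; once the structure pass labels it a title, the second text pass upgrades it to `NAME`, and the third pass changes nothing, so the loop stops after three iterations. Real models would exchange richer features (segment boundaries, probabilities) rather than single labels.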