System and method for automatically detecting and extracting semantically significant text from a HTML document associated with a plurality of HTML documents
Abstract
A system and method for automatically detecting and extracting semantically significant text from an HTML document associated with a plurality of HTML documents are disclosed. The method may include receiving an HTML document, parsing the HTML document into a parse tree, segmenting the parse tree into one or more segments of one or more unique paths, processing the one or more segments based at least on the HTML document, and extracting one or more processed segments from at least the HTML document based on a predetermined number.
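The steps named in the abstract (receive, parse, segment by unique path, process, extract a predetermined number of segments) can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how segments are processed or ranked, so the scoring rule here (total text length per path) is an assumption, as are the names `PathSegmenter`, `extract_significant_text`, and `top_n`. The comparison against the other HTML documents in the plurality is not modeled.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Tags whose text is not visible content, and void tags that never get a closing tag.
SKIP_TAGS = {"script", "style"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PathSegmenter(HTMLParser):
    """Hypothetical helper: groups text nodes by their root-to-node tag path."""

    def __init__(self):
        super().__init__()
        self.stack = []                    # tags currently open
        self.segments = defaultdict(list)  # unique path -> text chunks under it

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())
        in_skipped = bool(self.stack) and self.stack[-1] in SKIP_TAGS
        if text and not in_skipped:
            self.segments["/".join(self.stack)].append(text)


def extract_significant_text(html, top_n=1):
    """Return the top_n (path, text) segments, ranked by an assumed score."""
    parser = PathSegmenter()
    parser.feed(html)
    parser.close()
    # Assumption: a segment's score is the total length of its text.
    ranked = sorted(parser.segments.items(),
                    key=lambda item: sum(len(chunk) for chunk in item[1]),
                    reverse=True)
    return [(path, " ".join(chunks)) for path, chunks in ranked[:top_n]]


if __name__ == "__main__":
    sample = ("<html><body><div><a>Home</a><a>About</a></div>"
              "<div><p>First long paragraph of article text.</p>"
              "<p>Second long paragraph of article text.</p></div>"
              "<script>var x = 1;</script></body></html>")
    for path, text in extract_significant_text(sample, top_n=1):
        print(path, "->", text)
```

Here `top_n` plays the role of the abstract's "predetermined number". Grouping by tag path means repeated structures such as navigation links collapse into a single short segment, while body paragraphs sharing a path accumulate into one long segment.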