Automatic information extraction method from a web text using mDTD rule

Abstract

PURPOSE: A method is provided for automatically extracting information from web documents using mDTD (modified Document Type Definition) grammar rules. The rules are obtained through repeated machine learning, so that large amounts of information can be extracted conveniently and efficiently from the vast body of documents in a domain.

CONSTITUTION: The machine learning process comprises the steps of: collecting web documents from the domain (S1); transforming each web document into a text object (S2); extracting sample data from the text object according to a previously written seed mDTD rule (S3); attaching format element tags to the sample data (S4); and generating suitable mDTD rules from the tagged sample data (S5). The automatic extraction process comprises the steps of: collecting web documents from the domain (S11); transforming each web document into a text object (S12); attaching format element tags to the text object (S13); extracting a target by determining which of the mDTD rules generated by the machine learning process fits the tagged text object (S14); and storing the extracted target in a domain database (S15).
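The two processes in the abstract can be illustrated with a minimal Python sketch. The abstract does not disclose the mDTD rule syntax or the format element tag set, so everything concrete here is an assumption made for illustration: the `Rule` shape (a literal label token followed by a target with a given format tag), the tag names `NUM`, `LABEL`, `CAP` and `WORD`, and the function names are all hypothetical and are not the patent's actual grammar.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the visible text chunks of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def to_text_object(html):
    """S2 / S12: reduce a web document to plain text."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tag_tokens(text):
    """S4 / S13: attach a (hypothetical) format element tag to each token."""
    tagged = []
    for tok in text.split():
        if re.fullmatch(r"\$?\d[\d,.]*", tok):
            tag = "NUM"
        elif tok.endswith(":"):
            tag = "LABEL"
        elif tok[0].isupper():
            tag = "CAP"
        else:
            tag = "WORD"
        tagged.append((tok, tag))
    return tagged


@dataclass(frozen=True)
class Rule:
    """Toy stand-in for an mDTD rule: a label token followed by a tagged target."""
    target: str      # name of the item to extract, e.g. "price"
    label: str       # literal token that precedes the target
    target_tag: str  # format tag the target token must carry


def learn_rules(documents, seed_rules):
    """S1-S5: derive rules from samples located by hand-written seed rules.

    seed_rules maps a target name to the label token that precedes it.
    """
    learned = set()
    for html in documents:                                    # S1 (already collected)
        tagged = tag_tokens(to_text_object(html))             # S2, S4
        for target, seed_label in seed_rules.items():         # S3
            for (tok, _), (_, nxt_tag) in zip(tagged, tagged[1:]):
                if tok == seed_label:
                    learned.add(Rule(target, tok, nxt_tag))   # S5
    return learned


def extract(html, rules, database):
    """S11-S15: apply whichever learned rules fit the tagged text object."""
    tagged = tag_tokens(to_text_object(html))                 # S12, S13
    record = {}
    for (tok, _), (nxt, nxt_tag) in zip(tagged, tagged[1:]):
        for rule in rules:                                    # S14
            if rule.label == tok and rule.target_tag == nxt_tag:
                record[rule.target] = nxt
    if record:
        database.append(record)                               # S15
    return record
```

For example, learning from `"<p>Price: $120 today</p>"` with the seed `{"price": "Price:"}` yields a rule that requires a `NUM` token after `Price:`. That rule then extracts `$85` from `"<div><b>Price:</b> $85</div>"` and ignores `"<p>Price: unknown</p>"`. A plain list stands in for the domain database.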

Bibliographic data

Publication number: KR20020084944A
Publication date: 2002-11-16
Original format: PDF
Applicant/Patentee: KIM DONG SEOK; LEE GEUN BAE; SEO JUNG YUN
Application number: KR20010024082
Inventors: KIM DONG SEOK; LEE GEUN BAE; SEO JUNG YUN
Filing date: 2001-05-03
Classification: G06F17/21
Country: KR
Date added to database: 2022-08-21 23:48:40
