Automatic information extraction method from a web text using mDTD rule
Abstract
PURPOSE: A method for automatically extracting information from web documents using mDTD (modified Document Type Definition) grammar rules is provided. The mDTD rules are refined through repeated machine learning, so that large amounts of information can be extracted conveniently and efficiently from the vast body of documents in a domain. CONSTITUTION: The machine learning method comprises the steps of collecting web documents from the domain (S1), transforming each web document into a text object (S2), extracting sample data from the text object according to a previously written seed mDTD rule (S3), attaching format element tags to the sample data (S4), and generating suitable mDTD rules from the tagged sample data (S5). The automatic extraction method comprises the steps of collecting web documents from the domain (S11), transforming each web document into a text object (S12), attaching format element tags to the text object (S13), extracting a target by determining which of the mDTD rules generated by the machine learning process fits the tagged text object (S14), and storing the extracted target in a domain database (S15).
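The two step sequences above (S1 to S5 and S11 to S15) can be illustrated with a minimal Python sketch. The abstract does not define the mDTD rule syntax or the set of format element tags, so everything concrete here is an assumption made for illustration: a "rule" is reduced to a tuple of format element tags, the seed rule is reduced to a literal label prefix, the tag names (`PRICE`, `NUMBER`, `LABEL`, `WORD`) are hypothetical, and document collection (S1, S11) is replaced by passing HTML strings directly. It is not the patented method itself.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text chunks from an HTML document (steps S2 / S12)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def to_text_lines(html):
    """Transform a web document into a list of text lines (the 'text object')."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical format element tags; the abstract does not enumerate them.
FORMAT_ELEMENTS = [
    ("PRICE", re.compile(r"^\$\d+(\.\d+)?$")),
    ("NUMBER", re.compile(r"^\d+$")),
    ("LABEL", re.compile(r"^\w+:$")),
]


def tag_tokens(line):
    """Attach a format element tag to every token of a line (steps S4 / S13)."""
    tagged = []
    for token in line.split():
        tag = next((name for name, rx in FORMAT_ELEMENTS if rx.match(token)), "WORD")
        tagged.append((tag, token))
    return tagged


def signature(tagged):
    """Stand-in for an mDTD rule: the sequence of format element tags."""
    return tuple(tag for tag, _ in tagged)


def learn_rules(html_docs, seed_label):
    """Learning phase: select samples with the seed rule (S3), tag them (S4),
    and derive generalized rules from the tagged samples (S5)."""
    rules = set()
    for html in html_docs:
        for line in to_text_lines(html):
            if line.startswith(seed_label):  # assumed form of the seed rule
                rules.add(signature(tag_tokens(line)))
    return rules


def extract(html_docs, rules, target_tag):
    """Extraction phase: tag each line (S13), check which learned rule fits
    (S14), and store matching targets in a list acting as the database (S15)."""
    database = []
    for html in html_docs:
        for line in to_text_lines(html):
            tagged = tag_tokens(line)
            if signature(tagged) in rules:
                database.extend(tok for tag, tok in tagged if tag == target_tag)
    return database


if __name__ == "__main__":
    rules = learn_rules(["<p>Price: $10</p><p>Hello world</p>"], "Price:")
    found = extract(["<li>Cost: $25.50</li><li>Call 555</li>"], rules, "PRICE")
    print(rules, found)
```

In this sketch the seed rule only recognizes lines that start with the literal label `Price:`, while the learned tag-sequence rule also fits a line such as `Cost: $25.50`. That is the kind of generalization the learning phase is meant to provide, although a real mDTD grammar would presumably be far more expressive than a flat tag sequence.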