
Architecture of a framework for information extraction from natural language documents



Abstract

A framework for information extraction from natural language documents is application-independent and provides a high degree of reusability. The framework integrates different natural language and machine learning techniques, such as parsing and classification. The architecture of the framework is integrated in an easy-to-use access layer. The framework performs general information extraction, classification/categorization of natural language documents, processing and routing of automated electronic data transmissions (e.g., e-mail and facsimile), and plain parsing. Inside the framework, requests for information extraction are passed to the actual extractors. The framework can handle pre- and postprocessing of the application data, control the extractors, and enrich the information they extract. The framework can also suggest necessary actions the application should take on the data. To achieve the goal of easy integration and extension, the framework provides an integration (outside) application program interface (API) and an extractor (inside) API.
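
The two-API arrangement described in the abstract can be illustrated with a minimal Python sketch. Every name here (`Extractor`, `KeywordClassifier`, `Framework`, `Result`, the `route_to_support` action) is hypothetical and chosen only for illustration; none of it is taken from the patent, and the keyword matcher merely stands in for a real parser or classifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class Extractor(Protocol):
    """Hypothetical extractor (inside) API: what a plug-in must provide."""

    def extract(self, text: str) -> Dict[str, str]: ...


class KeywordClassifier:
    """Toy extractor that assigns a category by simple keyword match."""

    def __init__(self, keywords: Dict[str, str]) -> None:
        self.keywords = keywords

    def extract(self, text: str) -> Dict[str, str]:
        for word, category in self.keywords.items():
            if word in text:
                return {"category": category}
        return {"category": "unknown"}


@dataclass
class Result:
    fields: Dict[str, str] = field(default_factory=dict)
    suggested_action: str = "none"


class Framework:
    """Hypothetical integration (outside) API: the application's only entry point."""

    def __init__(self) -> None:
        self._extractors: List[Extractor] = []

    def register(self, extractor: Extractor) -> None:
        self._extractors.append(extractor)

    def process(self, document: str) -> Result:
        # Preprocessing: lowercase and collapse runs of whitespace.
        text = " ".join(document.lower().split())
        result = Result()
        # Control of the extractors: pass the request to each one in turn.
        for extractor in self._extractors:
            result.fields.update(extractor.extract(text))
        # Postprocessing: enrich the extracted fields and suggest an action.
        result.fields["length"] = str(len(text))
        if result.fields.get("category") == "complaint":
            result.suggested_action = "route_to_support"
        return result


framework = Framework()
framework.register(KeywordClassifier({"refund": "complaint"}))
result = framework.process("I want a  REFUND now")
print(result.fields["category"], result.suggested_action)
```

In this sketch the application only calls `register` and `process`, while extractors only implement `extract`, so either side can be swapped without touching the other. That separation is the point of having distinct outside and inside APIs.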


Bibliographic data

Publication number: US6553385B2

Patent type:
Publication date: 2003-04-22

Original format: PDF
Applicant/Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Application number: US19980145408
Inventors: DAVID E. JOHNSON; THOMAS HAMPP-BAHNMUELLER

Filing date: 1998-09-01
Classification: G06F170/00
Country: US
Database entry time: 2022-08-22 00:05:44
