Conference: International Conference on Management of Data

The SystemT IDE: An Integrated Development Environment for Information Extraction Rules

Abstract

Information Extraction (IE), the problem of extracting structured information from unstructured text, has become the key enabler for many enterprise applications such as semantic search, business analytics, and regulatory compliance. While rule-based IE systems are widely used in practice due to their well-known "explainability," developing high-quality information extraction rules is known to be a labor-intensive and time-consuming iterative process. Our demonstration showcases SystemT IDE, the integrated development environment for SystemT, a state-of-the-art rule-based IE system from IBM Research that has been successfully embedded in multiple IBM enterprise products. SystemT IDE facilitates the development, testing, and analysis of high-quality IE rules by means of sophisticated techniques, ranging from data management to machine learning. We show how to build high-quality IE annotators using a suite of tools provided by SystemT IDE, including computing data provenance, learning basic features such as regular expressions and dictionaries, and automatically refining rules based on labeled examples.
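To make the ideas of rule-based extraction and data provenance concrete, the following is a minimal Python sketch. It is not SystemT's rule language or API; every name in it (`Span`, `regex_rule`, `dictionary_rule`, `followed_by`, `explain`) is hypothetical and chosen only for illustration. It shows basic features (a dictionary and a regular expression) combined by a sequence rule, with each output span recording the input spans it was derived from, so that a result can be traced back to the rules that produced it.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A matched region of text, tagged with the rule that produced it."""
    begin: int
    end: int
    text: str
    rule: str            # name of the (hypothetical) rule that emitted this span
    sources: tuple = ()  # spans this one was derived from, i.e. its provenance


def regex_rule(name, pattern, doc):
    """Basic feature: every match of a regular expression becomes a span."""
    return [Span(m.start(), m.end(), m.group(), name)
            for m in re.finditer(pattern, doc)]


def dictionary_rule(name, entries, doc):
    """Basic feature: every whole-word occurrence of a dictionary entry."""
    pattern = r"\b(?:" + "|".join(re.escape(e) for e in entries) + r")\b"
    return regex_rule(name, pattern, doc)


def followed_by(name, left, right, doc, max_gap=1):
    """Sequence rule: a left span followed by a right span within max_gap chars."""
    out = []
    for l in left:
        for r in right:
            if 0 <= r.begin - l.end <= max_gap:
                out.append(Span(l.begin, r.end, doc[l.begin:r.end], name, (l, r)))
    return out


def explain(span, depth=0):
    """Walk the provenance of a span and return one indented line per step."""
    lines = ["  " * depth + f"{span.rule}: {span.text!r} [{span.begin}, {span.end})"]
    for s in span.sources:
        lines.extend(explain(s, depth + 1))
    return lines


# Toy input and toy rules; the dictionary contents are made up for the example.
doc = "Alice Smith met Bob at Acme Corp."
first_names = dictionary_rule("FirstName", ["Alice", "Bob"], doc)
caps_words = regex_rule("CapsWord", r"\b[A-Z][a-z]+\b", doc)
persons = followed_by("Person", first_names, caps_words, doc)

for p in persons:
    print("\n".join(explain(p)))
```

Because each span carries its `sources`, a wrong result can be traced to the specific basic feature that caused it, which is the kind of information a rule developer needs when deciding which dictionary entry or pattern to refine.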