Conference: International Conference on Management of Data
The SystemT IDE: An Integrated Development Environment for Information Extraction Rules


Abstract

Information Extraction (IE), the problem of extracting structured information from unstructured text, has become the key enabler for many enterprise applications such as semantic search, business analytics, and regulatory compliance. While rule-based IE systems are widely used in practice due to their well-known "explainability," developing high-quality information extraction rules is known to be a labor-intensive and time-consuming iterative process. Our demonstration showcases SystemT IDE, the integrated development environment for SystemT, a state-of-the-art rule-based IE system from IBM Research that has been successfully embedded in multiple IBM enterprise products. SystemT IDE facilitates the development, testing, and analysis of high-quality IE rules by means of sophisticated techniques, ranging from data management to machine learning. We show how to build high-quality IE annotators using a suite of tools provided by SystemT IDE, including computing data provenance, learning basic features such as regular expressions and dictionaries, and automatically refining rules based on labeled examples.
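
To make the ideas of rule-based extraction and provenance concrete, the following is a minimal sketch in plain Python. It is not SystemT's rule language or API. Every name in it (`Span`, `PHONE_RE`, `FIRST_NAMES`, `person_phone`, `explain`) is hypothetical and chosen only for illustration. It assumes two basic features, a regular expression and a dictionary, combined by one higher-level rule, with each output recording the inputs it was derived from.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """An extracted span plus the rule and input spans that produced it."""
    begin: int
    end: int
    text: str
    rule: str
    sources: tuple = ()  # provenance: the spans this one was derived from


# Basic features (illustrative assumptions, not taken from the paper).
PHONE_RE = re.compile(r"\b\d{3}-\d{4}\b")
FIRST_NAMES = {"alice", "bob"}


def extract_phones(doc):
    """Regular-expression feature: one span per phone-like match."""
    return [Span(m.start(), m.end(), m.group(), "PhoneRegex")
            for m in PHONE_RE.finditer(doc)]


def extract_names(doc):
    """Dictionary feature: one span per word found in FIRST_NAMES."""
    return [Span(m.start(), m.end(), m.group(), "NameDict")
            for m in re.finditer(r"[A-Za-z]+", doc)
            if m.group().lower() in FIRST_NAMES]


def person_phone(doc, max_gap=20):
    """Rule: a name followed by a phone number within max_gap characters."""
    results = []
    for name in extract_names(doc):
        for phone in extract_phones(doc):
            gap = phone.begin - name.end
            if 0 <= gap <= max_gap:
                results.append(Span(name.begin, phone.end,
                                    doc[name.begin:phone.end],
                                    "PersonPhone", (name, phone)))
    return results


def explain(span, depth=0):
    """Walk the provenance tree and return one indented line per span."""
    lines = ["  " * depth + f"{span.rule}: {span.text!r} [{span.begin}, {span.end})"]
    for source in span.sources:
        lines.extend(explain(source, depth + 1))
    return lines


if __name__ == "__main__":
    for result in person_phone("Call Alice at 555-1234 or Bob."):
        print("\n".join(explain(result)))
```

Because each result keeps references to its inputs, a developer can trace an incorrect output back to the feature or rule responsible. That is the kind of question a provenance tool is meant to answer, although the sketch makes no claim about how SystemT IDE computes it.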
