Data service generation framework from heterogeneous printed forms using semantic link discovery

Published in: Future Generation Computer Systems

Abstract

Printed forms carry rich information in business processes and daily life. However, the enormous number of heterogeneous printed forms that contain the same categories of information is difficult to manage and share, so much of the data held in printed forms goes to waste. Automatically integrating and sharing these data would markedly improve enterprise efficiency, and the key problem is how to extract heterogeneous data from printed forms and integrate them for quick use. To solve this problem, we propose a framework that discovers semantic links in printed forms and generates data services for easy data management and rapid data sharing in enterprise systems. First, a multiple-OCR-based form recognition approach is proposed to make forms computer-readable. Next, forms are modeled as semi-structured data using structure-based semantic link discovery, which is then refined using massive data. Then, a linked data model is built by table matching to align the data. Finally, data services are generated from the linked data model. A series of experiments on printed resumes shows that the framework performs well in recognition rate, link discovery accuracy, data compression ratio, and data resource accuracy. A prototype system is presented to demonstrate the feasibility of the proposed framework.

Highlights

• A complete and feasible framework from printed forms to data services is proposed.
• An automatic approach to form data extraction and structuring is presented.
• A usable prototype system integrating heterogeneous printed resumes is implemented.
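The abstract does not say how the outputs of the multiple OCR runs are combined during form recognition. As a minimal sketch, assuming each OCR engine returns a plain string for the same form field, one simple fusion rule is a strict-majority vote that falls back to the medoid candidate (the one with the smallest total Levenshtein distance to the others). The function names `levenshtein` and `fuse_ocr_outputs` and the fusion rule itself are hypothetical illustrations, not the authors' method.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def fuse_ocr_outputs(candidates: list[str]) -> str:
    """Hypothetical fusion of several OCR readings of one form field.

    Returns the exact string a strict majority of engines agree on;
    otherwise returns the candidate closest (by summed edit distance)
    to all the others. Ties go to the earliest candidate.
    """
    if not candidates:
        raise ValueError("need at least one OCR output")
    normalized = [c.strip() for c in candidates]
    best, count = Counter(normalized).most_common(1)[0]
    if count > len(normalized) // 2:
        return best  # a strict majority agrees exactly
    # No strict majority: pick the medoid under edit distance.
    return min(normalized, key=lambda c: sum(levenshtein(c, o) for o in normalized))


if __name__ == "__main__":
    # Three engines each misread a different character in a date field.
    print(fuse_ocr_outputs(["2019-06", "2O19-06", "2019-O6"]))  # → 2019-06
```

The majority check keeps the common case cheap, while the edit-distance fallback covers fields where every engine makes a different single-character error, such as confusing `0` and `O` in dates.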