
Automated Textual analysis technology for unstructured big data mining in the form of compound documents


Abstract

As big data has rapidly become a focus of information technology worldwide, there is growing interest in what value can be created from the big data that public institutions and private companies have collected so far. Accordingly, the present invention provides a big data management system and an automated text analysis method for mining unstructured big data in the form of compound documents, developed in the following stages. First, a compound document collector module is developed that collects, classifies, extracts, and stores compound-document files provided in various formats by public institutions and private companies. Second, the collected big data is stored and managed in the Hadoop Distributed File System (HDFS), and a domain-specific natural language processing (preprocessing) module is developed to refine specialized-field data that general-purpose natural language processing cannot refine. Third, a module is developed that applies real-time intelligent data mining to the preprocessed data to analyze, classify, and group unstructured data by subject, and that performs data anomaly detection and automatic cleansing.
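The abstract names the stages but not the algorithms behind them. The following is a minimal Python sketch of the second and third stages only, under stated assumptions: the domain dictionary `DOMAIN_TERMS`, the stopword list, the helper names (`preprocess`, `tfidf`, `cosine`, `analyze`), the thresholds, TF-IDF with cosine similarity, single-pass leader clustering for grouping by subject, and the rule that a document is anomalous if it is empty after preprocessing or dissimilar to every other document are all hypothetical illustrative choices, not the patented method. HDFS storage and compound-document parsing are out of scope.

```python
import math
import re
from collections import Counter

# Hypothetical domain dictionary: specialized phrases mapped to one canonical token,
# standing in for the patent's domain-specific preprocessing module.
DOMAIN_TERMS = {
    "hadoop distributed file system": "hdfs",
    "natural language processing": "nlp",
}
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "by"}


def preprocess(text, domain_terms=DOMAIN_TERMS):
    """Lowercase, collapse domain phrases to canonical tokens, tokenize, drop stopwords."""
    text = text.lower()
    # Longest phrases first so a shorter phrase cannot break up a longer one.
    for phrase in sorted(domain_terms, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", domain_terms[phrase], text)
    return [t for t in re.findall(r"[a-z0-9]+", text) if t not in STOPWORDS]


def tfidf(docs):
    """Return one sparse TF-IDF vector (dict) per token list, using smoothed IDF."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)  # empty doc -> empty vector, no division performed
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def analyze(texts, group_threshold=0.3, min_similarity=0.1):
    """Flag anomalous documents, then group the rest by subject (leader clustering)."""
    vectors = tfidf([preprocess(t) for t in texts])
    anomalies = []
    for i, v in enumerate(vectors):
        best = max((cosine(v, w) for j, w in enumerate(vectors) if j != i), default=0.0)
        # Hypothetical anomaly rule: empty after preprocessing, or unlike every other doc.
        if not v or best < min_similarity:
            anomalies.append(i)
    groups = []  # lists of document indices; groups[k][0] is the group's leader
    for i, v in enumerate(vectors):
        if i in anomalies:
            continue
        sims = [cosine(v, vectors[g[0]]) for g in groups]
        if sims and max(sims) >= group_threshold:
            groups[sims.index(max(sims))].append(i)
        else:
            groups.append([i])
    return groups, anomalies


docs = [
    "HDFS stores collected big data in blocks",
    "The Hadoop Distributed File System stores big data",
    "Patent claims describe compound document collection",
    "Compound document collector extracts patent files",
    "",
]
groups, anomalies = analyze(docs)
print(groups)     # [[0, 1], [2, 3]]
print(anomalies)  # [4]
```

Leader clustering is used here only because it needs a single pass and no preset number of groups, which loosely fits the abstract's "real-time" framing; a real system would more likely rely on a dedicated topic model or clustering library.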

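Bibliographic data

  • Publication number: KR20210085306A

  • Publication date: 2021-07-08

  • Original format: PDF

  • Applicant/patentee: 케이웨어 (주)

  • Application number: KR1020190178207

  • Inventor: 남준

  • Application date: 2019-12-30

  • IPC classification: G06F16/35; G06F16/31; G06F16/36; G06F40/20

  • Country: KR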

