As big data has rapidly emerged as a focus of information technology worldwide, interest is growing in what kind of value can be created from the big data that public institutions and private companies have collected so far. Accordingly, the present invention, a big data management system and automatic text analysis method for mining unstructured big data in the form of compound documents, is developed in the following stages. First, a compound document collector module is developed that collects, classifies, extracts, and stores complex-document files provided in various formats by public institutions and private companies. Second, the collected big data is stored and managed through the Hadoop Distributed File System (HDFS), and a specialized-field natural language processing (preprocessing) module is developed to refine specialized-field data that general natural language processing cannot refine. Third, a module is developed that applies real-time intelligent data mining to the preprocessed data to analyze, classify, and group the unstructured data by subject, and to perform data anomaly detection and automatic data cleansing.
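The three development stages can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not the claimed system: HDFS storage is replaced by in-memory Python objects, format classification is done by file extension, the specialized-field dictionary (`DOMAIN_TERMS`) and subject keyword sets (`SUBJECTS`) are hypothetical, and simple keyword-overlap scoring stands in for the real-time intelligent data mining that the abstract names but does not specify. All function names are illustrative.

```python
import re
from collections import Counter, defaultdict
from pathlib import PurePath

# Stage 1 (sketch): group collected files by format. Routing by file extension
# is an assumption standing in for the compound document collector module.
def classify_by_format(filenames):
    groups = defaultdict(list)
    for name in filenames:
        ext = PurePath(name).suffix.lower().lstrip(".") or "unknown"
        groups[ext].append(name)
    return dict(groups)

# Stage 2 (sketch): specialized-field preprocessing. DOMAIN_TERMS is a
# hypothetical dictionary that normalizes domain spellings which a general
# tokenizer would split apart; it is applied before tokenizing.
DOMAIN_TERMS = {"h.d.f.s": "hdfs", "hadoop-dfs": "hdfs"}

def preprocess(text, domain_terms=DOMAIN_TERMS):
    text = text.lower()
    for variant, canonical in domain_terms.items():
        text = text.replace(variant, canonical)
    return re.findall(r"[a-z0-9]+", text)

# Stage 3 (sketch): group documents by subject using keyword overlap, a simple
# stand-in for real-time intelligent data mining. SUBJECTS is hypothetical.
# Documents that are empty or match no subject are flagged as anomalies and
# kept out of the groups (a crude form of automatic cleansing).
SUBJECTS = {
    "storage": {"hdfs", "storage", "cluster"},
    "mining": {"mining", "classification", "topic"},
}

def group_by_subject(docs, subjects=SUBJECTS):
    groups = defaultdict(list)
    anomalies = []
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        scores = {name: sum(counts[k] for k in kws) for name, kws in subjects.items()}
        best = max(scores, key=scores.get)
        if scores[best] == 0:
            anomalies.append(doc_id)  # empty or off-topic document
        else:
            groups[best].append(doc_id)
    return dict(groups), anomalies

if __name__ == "__main__":
    print(classify_by_format(["report.docx", "budget.XLSX", "notes.pdf", "README"]))
    docs = {
        "d1": preprocess("Files stored on H.D.F.S across the cluster"),
        "d2": preprocess("Text mining and topic classification"),
        "d3": preprocess(""),
        "d4": preprocess("Lunch menu for Friday"),
    }
    print(group_by_subject(docs))
```

Run as a script, this groups `d1` under `storage` and `d2` under `mining`, and flags `d3` and `d4` as anomalies. Keyword overlap was chosen only to keep the example self-contained; a real deployment would replace each stage with the collector, HDFS-backed storage, and mining components the abstract describes.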