This paper proposes a framework for change data capture and data extraction. The framework captures changed data through log analysis and processes the captured data further to improve data quality. The processed data are then pushed to a data queue, which the system processes using a priority-based scheduling algorithm. Finally, the processed data are loaded into a real-time data warehouse to support decision analysis. Analysis of a test case shows that this method captures all changed data from the source in a timely manner without changing the structure of the source system, and that it has little impact on the performance of the source system. In addition, the real-time scheduling algorithm effectively improves the data quality and data freshness of the real-time data warehouse, providing better data support for routine tactical business decisions.
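
The queueing step can be illustrated with a minimal sketch. The abstract does not specify the scheduling policy, so everything here is an assumption made for illustration: the names `ChangeRecord` and `ChangeQueue` are hypothetical, a lower priority number is taken to mean "load sooner", and ties are broken in arrival order.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass(order=True)
class ChangeRecord:
    """One captured change waiting to be loaded (hypothetical structure)."""
    priority: int                        # assumed: lower value is loaded first
    seq: int                             # arrival order, used to break ties FIFO
    payload: Any = field(compare=False)  # the changed row; not used for ordering


class ChangeQueue:
    """Priority queue of captured changes, backed by a binary heap."""

    def __init__(self) -> None:
        self._heap: list[ChangeRecord] = []
        self._counter = itertools.count()

    def push(self, payload: Any, priority: int) -> None:
        record = ChangeRecord(priority, next(self._counter), payload)
        heapq.heappush(self._heap, record)

    def drain(self) -> Iterator[Any]:
        # Yield payloads in priority order until the queue is empty.
        while self._heap:
            yield heapq.heappop(self._heap).payload


queue = ChangeQueue()
queue.push({"table": "orders", "op": "UPDATE"}, priority=2)
queue.push({"table": "inventory", "op": "INSERT"}, priority=1)
queue.push({"table": "orders", "op": "DELETE"}, priority=2)

# Stand-in for the warehouse load step: collect records in scheduled order.
loaded = list(queue.drain())
```

In this sketch the `inventory` change is loaded first because of its assumed higher urgency, and the two `orders` changes keep their original order. A real scheduler would likely also weigh data freshness and query load, which this example does not model.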