Offshore Technology Conference

Real-Time Cleaning of Time-Series Data for a Floating System Digital Twin



Abstract

Using accurate, high-quality data is critical for any application that relies heavily on data, be it machine learning, artificial intelligence, or digital twins. Poor-quality and erroneous data can result in inaccurate predictions even if the model is otherwise robust. Ensuring data quality is even more critical in real-time applications, where there is no human in the loop to perform sense checks on the data or results. A real-time digital twin implementation for a floating system uses time-series data from numerous measurements such as wind, waves, GPS, vessel motions, mooring tensions, and draft. Statistics computed from these data are used in the digital twin. An extensive data checking and cleaning routine was written that performs data quality checks and corrections on the time-series data before statistics are computed. Error types that typically occur in a time series include noise, flat-lined data, clipped data, outliers, and discontinuities. Statistical procedures were developed to check the raw time series for all of these errors. The procedures are generic and robust, so they can be used for different types of data. Some data types are slow-varying (e.g., GPS) while others are fast-varying random processes. A measurement classified as an error in one type of data is not necessarily an error in another data type. For example, GPS data can be discontinuous by nature, but a discontinuity in wave data indicates an error. Likewise, checking for white noise in mooring tension data is not particularly meaningful. We developed parametric data-cleaning procedures so that the same routine can handle different types of data and their errors. Outlier removal routines use the standard deviation of the time series, which itself can be biased by errors. Therefore, a method to compute unbiased statistics from the raw data was developed and implemented for robust outlier removal. Extensive testing on years of measured data and on hundreds of data channels was performed to ensure that the data cleaning procedures function as intended. Statistics (mean, standard deviation, maximum, and minimum) were computed from both the raw and cleaned data. The comparison showed significant differences between raw and cleaned statistics, with the latter being more accurate. Data cleaning, while not sounding as high-tech as other analytics algorithms, is a critical foundation of any data science application. Using cleaned time-series data and corresponding statistics ensures that a data analytics model provides actionable results. Clean data and statistics help achieve the intended purpose of the digital twin, which is to inform operators of the health and condition of the asset and to flag any anomalous events.
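
The abstract does not reproduce the cleaning procedures themselves, but two of the core ideas it describes (flagging flat-lined or stuck-sensor segments, and flagging outliers against statistics that are not themselves biased by those outliers) can be sketched. The following is a minimal, hypothetical Python sketch under those assumptions, not the authors' implementation; the function names, the median/MAD-based spread estimate, and thresholds such as min_run and n_sigma are illustrative choices only.

import numpy as np

def robust_spread(x):
    # Median and MAD-based estimate of the standard deviation; unlike the raw
    # sample standard deviation, it is not inflated by spikes or dropouts.
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return med, 1.4826 * mad  # 1.4826 maps MAD to sigma for Gaussian data

def flag_flat_lines(x, min_run=50, tol=1e-9):
    # Flag runs of (near-)constant samples, e.g. a stuck or frozen sensor.
    flat = np.zeros(x.size, dtype=bool)
    run_start = 0
    for i in range(1, x.size + 1):
        if i == x.size or abs(x[i] - x[run_start]) > tol:
            if i - run_start >= min_run:
                flat[run_start:i] = True
            run_start = i
    return flat

def flag_outliers(x, n_sigma=4.0):
    # Flag samples far from the median in units of the robust spread estimate.
    med, sigma = robust_spread(x)
    if sigma == 0.0:
        return np.zeros(x.size, dtype=bool)
    return np.abs(x - med) > n_sigma * sigma

# Example on a synthetic "wave elevation" channel with an injected spike
# and an injected flat-lined segment.
rng = np.random.default_rng(0)
wave = rng.normal(0.0, 1.0, 3600)
wave[100] = 15.0               # spike / outlier
wave[500:600] = wave[499]      # flat-lined segment
bad = flag_outliers(wave) | flag_flat_lines(wave)
clean_mean = wave[~bad].mean()
clean_std = wave[~bad].std()

In the terms used by the abstract, thresholds such as min_run and n_sigma would be the parameters of the routine, tuned per channel type, so that the same code can treat slow-varying signals such as GPS differently from fast-varying random processes such as wave elevation.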
