Mining sequences in distributed sensors data for energy production.

Abstract

Brief overview of the problem. The Environmental Protection Agency (EPA), a government-funded agency, writes and enforces the rules for emissions monitoring in the United States. Its regulations compel companies to operate within legal limits, resulting in environmentally safe operation. Specifically, power companies operate electric generating facilities under guidelines drawn up and enforced by the EPA. Because of acid rain and other harmful effects, electric generating facilities must report hourly emissions recorded via a Supervisory Control and Data Acquisition (SCADA) system. SCADA is a control and reporting system, present in all power plants, that consists of sensors and control mechanisms monitoring all equipment within the plant. The EPA collects the data recorded by SCADA systems and uses them to enforce proper plant operation with respect to emissions. These data include many details specific to each generating unit and power plant, including hourly generation. Hourly generation (termed grossunitload by the EPA) is the actual hourly average output of the generator on a per-unit basis. The questions to be answered are: do any of these units operate in tandem, and do any units start, stop, or change operation as a result of another unit's change in generation? These questions are addressed for the period April 2002 through April 2003 for facilities that operate pipeline natural-gas-fired generating units.

Purpose of research. The research has two uses if fruitful. First, a local model relating generating units would be highly profitable for energy traders: betting that a plant will operate a unit based on another unit's current characteristics could be very profitable. This profitability varies with fuel type. For instance, if the price of coal is extremely high due to shortages, knowing a semi-operating characteristic of two generating units is highly valuable. Second, this known characteristic can also be used in regulation and operational modeling, which is of great importance to government agencies. If regulatory committees are aware of past (or current) similarities between power producers, they may be able to avoid a power struggle in a region caused by greedy traders or companies. Setting profit motives aside, the Department of Energy might use a similar approach to model power grid generation availability from previous data for reliability purposes.

Type of problem. The problem tackled in this Master's thesis is multiple time series pattern recognition. The field is expansive and well studied, so the research benefits from previously known techniques. The author experiments with conventional techniques such as correlation, principal component analysis, and k-means clustering for feature extraction and, eventually, pattern extraction. For the primary analysis, the author uses a conventional sequence discovery algorithm. The algorithm has no prior knowledge of space limitations, so it searches the entire space, resulting in an expensive but complete process. Prior to sequence discovery, the author applies a uniform coding schema to the raw data, an adaptation of a coding schema presented by Keogh. This coding and discovery process is termed USD, or Uniform Sequence Discovery. The data are high-dimensional as well as extremely dynamic and sporadic in magnitude. The energy market that demands power generation is driven by profit and, to some extent, reliability. The obvious factors are reliability based: for instance, to keep system frequency at 60 Hz, units may operate in an idle state, resulting in a constant or very low value for a period of time (idle time). Also, to avoid large frequency swings on the power grid, companies are required to be able to ramp up a generator quickly using its spinning reserve.

Brief review of results. The results of this research identify common characteristics between generating units for the data tested. These characteristics are extremely obvious and useful at the generating-unit level. Although characteristics were discovered, the data tested were very sparse. After examining the test dataset, the author believes the distribution of the data will follow a similar pattern regardless of the quarter examined. Regardless of the distribution, it is essential to process new data once released. If newer data are tested, as they should be for each new dataset released, the author is confident that new characteristics will be discovered. These updated characteristics, together with historical patterns, will allow traders to forecast electricity generation with high confidence.
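
The coding and discovery steps described under "Type of problem" can be illustrated with a minimal Python sketch. It is not the thesis's USD implementation: the abstract does not specify the coding schema, so equal-width bins over each unit's own range are an assumption, and the function names (`uniform_code`, `count_sequences`, `shared_sequences`), the bin count, and the hourly load values are all hypothetical. The search is exhaustive over every contiguous window of a fixed length, which mirrors the "expensive but complete" character of an unpruned search.

```python
from collections import Counter

def uniform_code(values, n_bins=4):
    """Map each reading to a letter using equal-width bins over the series range.

    Assumption: equal-width bins per unit; the thesis's actual schema may differ.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return "a" * len(values)  # constant series (e.g. an idle unit): one symbol
    width = (hi - lo) / n_bins
    symbols = []
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum into the top bin
        symbols.append(chr(ord("a") + idx))
    return "".join(symbols)

def count_sequences(coded, length):
    """Exhaustively count every contiguous symbol sequence of the given length."""
    return Counter(coded[i:i + length] for i in range(len(coded) - length + 1))

def shared_sequences(coded_a, coded_b, length, min_count=2):
    """Return sequences occurring at least min_count times in both coded series."""
    counts_a = count_sequences(coded_a, length)
    counts_b = count_sequences(coded_b, length)
    return {
        seq: (counts_a[seq], counts_b[seq])
        for seq in counts_a
        if counts_a[seq] >= min_count and counts_b.get(seq, 0) >= min_count
    }

# Hypothetical hourly gross unit loads for two units with different capacities.
unit_a = [0, 0, 50, 100, 100, 50, 0, 0, 50, 100, 100, 50]
unit_b = [0, 0, 20, 40, 40, 20, 0, 0, 20, 40, 40, 20]

coded_a = uniform_code(unit_a)
coded_b = uniform_code(unit_b)
common = shared_sequences(coded_a, coded_b, length=3)
```

Because each unit is coded against its own range, two units of different capacity that ramp up and down together yield the same symbol string, so their shared sequences surface even though the raw magnitudes differ.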

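Bibliographic record

  • Author: Gant, John Damon
  • Author affiliation: University of Louisville
  • Degree-granting institution: University of Louisville
  • Subjects: Statistics; Engineering Electronics and Electrical; Computer Science
  • Degree: M.Eng.
  • Year: 2006
  • Pages: 153 p.
  • Total pages: 153
  • Original format: PDF
  • Language of text: eng
  • CLC classification: (not listed)
  • Keywords: (not listed)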
