An important goal of knowledge discovery is the search for patterns in the data that can help explain its underlying structure. To be practically useful, the discovered patterns should be novel (unexpected) and easy for humans to understand. In this thesis, we study the problem of mining patterns (defining subpopulations of data instances) that are important for predicting and explaining a specific outcome variable. An example is the task of identifying groups of patients that respond better to a certain treatment than the rest of the patients.

We present efficient methods for mining predictive patterns for both atemporal and temporal (time series) data. Our first method relies on frequent pattern mining to explore the search space. It applies a novel evaluation technique for extracting a small set of frequent patterns that are highly predictive and have low redundancy. We show the benefits of this method on several synthetic and public datasets.

Our temporal pattern mining method works on complex multivariate temporal data, such as electronic health records, for the event detection task. It first converts time series into time-interval sequences of temporal abstractions and then mines temporal patterns backwards in time, starting from patterns related to the most recent observations. We show the benefits of our temporal pattern mining method on two real-world clinical tasks.
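
As an illustration only, the conversion of a time series into a time-interval sequence of temporal abstractions might look like the sketch below. It assumes simple value abstractions (Low / Normal / High) with hypothetical thresholds `low` and `high`; the function names and the interval representation are assumptions made for this example and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple

# (abstraction label, start time, end time); a representation assumed for this sketch
Interval = Tuple[str, float, float]


def value_abstraction(value: float, low: float, high: float) -> str:
    """Map a numeric observation to a qualitative label (hypothetical thresholds)."""
    if value < low:
        return "Low"
    if value > high:
        return "High"
    return "Normal"


def to_intervals(
    times: Sequence[float],
    values: Sequence[float],
    low: float,
    high: float,
) -> List[Interval]:
    """Merge consecutive observations that share a label into one interval."""
    intervals: List[Interval] = []
    for t, v in zip(times, values):
        label = value_abstraction(v, low, high)
        if intervals and intervals[-1][0] == label:
            # Same label as the previous observation: extend the open interval.
            _, start, _ = intervals[-1]
            intervals[-1] = (label, start, t)
        else:
            # Label changed (or first observation): open a new interval.
            intervals.append((label, t, t))
    return intervals


# Toy series with made-up values and thresholds.
intervals = to_intervals(
    times=[0, 1, 2, 3, 4],
    values=[5.0, 5.5, 8.0, 8.5, 3.0],
    low=4.0,
    high=7.0,
)

# Order intervals from most recent to oldest, as a starting point for
# a search that begins with the most recent observations.
most_recent_first = sorted(intervals, key=lambda iv: iv[2], reverse=True)
```

Here `most_recent_first` only orders the intervals by end time; the backward pattern mining itself is not shown.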