Journal of Chemical Information and Modeling

Perspectives on Knowledge Discovery Algorithms Recently Introduced in Chemoinformatics: Rough Set Theory, Association Rule Mining, Emerging Patterns, and Formal Concept Analysis

Abstract

Knowledge Discovery in Databases (KDD) refers to the use of methodologies from machine learning, pattern recognition, statistics, and other fields to extract knowledge from large collections of data, where the knowledge is not explicitly available as part of the database structure. In this paper, we describe four modern data mining techniques, Rough Set Theory (RST), Association Rule Mining (ARM), Emerging Pattern Mining (EP), and Formal Concept Analysis (FCA), and we have attempted to give an exhaustive list of their chemoinformatics applications. One of the main strengths of these methods is their descriptive ability. When they are used to derive rules, for example, in structure-activity relationships, the rules have clear physical meaning. This review has shown that there are close relationships between the methods. Often, the apparent differences lie in the way in which the problem under investigation has been formulated, which can lead to the natural adoption of one method or another. For example, the idea of a structural alert, as a structure which is present in toxic compounds and absent from nontoxic compounds, leads to the natural formulation of an Emerging Pattern search. Despite the similarities between the methods, each has its strengths. RST is useful for dealing with uncertain and noisy data. Its main chemoinformatics applications so far have been in feature extraction and feature reduction, the latter often as input to another data mining method, such as a Support Vector Machine (SVM). ARM has mostly been used for frequent subgraph mining. EP and FCA have both been used to mine structural and nonstructural patterns for the classification of active and inactive molecules. Since their introduction in the 1980s and 1990s, RST, ARM, EP, and FCA have found wide-ranging applications, with many thousands of citations in Web of Science, but their adoption by the chemoinformatics community has been relatively slow. Advances, both in computer power and in algorithm development, mean that there is the potential to apply these techniques to larger data sets and thus to different problems in the future.
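
The structural-alert formulation mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes the common definition of an emerging pattern as a feature set whose support grows by at least a chosen factor from one class to another, with a "jumping" pattern having zero support in the background class. The compound sets, feature labels, thresholds, and function names below are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each compound is a set of substructure labels.
toxic = [
    {"nitro", "aromatic_ring"},
    {"nitro", "aromatic_ring", "halogen"},
    {"epoxide", "aromatic_ring"},
    {"nitro", "amine"},
]
nontoxic = [
    {"aromatic_ring", "hydroxyl"},
    {"amine", "hydroxyl"},
    {"aromatic_ring", "halogen"},
    {"nitro", "hydroxyl"},
]


def support(pattern, dataset):
    """Fraction of compounds in `dataset` that contain every feature in `pattern`."""
    if not dataset:
        return 0.0
    return sum(pattern <= compound for compound in dataset) / len(dataset)


def growth_rate(pattern, target, background):
    """Ratio of support in `target` to support in `background`.

    Returns infinity for a jumping pattern (present in target, absent in
    background) and 0.0 when the pattern never occurs in the target class.
    """
    s_target = support(pattern, target)
    s_background = support(pattern, background)
    if s_target == 0:
        return 0.0
    if s_background == 0:
        return float("inf")
    return s_target / s_background


def emerging_patterns(target, background, min_support=0.5, min_growth=2.0, max_size=2):
    """Brute-force search over feature combinations up to `max_size`.

    This exhaustive enumeration is only practical for tiny examples; it is
    an assumption of this sketch, not a method described in the paper.
    """
    features = sorted(set().union(*target))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(features, size):
            pattern = frozenset(combo)
            s_target = support(pattern, target)
            if s_target < min_support:
                continue
            rate = growth_rate(pattern, target, background)
            if rate >= min_growth:
                found.append((pattern, s_target, rate))
    return found


if __name__ == "__main__":
    for pattern, s_target, rate in emerging_patterns(toxic, nontoxic):
        print(sorted(pattern), f"support={s_target:.2f}", f"growth={rate}")
```

In this toy data, a candidate alert is any feature set that is frequent among the toxic compounds and much rarer, or absent, among the nontoxic ones. Real applications would replace the label sets with substructure fingerprints or mined subgraphs and use a pruned search rather than exhaustive enumeration.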