International Workshop on Active Mining

Mining Chemical Compound Structure Data Using Inductive Logic Programming

Abstract

Discovering knowledge from chemical compound structure data is a challenging task in KDD. The goal is to generate hypotheses that describe the activities or characteristics of chemical compounds from their structures. Because each compound is composed of several parts with complicated relations among them, traditional mining algorithms cannot handle this kind of data efficiently. In this research, we apply Inductive Logic Programming (ILP) to classify chemical compounds. ILP produces comprehensible learning results and can handle more complex data that includes relations among components. However, the bottleneck in learning first-order theories is the enormous hypothesis search space, which makes existing ILP approaches less efficient than propositional approaches. We introduce an improved ILP approach that handles more efficiently a kind of data called multiple-part data, in which one instance consists of several parts as well as relations among those parts. The approach searches for hypotheses describing the class of each training example by using both the individual and the relational characteristics of its parts, which is similar to finding common substructures among complex relational instances. Chemical compound data is multiple-part data: each compound is composed of atoms as parts, and various kinds of bonds act as relations among the atoms. We apply the proposed algorithm to chemical compound structure data in experiments on two real-world datasets: mutagenicity of nitroaromatic compounds and dopamine antagonist compounds. The experimental results were compared with those of previous approaches to evaluate the performance of the proposed approach.
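The multiple-part representation described in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' algorithm: the names `Compound`, `Pattern`, `covers`, and `coverage` are hypothetical, and the coverage test is a brute-force search for an injective atom mapping, which is exponential in pattern size and ignores the efficiency improvements the paper is concerned with. Atoms are the parts and carry an element symbol as their individual characteristic, typed bonds are the relations among parts, and a hypothesis is a small substructure that an example satisfies if that substructure occurs in it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class Compound:
    """A multiple-part instance: atoms are the parts, bonds are relations among parts."""
    name: str
    atoms: dict[str, str]  # atom id -> element symbol (individual characteristic of a part)
    bonds: set[tuple[str, str, str]] = field(default_factory=set)  # (atom_a, atom_b, bond_type)

    def bonded(self, a: str, b: str, bond_type: str) -> bool:
        # Bonds are undirected, so accept either orientation.
        return (a, b, bond_type) in self.bonds or (b, a, bond_type) in self.bonds


@dataclass
class Pattern:
    """A hypothetical substructure hypothesis over pattern variables 0..k-1."""
    elements: list[str]                # required element for each pattern variable
    bonds: list[tuple[int, int, str]]  # (var_i, var_j, bond_type) relational constraints


def covers(pattern: Pattern, compound: Compound) -> bool:
    """True if some injective mapping of pattern variables to atoms satisfies every constraint."""
    # Brute force over ordered atom tuples; exponential, for illustration only.
    for assignment in permutations(compound.atoms, len(pattern.elements)):
        elements_ok = all(
            compound.atoms[atom] == element
            for atom, element in zip(assignment, pattern.elements)
        )
        bonds_ok = all(
            compound.bonded(assignment[i], assignment[j], bond_type)
            for i, j, bond_type in pattern.bonds
        )
        if elements_ok and bonds_ok:
            return True
    return False


def coverage(pattern: Pattern, positives: list[Compound], negatives: list[Compound]) -> tuple[int, int]:
    """Count how many positive and negative examples the hypothesis covers."""
    return (
        sum(covers(pattern, c) for c in positives),
        sum(covers(pattern, c) for c in negatives),
    )


# Toy compounds, invented for illustration only (not taken from either dataset).
c1 = Compound("c1", {"a1": "c", "a2": "n", "a3": "o", "a4": "o"},
              {("a1", "a2", "single"), ("a2", "a3", "double"), ("a2", "a4", "single")})
c2 = Compound("c2", {"b1": "c", "b2": "o"}, {("b1", "b2", "double")})

# Hypothesis: a carbon singly bonded to a nitrogen that is doubly bonded to an oxygen.
nitro_like = Pattern(["c", "n", "o"], [(0, 1, "single"), (1, 2, "double")])
print(coverage(nitro_like, positives=[c1], negatives=[c2]))  # (1, 0)
```

Scoring a candidate by the positive and negative examples it covers reflects the idea of a hypothesis that describes one class and excludes the other; an actual ILP system would generate and refine such candidates during its search rather than receive one ready-made as in this sketch.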