Empirical evaluation of ensemble feature subset selection methods for learning from a high-dimensional database in drug design

Abstract

Discovering new drugs is one of the most important goals not only in the pharmaceutical field but also in a variety of other fields, including molecular biology, chemistry and medical science. The importance of computationally understanding the relationship between a given chemical compound and its drug activity has become pronounced. In a data set on the drug activity of chemical compounds, each row corresponds to a chemical compound, and the columns are the descriptors of the compound plus a label indicating its drug activity. Recently, the number of descriptors has grown in order to capture more detailed information about a given set of compounds; the number of columns (attributes or features) in some drug data sets now reaches hundreds of thousands or even a million. The purpose of this paper is to empirically evaluate the performance of ensemble feature subset selection strategies by applying them to such a high-dimensional data set that is actually used in the drug design process. We compared the performance of three ensemble methods, including a query-learning-based method, with that of one of the latest feature subset selection methods. The evaluation was performed on a data set containing approximately 140,000 features. Our results show that the query-learning-based method outperformed the other three methods in terms of both final prediction accuracy and time efficiency. We also examined the effect of noise in the data and found that the advantage of the method becomes more pronounced at higher noise levels.
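The abstract does not describe the algorithms themselves, so the following is only a minimal sketch of the general idea of an ensemble built over feature subsets. It is not the paper's query-learning-based method. It assumes binary descriptors, a random-subspace scheme in which each ensemble member is a one-feature decision stump trained on a random subset of the columns, and majority voting at prediction time. All function names, the synthetic data generator and the parameter values are hypothetical and chosen purely for illustration.

```python
import random
from collections import Counter


def make_data(n_rows, n_features, n_informative, noise, rng):
    """Synthetic stand-in for a compound table (assumption, not real drug data).

    Each row is a list of binary descriptors. The first `n_informative`
    columns copy the true activity; the rest are random. The recorded
    label is flipped with probability `noise`.
    """
    rows, labels = [], []
    for _ in range(n_rows):
        y = rng.randint(0, 1)
        row = [y] * n_informative
        row += [rng.randint(0, 1) for _ in range(n_features - n_informative)]
        rows.append(row)
        labels.append(1 - y if rng.random() < noise else y)
    return rows, labels


def train_stump(rows, labels, features):
    """Return (feature, polarity) of the best one-feature rule in `features`."""
    best = None
    for f in features:
        agree = sum(1 for r, y in zip(rows, labels) if r[f] == y)
        # polarity 1 predicts the feature value, polarity 0 predicts its negation
        for polarity, score in ((1, agree), (0, len(rows) - agree)):
            if best is None or score > best[0]:
                best = (score, f, polarity)
    return best[1], best[2]


def predict_stump(stump, row):
    f, polarity = stump
    return row[f] if polarity == 1 else 1 - row[f]


def train_ensemble(rows, labels, n_members, subset_size, rng):
    """Train each member on its own random subset of the feature columns."""
    n_features = len(rows[0])
    return [
        train_stump(rows, labels, rng.sample(range(n_features), subset_size))
        for _ in range(n_members)
    ]


def predict_ensemble(ensemble, row):
    """Majority vote over the members (use an odd member count to avoid ties)."""
    votes = Counter(predict_stump(s, row) for s in ensemble)
    return votes.most_common(1)[0][0]


rng = random.Random(0)
train_rows, train_labels = make_data(200, 20, 5, 0.1, rng)  # 10% label noise
test_rows, test_labels = make_data(100, 20, 5, 0.0, rng)    # clean test labels
ensemble = train_ensemble(train_rows, train_labels, 11, 16, rng)
predictions = [predict_ensemble(ensemble, r) for r in test_rows]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(f"accuracy on clean synthetic test set: {accuracy:.2f}")
```

Raising the `noise` argument in the training call is one simple way to probe how such an ensemble degrades as the labels become noisier, in the spirit of the noise experiment the abstract mentions. The toy sizes here are far smaller than the roughly 140,000 features reported in the abstract.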

Bibliographic record

Source: 2003, pp. 253-257 (5 pages)
Author: Mamitsuka, H.
Original format: PDF
Chinese Library Classification: Radio electronics; telecommunications technology
Keywords: pharmaceutical industry; learning (artificial intelligence); biochemistry; database management systems; medical computing; patient treatment; ensemble feature subset selection methods; high-dimensional database; drug design; pharmaceutical field; molecul;

Similar literature

1. Query-learning-based iterative feature-subset selection for learning from high-dimensional data sets [J]. Hiroshi Mamitsuka. Knowledge and Information Systems. 2006, No. 1
2. An empirical evaluation for the intrusion detection features based on machine learning and feature selection methods [J]. Mouhammd Alkasassbeh. Journal of Theoretical and Applied Information Technology. 2017, No. 22
3. Empirical evaluation of ensemble feature subset selection methods for learning from a high-dimensional database in drug design [C]. Hiroshi Mamitsuka. Institute of Electrical and Electronics Engineers Symposium on Bioinformatics and Bioengineering. 2003
4. Feature selection and statistical alternatives for machine learning applied to in-silico drug design [D]. Arciniegas, Fabio Andres. 2002
5. Dysphonic Voice Pattern Analysis of Patients in Parkinson's Disease Using Minimum Interclass Probability Risk Feature Selection and Bagging Ensemble Learning Methods [O]. Yunfeng Wu, Pinnan Chen, Yuchen Yao. 2017
6. Ensemble of Filter-Based Rankers to Guide an Epsilon-Greedy Swarm Optimizer for High-Dimensional Feature Subset Selection [O]. Mohammad Bagher Dowlatshahi, Vali Derhami, Hossein Nezamabadi-pour. 2017
