Indian Journal of Science and Technology

Training the SVM to Larger Dataset Applications using the SVM Sampling Technique

Abstract

With increasing amounts of data being generated by businesses and researchers, there is a need for fast, accurate and robust algorithms for data analysis. Improvements in database technology, computing performance and artificial intelligence have contributed to the development of intelligent data analysis. The primary aim of data mining is to discover patterns in the data that lead to a better understanding of the data-generating process and to useful predictions. Examples of data mining applications include detecting fraudulent credit card transactions, character recognition in automated zip code reading, and predicting compound activity in drug discovery. Real-world data sets are often characterized by three properties: they contain large numbers of examples, e.g. billions of credit card transactions and potential 'drug-like' compounds; they are highly unbalanced, e.g. most transactions are not fraudulent and most compounds are not active against a given biological target; and they are corrupted by noise. The relationship between the predictive variables, e.g. physical descriptors, and the target concept, e.g. compound activity, is often highly non-linear. One recent technique developed to address these issues is the Support Vector Machine (SVM), a robust tool for classification and regression in noisy, complex domains. The two key features of SVMs are generalization theory, which leads to a principled way to choose a hypothesis, and kernel functions, which introduce non-linearity into the hypothesis space without explicitly requiring a non-linear algorithm. In this paper we introduce Support Vector Machines, the cascade SVM and a randomized sampling technique, highlight their advantages over existing data analysis techniques, and note some important points for the data mining practitioner who wishes to use Support Vector Machines.
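
The claim that kernel functions add non-linearity without a non-linear algorithm can be checked numerically. The sketch below is an illustration, not code from the paper: it compares the homogeneous degree-2 polynomial kernel k(x, z) = (x . z)^2 with an ordinary dot product in the explicit feature space phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2). The function names `dot`, `poly2_kernel` and `phi` are hypothetical.

```python
import math


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def phi(x):
    """Explicit feature map for 2-D inputs whose inner product equals poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, z = (1.0, 2.0), (3.0, 4.0)
print(poly2_kernel(x, z))   # computed in the original 2-D input space
print(dot(phi(x), phi(z)))  # computed in the 3-D feature space
```

Both lines print 121 (the second only up to floating-point rounding). A linear learner that only ever evaluates `poly2_kernel` therefore behaves as a linear method in the three-dimensional feature space, while its decision boundary is quadratic in the original inputs.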
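
The abstract does not spell out the cascade SVM or the sampling procedure, so the following is only a minimal sketch of the general cascade idea under stated assumptions: randomly partition the training set, train one SVM per part, keep only each part's support vectors, then merge pairs of support-vector sets and retrain until one SVM remains. Several pieces are swapped in for brevity and are not the paper's method: a linear kernel, a full-batch Pegasos-style subgradient solver with the bias regularized as an extra weight, a margin tolerance `tol=0.5` for picking approximate support vectors, and a single pass without the feedback loop that a full cascade may run. All function names are hypothetical.

```python
import random


def decision(w, x):
    """Linear decision function f(x) = w . x + b, with the bias b stored as w[-1]."""
    return sum(wk * xk for wk, xk in zip(w, x)) + w[-1]


def train_linear_svm(xs, ys, lam=0.01, epochs=500):
    """Approximate a linear soft-margin SVM by full-batch, Pegasos-style subgradient
    descent on lam/2 * ||w||^2 + mean hinge loss. The bias is handled as an extra
    weight on a constant feature, so it is regularized too (a simplification)."""
    n, w = len(xs), [0.0] * (len(xs[0]) + 1)
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)  # decreasing step size
        g = [0.0] * len(w)  # negative subgradient of the mean hinge loss
        for x, y in zip(xs, ys):
            if y * decision(w, x) < 1:  # margin violated: hinge loss is active
                for k, xk in enumerate(list(x) + [1.0]):
                    g[k] += y * xk / n
        # Shrink (regularization step) and move toward the violating points.
        w = [(1 - eta * lam) * wk + eta * gk for wk, gk in zip(w, g)]
    return w


def support_vectors(xs, ys, w, tol=0.5):
    """Keep points on or inside the approximate margin, y * f(x) <= 1 + tol.
    The tolerance is an assumption that compensates for the inexact solver."""
    margins = [y * decision(w, x) for x, y in zip(xs, ys)]
    keep = [i for i, m in enumerate(margins) if m <= 1 + tol]
    if not keep:  # never let a subset vanish: keep its closest point instead
        keep = [min(range(len(margins)), key=lambda i: margins[i])]
    return [xs[i] for i in keep], [ys[i] for i in keep]


def cascade_svm(points, labels, n_parts=4, seed=0):
    """Single-pass cascade over a non-empty training set: randomly partition it,
    train one SVM per part, keep only each part's support vectors, then merge
    pairs of support-vector sets and retrain until a single SVM remains.
    Returns (weights, support_points, support_labels)."""
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # randomized sampling of the partitions
    parts = []
    for k in range(n_parts):
        idx = order[k::n_parts]
        if idx:  # skip empty parts when there are fewer points than parts
            parts.append(([points[i] for i in idx], [labels[i] for i in idx]))
    while True:
        layer = []
        for xs, ys in parts:
            w = train_linear_svm(xs, ys)
            layer.append(support_vectors(xs, ys, w))
        if len(layer) == 1:
            sv_x, sv_y = layer[0]
            return w, sv_x, sv_y
        parts = []
        for k in range(0, len(layer), 2):
            pair = layer[k:k + 2]  # one or two support-vector sets
            parts.append(([x for xs, _ in pair for x in xs],
                          [y for _, ys in pair for y in ys]))


if __name__ == "__main__":
    gen = random.Random(1)
    pts = [(gen.uniform(1, 3), gen.uniform(1, 3)) for _ in range(20)]
    pts += [(gen.uniform(-3, -1), gen.uniform(-3, -1)) for _ in range(20)]
    lbls = [1] * 20 + [-1] * 20
    w, sv_x, sv_y = cascade_svm(pts, lbls)
    errors = sum((decision(w, x) > 0) != (y > 0) for x, y in zip(pts, lbls))
    print(f"{len(sv_x)} of {len(pts)} points kept as support vectors, {errors} training errors")
```

The design choice that makes a cascade attractive for large data sets is that each layer trains only on points that some earlier SVM judged relevant to the margin, so the final solver sees a much smaller problem than the full training set. Swapping in a non-linear kernel would require a dual (kernelized) solver in place of `train_linear_svm`.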
