Knowledge Discovery from Databases: Cost-sensitive and imbalance learning.

Abstract

Collecting data for business analysis is no longer difficult. The main concern for business managers is whether they can use that data to build predictive models that supply accurate information for decision-making. Knowledge Discovery from Databases (KDD) provides a guideline for collecting data by identifying the knowledge inside it. As one of the KDD steps, data mining provides a systematic and intelligent approach to learning from large amounts of data and is critical to the success of KDD. Over the past several decades, many data mining algorithms have been developed; they can be categorized as classification, association rule mining, and clustering. These algorithms have proved very effective in answering different business questions. Among them, classification is the most popular group and is widely used across business areas. However, existing classification algorithms are designed to maximize prediction accuracy under the assumption of equal class distribution and equal error costs. This assumption seldom holds in the real world, so current classification methods need to be extended to handle data with imbalanced class distributions and unequal costs. In this dissertation, I propose an Iterative Cost-sensitive Naive Bayes (ICSNB) method aimed at reducing overall misclassification cost regardless of class distribution. During each iteration, k nearest neighbors are identified and form a new training set, which is used to learn unsolved instances. In the second study, I use the characteristics of the nearest neighbor method to develop a new under-sampling method for the imbalance problem: a general method that identifies noisy instances in the data set and creates a balanced data set for learning. Both methods are validated on multiple real-world data sets. The empirical results show that my methods outperform some existing and popular methods.
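
The contrast the abstract draws between maximizing accuracy and minimizing misclassification cost can be illustrated with a generic minimum-expected-cost decision rule. This is only a sketch of the standard cost-sensitive idea, not the ICSNB method itself, whose details the abstract does not give. The function name, the posterior probabilities, and both cost matrices are hypothetical values chosen for illustration.

```python
from typing import Sequence


def min_expected_cost_class(posteriors: Sequence[float],
                            cost: Sequence[Sequence[float]]) -> int:
    """Return the class index with the lowest expected misclassification cost.

    posteriors[j] is P(true class = j | x), e.g. as estimated by a
    probabilistic classifier such as naive Bayes.
    cost[j][i] is the cost of predicting class i when the true class is j.
    """
    n_classes = len(posteriors)
    # Expected cost of each possible prediction i, averaged over true classes j.
    expected = [
        sum(posteriors[j] * cost[j][i] for j in range(n_classes))
        for i in range(n_classes)
    ]
    return min(range(n_classes), key=expected.__getitem__)


# Hypothetical instance: the majority class 0 looks four times more likely.
posteriors = [0.8, 0.2]

# Equal error costs: the rule reduces to picking the most probable class.
equal_cost = [[0, 1],
              [1, 0]]

# Unequal error costs (assumed): missing the minority class 1 costs 10x more.
unequal_cost = [[0, 1],
                [10, 0]]

accuracy_choice = min_expected_cost_class(posteriors, equal_cost)
cost_choice = min_expected_cost_class(posteriors, unequal_cost)
print(accuracy_choice, cost_choice)
```

With equal costs the rule picks class 0, the most probable class. With the assumed 10:1 cost ratio it switches to class 1, because predicting class 0 has expected cost 0.2 × 10 = 2.0 while predicting class 1 has expected cost 0.8 × 1 = 0.8.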

Bibliographic record

Author: Yang, Zhuo
Author affiliation: The University of Utah
Degree-granting institution: The University of Utah
Subject: Business Administration Management; Information Technology
Degree: Ph.D.
Year: 2010
Pages: 107
Format: PDF
Language: English