
Metodos para mejorar la calidad de un conjunto de datos para descubrir conocimiento.

Machine translation: Methods for improving the quality of a dataset for knowledge discovery.


Abstract

Today, data generation is growing exponentially in two directions: instances (rows) and features (columns). As a result, many datasets cannot be analyzed without preprocessing. The sheer size of a dataset can cause serious scalability and performance problems for some data mining algorithms. In addition, the quality of the data may be inadequate for the knowledge discovery process. For these reasons, the dataset must be preprocessed so that the data mining algorithm runs efficiently and produces accurate results. In this thesis, we introduce new measures to evaluate the quality of a dataset in the context of supervised classification. From these quality measures, we derive two ways of quantifying the data complexity of a classification problem; specifically, we try to anticipate how a classification algorithm will behave on a given dataset. Our data complexity measures are compared with others available in the literature: they give similar performance at a lower computational cost. For data cleaning, we propose a new method that is independent of the classification algorithm. The proposed method detects and eliminates noise in each class, and it is more efficient and accurate than other methods available in the literature. In the context of dimensionality reduction, we propose two new feature selection methods. Compared with two well-known feature selection methods, RELIEF and Sequential Forward Selection (SFS), they obtain similar results at a much lower computational cost. Furthermore, we propose a new algorithm that improves the scalability of the instance selection algorithms currently in use.
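RELIEF, one of the two baseline feature selection methods named in the abstract, assigns each feature a relevance weight by comparing every instance with its nearest neighbour of the same class (nearest hit) and of the other class (nearest miss). The sketch below follows the common formulation of the basic two-class RELIEF; it is not one of the thesis's proposed methods, which this record does not describe. The function name `relief`, the use of every instance as a sample instead of random sampling, and the Manhattan distance over range-scaled features are assumptions made to keep the example short and deterministic.

```python
from typing import List, Sequence


def relief(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Two-class RELIEF feature weights (higher = more relevant).

    Deterministic variant (assumption): every instance serves once as the
    sampled instance R, instead of drawing m instances at random.
    """
    n, d = len(X), len(X[0])
    # Per-feature range, used to scale each difference into [0, 1];
    # constant features get a range of 1 to avoid division by zero.
    span = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0 for f in range(d)]

    def diff(f: int, a: Sequence[float], b: Sequence[float]) -> float:
        return abs(a[f] - b[f]) / span[f]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        # Manhattan distance over the scaled features (an assumption).
        return sum(diff(f, a, b) for f in range(d))

    w = [0.0] * d
    for i, r in enumerate(X):
        hits = [X[j] for j in range(n) if j != i and y[j] == y[i]]
        misses = [X[j] for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue  # no nearest hit or miss to compare against
        h = min(hits, key=lambda x: dist(r, x))    # nearest same-class instance
        m = min(misses, key=lambda x: dist(r, x))  # nearest other-class instance
        for f in range(d):
            # Reward features that separate R from its nearest miss,
            # penalise features that separate R from its nearest hit.
            w[f] += (diff(f, r, m) - diff(f, r, h)) / n
    return w


if __name__ == "__main__":
    # Feature 0 separates the two classes; feature 1 is noise.
    X = [[0.0, 0.5], [0.1, 0.9], [1.0, 0.4], [0.9, 0.8]]
    print(relief(X, [0, 0, 1, 1]))  # approximately [0.8, -0.6]
```

Each sampled instance requires a scan over all other instances, so this basic form costs O(n²·d) when every instance is used; that cost is the kind of overhead the abstract's lower-cost feature selection methods are compared against.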
Finally, we integrate the three processes (data cleaning, dimensionality reduction, and instance selection) to generate a training set that allows data mining algorithms to run efficiently and yield accurate results.
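The integration step can be pictured as a three-stage pipeline applied in the order the abstract gives. Because this record does not describe the author's algorithms, the minimal sketch below swaps in well-known stand-ins for each stage: Wilson's edited nearest neighbour for noise removal (unlike the thesis's cleaning method, it depends on a k-NN classifier), a simple variance filter for feature selection, and Hart's condensed nearest neighbour for instance selection. The function names, `k = 3`, and the variance threshold are hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple

Rows = List[List[float]]


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def clean_enn(X: Rows, y: List[int], k: int = 3) -> Tuple[Rows, List[int]]:
    """Stand-in cleaning: Wilson's edited nearest neighbour.

    Drops each instance whose label is not shared by a strict majority
    of its k nearest neighbours (it is treated as label noise).
    """
    keep = []
    for i, x in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        neigh = sorted(others, key=lambda j: _sqdist(x, X[j]))[:k]
        agree = sum(1 for j in neigh if y[j] == y[i])
        if 2 * agree > len(neigh):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def select_features(X: Rows, min_var: float = 1e-6) -> Tuple[Rows, List[int]]:
    """Stand-in dimensionality reduction: drop (near-)constant features."""
    if not X:
        return X, []
    n = len(X)
    kept = []
    for f in range(len(X[0])):
        col = [row[f] for row in X]
        mean = sum(col) / n
        if sum((v - mean) ** 2 for v in col) / n > min_var:
            kept.append(f)
    return [[row[f] for f in kept] for row in X], kept


def condense_cnn(X: Rows, y: List[int]) -> Tuple[Rows, List[int]]:
    """Stand-in instance selection: Hart's condensed nearest neighbour.

    Grows a subset S until 1-NN over S labels every instance correctly.
    """
    if not X:
        return X, y
    chosen = [0]
    changed = True
    while changed:
        changed = False
        for i, x in enumerate(X):
            if i in chosen:
                continue
            nearest = min(chosen, key=lambda j: _sqdist(x, X[j]))
            if y[nearest] != y[i]:
                chosen.append(i)  # misclassified by S: absorb it into S
                changed = True
    chosen.sort()
    return [X[i] for i in chosen], [y[i] for i in chosen]


def build_training_set(X: Rows, y: List[int]) -> Tuple[Rows, List[int], List[int]]:
    """Cleaning -> dimensionality reduction -> instance selection."""
    X, y = clean_enn(X, y)
    X, kept_features = select_features(X)
    X, y = condense_cnn(X, y)
    return X, y, kept_features


if __name__ == "__main__":
    # Two 1-D clusters plus a constant second feature; the last point is
    # a class-1 label sitting inside the class-0 cluster (noise).
    X = [[0.0, 5.0], [0.1, 5.0], [0.2, 5.0], [1.0, 5.0], [1.1, 5.0], [1.2, 5.0], [0.05, 5.0]]
    y = [0, 0, 0, 1, 1, 1, 1]
    print(build_training_set(X, y))  # ([[0.0], [1.0]], [0, 1], [0])
```

Cleaning runs first so that mislabeled points neither distort the feature filter nor get preserved by the condensing step, which would otherwise tend to keep noisy instances because they are exactly the ones 1-NN misclassifies.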


Bibliographic details

Author: Daza Portocarrero, Luis Alberto
Affiliation: University of Puerto Rico, Mayaguez (Puerto Rico)

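Degree-granting institution: University of Puerto Rico, Mayaguez (Puerto Rico)
Subjects: Statistics; Computer Science
Degree: Ph.D.
Year: 2008
Pages: 172
Original format: PDF
Language: English
Chinese Library Classification: Statistics; Automation technology, computer technology
Date added: 2022-08-17 11:39:12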

Similar literature


1. Innovación en Minería de Datos para el Tratamiento de Imágenes: Agrupamiento K-media para Conjuntos de Datos de Forma Alargada y su Aplicación en la Agroindustria [J]. Pham Trung T., Lobos Gustavo A., Vidal-Silva Cristian L. Informacion Tecnologica. 2019, no. 2

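Machine translation: Innovation in data mining for image processing: K-means clustering for elongated datasets and its application in agro-industry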
2. Calidad de los datos del Instituto Nacional de Estadística para la elaboración de los indicadores de salud perinatal: pequeño y grande para su edad gestacional [J]. Revista Española de Salud Pública. 2015, no. 1

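Machine translation: Quality of National Statistics Institute data for producing perinatal health indicators: small and large for gestational age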
3. Qualidade de vida: Interpretação da sintaxe do SPSS para análise de dados do WHOQOL-100 / Calidad de vida: interpretación de la sintaxis del SPSS para el análisis de datos del WHOQOL-100 [J]. Bilynkievycz dos Santos Celso, Pedroso Bruno, Scandelari Luciano. Revista de Salud Pública. 2009, no. 5

Machine translation: Quality of life: interpreting SPSS syntax for WHOQOL-100 data analysis
4. La opinión de los alumnos como herramienta para mejorar la calidad docente en ingeniería mecánica [C]. A. Díaz, P. Lafont, J. L. Muñoz. CNIM 2010; Congreso nacional de ingenieria mecanica. 2011

Machine translation: Students' opinions as a tool for improving teaching quality in mechanical engineering
5. Creacion de un indicador sintetico aplicando la metodologia de analisis envolvente de datos para evaluar la calidad y la eficiencia de las instituciones de Educacion Superior privadas en Puerto Rico [D]. Acevedo Cruz, Blanca Idalyz. 2015

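Machine translation: Creating a composite indicator using data envelopment analysis to evaluate the quality and efficiency of private higher education institutions in Puerto Rico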
6. Ensayo clínico aleatorizado para evaluar la eficacia de una intervención multifactorial para reducir las hospitalizaciones y mejorar la calidad de vida de los pacientes con insuficiencia cardíaca [O]. C. Brotons, M. Martínez, E. Rayó. 2005

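Machine translation: Randomized clinical trial to evaluate the efficacy of a multifactorial intervention to reduce hospitalizations and improve quality of life in patients with heart failure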
7. Utilización de técnicas de clustering para mejorar la detección de meta-topics en conjuntos de datos extraidos de Twitter [O]. Delgado Calle, Carlos. 2015

Machine translation: Using clustering techniques to improve the detection of meta-topics in datasets extracted from Twitter
8. Metodos de Analisis para el Control de Calidad de Concentrado de Tomate (Methods of Analysis for the Quality Control of Tomato Concentrate) [R]. 1983

Machine translation: Methods of analysis for the quality control of tomato concentrate
