Data mining in tree-based models and large-scale contingency tables.

Abstract

This thesis is composed of two parts. The first part concerns tree-based models; the second deals with multiple testing in large-scale contingency tables. Tree-based models have gained enormous popularity in statistical modeling and data mining. We propose a novel tree-pruning algorithm, the frontier-based tree-pruning algorithm (FBP). Its computational complexity is of the same order as that of cost-complexity pruning (CCP), and it provides a full spectrum of information on tree pruning. A numerical study on real data sets reveals a surprise: under the complexity-penalization approach, most tree sizes are inadmissible. FBP also facilitates a more faithful implementation of cross-validation, whose advantage is supported by simulations.

One of the most common test procedures for two-way contingency tables is the test of independence between two categorizations. Current procedures such as the chi-square and likelihood ratio tests assess overall independence but give limited information about the nature of the association in a contingency table. We propose an approach, based on a multiple testing framework, for testing the independence of categories in the individual cells of a contingency table. We then apply the proposed method to identify patterns of pairwise association between amino acids involved in beta-sheet bridges of proteins. We identify a number of amino acid pairs that exhibit either strong or weak association. These patterns provide useful information for algorithms that predict the secondary and tertiary structures of proteins.
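The claim that most tree sizes are inadmissible can be illustrated with the usual cost-complexity criterion R(T) + alpha * (number of leaves of T), for alpha >= 0. If R_k denotes the smallest risk among pruned subtrees with k leaves, size k can be the unique choice for some alpha only when the point (k, R_k) is a vertex of the lower convex hull of all such points, reached by a hull edge that slopes downward. The sketch below checks this with Andrew's monotone-chain hull. It is not the FBP algorithm, whose details the abstract does not give. It assumes the R_k values are already available, and the function name and risk values are hypothetical.

```python
def admissible_sizes(best_risk):
    """Tree sizes k that uniquely minimise R_k + alpha * k for some alpha >= 0.

    best_risk maps each tree size k (number of terminal nodes) to R_k, the
    smallest risk of any pruned subtree with k leaves (hypothetical input).
    A size is returned only if (k, R_k) is a vertex of the lower convex hull
    of all points and the hull edge arriving at it slopes strictly downward.
    """
    points = sorted(best_risk.items())
    hull = []
    for k, r in points:
        # Andrew's monotone chain (lower hull): drop the last vertex while the
        # turn is not strictly counter-clockwise, so collinear points go too.
        while len(hull) >= 2:
            (k1, r1), (k2, r2) = hull[-2], hull[-1]
            if (k2 - k1) * (r - r1) - (r2 - r1) * (k - k1) <= 0:
                hull.pop()
            else:
                break
        hull.append((k, r))
    # The smallest hull vertex wins for large alpha; a later vertex can win
    # for some alpha >= 0 only if its incoming hull edge has negative slope.
    return [k for i, (k, r) in enumerate(hull)
            if i == 0 or r - hull[i - 1][1] < 0]


# Hypothetical best risks for subtrees with 1..8 leaves.
risks = {1: 100, 2: 90, 3: 50, 4: 45, 5: 42, 6: 20, 7: 18, 8: 17}
print(admissible_sizes(risks))
```

With these hypothetical risks the call returns [1, 3, 6, 7, 8]. Sizes 2, 4 and 5 are never the penalized optimum for any alpha, even though each has a best subtree of its own.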

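The abstract does not state which per-cell statistic or error rate the thesis uses. As an illustration only, the sketch below tests every cell of a two-way table with Haberman's adjusted standardized residual, (O - E) / sqrt(E * (1 - row_total/n) * (1 - col_total/n)). It then controls the false discovery rate across all cells with the Benjamini-Hochberg step-up procedure. These two choices, the function name and the list-of-rows input format are assumptions and are not the thesis's method. The p-values rely on a large-sample normal approximation, so sparse tables may call for exact tests instead.

```python
import math


def cellwise_independence(table, q=0.05):
    """Flag cells whose counts depart from independence, controlling the FDR.

    table: list of rows of non-negative counts (hypothetical input format).
    Returns (i, j, z, p) for each cell rejected at FDR level q, where z is
    Haberman's adjusted standardised residual and p its two-sided p-value.
    """
    n = sum(map(sum, table))
    if n == 0:
        return []
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    cells = []
    for i, r in enumerate(table):
        for j, observed in enumerate(r):
            expected = row[i] * col[j] / n
            var = expected * (1 - row[i] / n) * (1 - col[j] / n)
            if var <= 0:
                continue  # degenerate margin: this cell cannot be tested
            z = (observed - expected) / math.sqrt(var)
            p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
            cells.append((i, j, z, p))
    # Benjamini-Hochberg step-up: reject the k smallest p-values, where k is
    # the largest rank with p_(k) <= k * q / m.
    m = len(cells)
    ranked = sorted(cells, key=lambda c: c[3])
    cutoff = 0
    for k, cell in enumerate(ranked, start=1):
        if cell[3] <= k * q / m:
            cutoff = k
    return ranked[:cutoff]


for i, j, z, p in cellwise_independence([[50, 0], [0, 50]]):
    print(i, j, round(z, 2), p)
```

The sign of z shows whether a flagged cell is over-represented (positive) or under-represented (negative) relative to independence. Pooling all cells into one multiple-testing correction follows the cell-by-cell framing of the abstract.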

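Bibliographic record

Author: Kim, Seoung Bum
Affiliation: Georgia Institute of Technology
Degree-granting institution: Georgia Institute of Technology
Subject: Engineering, Industrial
Degree: Ph.D.
Year: 2005
Pages: 160
Original format: PDF
Language: English
Chinese Library Classification: General industrial technology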

Similar literature

1. Ensemble data mining modeling in corrosion of concrete sewer: A comparative study of network-based (MLPNN & RBFNN) and tree-based (RF, CHAID, & CART) models [J]. Mohammad Zounemat-Kermani, Dietmar Stephan, Matthias Barjenbruch. Advanced Engineering Informatics, 2020.
2. Driving risk assessment using near-crash database through data mining of tree-based model [J]. Wang Jianqiang, Zheng Yang, Li Xiaofei. Accident Analysis & Prevention, 2015.
3. Data mining of tree-based models to analyze freeway accident frequency [J]. Li-Yen Chang, Wen-Chieh Chen. Journal of Safety Research, 2005, no. 4.
4. A prefix tree-based model for mining association rules from quantitative temporal data [C]. Yo-Ping Huang, Li-Jen Kao, Sandnes. 2005.
5. Sparse and large-scale learning models and algorithms for mining heterogeneous big data [D]. Cai, Xiao. 2013.
6. Data mining of the GAW14 simulated data using rough set theory and tree-based methods [O]. Liang-Ying Wei, Cheng-Lung Huang, Chien-Hsiun Chen. 2005.
7. Correlation and regression in contingency tables. A measure of association or correlation in nominal data (contingency tables), using determinants [O]. Colignatus Thomas. 2007.
8. ECONOMICS OF LARGE-SCALE SURFACE COAL MINING USING SIMULATION MODELS USER'S MANUAL FOR SHOVEL TRUCK MINING MICROMODELS Volume 15 [R]. 1977.
