Visual management of large scale data mining projects

Abstract

This paper describes a unified framework for visualizing the preparations for, and results of, hundreds of machine learning experiments. These experiments were designed to improve the accuracy of enzyme functional predictions from sequence, and in many cases were successful. Our system provides graphical user interfaces for defining and exploring training datasets and various representational alternatives, for inspecting the hypotheses induced by various types of learning algorithms, for visualizing the global results, and for inspecting in detail results for specific training sets (functions) and examples (proteins). The visualization tools serve as a navigational aid through a large amount of sequence data and induced knowledge. They provided substantial help in understanding both the significance and the underlying biological explanations of our successes and failures. Using these visualizations, it was possible to efficiently identify weaknesses of the modular sequence representations and induction algorithms, which suggest better learning strategies. The context in which our data mining visualization toolkit was developed was the problem of accurately predicting enzyme function from protein sequence data. Previous work demonstrated that approximately 6% of enzyme protein sequences are likely to be assigned incorrect functions on the basis of sequence similarity alone. In order to test the hypothesis that more detailed sequence analysis using machine learning techniques and modular domain representations could address many of these failures, we designed a series of more than 250 experiments using information-theoretic decision tree induction and naive Bayesian learning on local sequence domain representations of problematic enzyme function classes. In more than half of these cases, our methods were able to perfectly discriminate among various possible functions of similar sequences. We developed and tested our visualization techniques on this application.
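
The abstract names information-theoretic decision tree induction as one of the two learning methods but does not state the split criterion. As a minimal sketch, assuming the common entropy-based information gain criterion (an assumption, not confirmed by the abstract), the following shows how a candidate feature, such as the presence of a sequence domain, could be scored against enzyme function labels. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: four proteins, one binary domain feature,
# and two candidate enzyme functions "A" and "B".
has_domain = [True, True, False, False]
function = ["A", "A", "B", "B"]
gain = information_gain(has_domain, function)
```

A decision tree learner of this kind would pick, at each node, the feature with the highest gain. A feature that separates the classes perfectly recovers the full label entropy, while an uninformative feature scores zero.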

Bibliographic record

Journal name: other
Authors: I. Shah; L. Hunter
Pages: 278–290
Total pages: 13
Original format: PDF
Date indexed: 2022-08-21 11:34:15

Similar literature

1. ProteoLens: a visual analytic tool for multi-scale database-driven biological network data mining [J]. Tianxiao Huan, Andrey Y Sivachenko, Scott H Harrison, BMC Bioinformatics. 2008, Supplement 9
2. Hierarchical Visual Data Mining for Large-Scale Data [J]. Matthew Ward, Wei Peng, Xiaoning Wang. Computational Statistics. 2004, No. 1
3. New Data on Oilsands Mining Waste; Management Projects Advance [J]. Oilsands Review Group. Oilsands Review. 2010, No. 10
4. Applying Process Mining to Support Management of Predictive Analytics/Data Mining Projects in a Decision Making Center [C]. Marlene Ofelia Sanchez Escobar, Rafael Lozano Espinosa, Jose Martin Molina Espinosa, International Conference on Systems and Informatics. 2019
5. Indexing, searching, and mining large-scale visual data via structured vector quantization [D]. Yuan, Jiangbo. 2014
6. TheCellVision.org: A Database for Visualizing and Mining High-Content Cell Imaging Projects [O]. Myra Paz David Masinas, Mojca Mattiazzi Usaj, Matej Usaj, 2020
7. Visual Management of Large Scale Data Mining Projects [O]. I. Shah, L. Hunter. 2007
