Mynodbcsv: Lightweight Zero-Config Database Solution for Handling Very Large CSV Files

Abstract

Volumes of data used in science and industry are growing rapidly. When researchers face the challenge of analyzing them, the data format is often the first obstacle. The lack of standardized ways of exploring different data layouts means that the problem has to be solved from scratch each time. The ability to access data in a rich, uniform manner, e.g. using Structured Query Language (SQL), would offer expressiveness and user-friendliness. Comma-separated values (CSV) is one of the most common data storage formats. Despite its simplicity, handling it becomes non-trivial as file size grows. Importing CSV files into existing databases is time-consuming and troublesome, or even impossible if the horizontal dimension reaches thousands of columns. Most databases are optimized for handling a large number of rows rather than columns; therefore, performance for datasets with non-typical layouts is often unacceptable. Other challenges include schema creation, updates and repeated data imports. To address the above-mentioned problems, I present a system for accessing very large CSV-based datasets by means of SQL. It is characterized by: a “no copy” approach (data stay mostly in the CSV files); “zero configuration” (no need to specify a database schema); an implementation in C++ with boost, SQLite and Qt, which requires no installation and has a very small size; query rewriting, dynamic creation of indices for appropriate columns and static data retrieval directly from CSV files, which together ensure efficient plan execution; effortless support for millions of columns; per-value typing, which makes mixed text/numbers data easy to use; and a very simple network protocol, which provides an efficient interface for MATLAB and reduces implementation time for other languages. The software is available as freeware, along with educational videos, on its website. It needs no prerequisites to run, as all of the libraries are included in the distribution package. I test it against existing database solutions using a battery of benchmarks and discuss the results.
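
The abstract does not describe the implementation, so the following is only an illustrative Python sketch of two ideas it names, the “no copy” approach and per-value typing. It is not the Mynodbcsv code, which is written in C++. All function names (`parse_value`, `build_row_offsets`, `fetch`, `demo`) and the sample data are hypothetical, and the sketch assumes UTF-8 files with a header row and no line breaks inside quoted fields.

```python
import csv
import os
import tempfile


def parse_value(text):
    """Type each value on its own: try int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def read_header(path):
    """Return the column names from the first line of the CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f))


def build_row_offsets(path):
    """Record the byte offset where each data row starts (header skipped).

    Only these offsets are kept in memory; the data stay in the file.
    Assumption: no quoted field contains a line break.
    """
    offsets = []
    with open(path, "rb") as f:
        f.readline()  # skip the header line
        while True:
            pos = f.tell()
            line = f.readline()
            if not line:
                break
            if line.strip():
                offsets.append(pos)
    return offsets


def fetch(path, offsets, header, row, column):
    """Seek to one row in the CSV file and return a single typed value."""
    with open(path, "rb") as f:
        f.seek(offsets[row])
        line = f.readline().decode("utf-8")
    fields = next(csv.reader([line]))
    return parse_value(fields[header.index(column)])


def demo():
    """Build a small CSV with a mixed text/number column and read it back."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", delete=False, newline="", encoding="utf-8"
    ) as tmp:
        tmp.write("id,label,score\n1,alpha,0.5\n2,beta,n/a\n3,gamma,7\n")
        path = tmp.name
    try:
        header = read_header(path)
        offsets = build_row_offsets(path)
        return [fetch(path, offsets, header, r, "score") for r in range(len(offsets))]
    finally:
        os.remove(path)
```

Calling `demo()` reads the `score` column row by row straight from the file. Because each value is typed independently, a column can mix floats, integers and free text without a declared schema.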

Bibliographic details

Author: Stanisław Adaszewski
Year: 2014
Volume (issue): 9(7)
Pages: e103319
Total pages: 8
Original format: PDF

Similar literature

1. CSV2RDF: Generating RDF Data from CSV File Using Semantic Web Technologies [J]. S M Hasan Mahmud, Altab Hossin, Hosney Jahan. Journal of Theoretical and Applied Information Technology. 2018, No. 20
2. Converting CSV Files to RKSML Files [J]. NASA Tech Briefs. 2009, No. 6
3. Lightweight crossbars for safe and flexible handling solutions [J]. Schiff & Hafen. 2017, No. 6
4. Creating Database for Traditional Dance Categorization using CSV File Format [C]. Khairurizal Alfathdyanto, Maria Shusanti Febrianti, Ary Setijadi Prihatmanto. 2018
5. A Comparison of File Allocation Decision-Making Schemes (Replicated Files, Distributed Networks, Simulation Experiments, Distributed Databases) [D]. Maldonado, Martin Froilan. 1986
6. CSVS, a crowdsourcing database of the Spanish population genetic variability [O]. María Peña-Chilet, Gema Roldán, Javier Perez-Florido. 2021
7. Mynodbcsv: Lightweight Zero-Config Database Solution for Handling Very Large CSV Files [O]. Adaszewski, S. 2014
