A Scalable Document-Based Architecture for Text Analysis

Abstract

Analyzing textual data is a very challenging task because of the huge volume of data generated daily. Fundamental issues in text analysis include the lack of structure in document datasets, the need for various preprocessing steps and performance and scaling issues. Existing text analysis architectures partly solve these issues, providing restrictive data schemas, addressing only one aspect of text preprocessing and focusing on one single task when dealing with performance optimization. Thus, we propose in this paper a new generic text analysis architecture, where document structure is flexible, many preprocessing techniques are integrated and textual datasets are indexed for efficient access. We implement our conceptual architecture using both a relational and a document-oriented database. Our experiments demonstrate the feasibility of our approach and the superiority of the document-oriented logical and physical implementation.

Bibliographic details

Source: International Conference on Advanced Data Mining and Applications, 2016, pp. 481-494 (14 pages)
Authors: Ciprian-Octavian Truică; Jérôme Darmont; Julien Velcin
Original format: PDF
Keywords: Text analytics; Indexing methods; Document-oriented databases
