Using Natural Language Preprocessing Architecture (NLPA) for Big Data Text Sources

María Novo-Lourés; Reyes Pavón; Rosalía Laza; David Ruano-Ordas; Jose R. Méndez

Abstract

In recent years, big data analysis has become a popular means of exploiting multiple (initially valueless) sources to find relevant knowledge about real-world domains. However, many big data sources provide unstructured textual data, and a proper analysis requires tools that adequately combine big data and text-analysis techniques. With this in mind, we combined a pipelining framework, BDP4J (Big Data Pipelining For Java), with the implementation of a set of text preprocessing techniques to create NLPA (Natural Language Preprocessing Architecture): an extensible, open-source plugin implementing preprocessing steps that can easily be combined into a pipeline. Additionally, NLPA can generate datasets using either a classical token-based representation of the data or a newer synset-based representation that can be further processed using semantic information (i.e., ontologies). This work presents a case study of NLPA in operation, covering the transformation of raw heterogeneous big data into different dataset representations (synsets and tokens) and the use of the Weka application programming interface (API) to launch two well-known classifiers.
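The abstract describes NLPA as a set of independent preprocessing steps that are chained into a pipeline ending in either a token-based or a synset-based dataset. The Java sketch below illustrates that idea in outline only. It does not use the BDP4J or NLPA APIs, and every name in it (`PreprocessingPipelineSketch`, `STRIP_HTML`, `CLEAN_TOKENS`, the stop-word list, the `TOY_SYNSETS` lexicon and its `SYN_*` identifiers) is hypothetical. Each step is assumed to be a plain `java.util.function.Function`, and steps are composed with `andThen`, so adding, removing, or reordering a step only changes one composition line. The synset lexicon is a hard-coded toy standing in for a real lexical resource.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch of composable preprocessing steps; this is NOT the BDP4J or NLPA API.
public class PreprocessingPipelineSketch {

    // Replace HTML tags with spaces so that adjacent words do not merge.
    static final Function<String, String> STRIP_HTML =
            text -> text.replaceAll("<[^>]*>", " ");

    // Normalize case with a fixed locale so results do not depend on the JVM default.
    static final Function<String, String> LOWERCASE =
            text -> text.toLowerCase(Locale.ROOT);

    // Split on runs of characters that are neither letters nor digits; drop empty tokens.
    static final Function<String, List<String>> TOKENIZE =
            text -> Arrays.stream(text.split("[^\\p{L}\\p{N}]+"))
                    .filter(token -> !token.isEmpty())
                    .collect(Collectors.toList());

    // Tiny illustrative stop-word list (an assumption, not NLPA's actual list).
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "is", "of", "the", "to");

    static final Function<List<String>, List<String>> REMOVE_STOP_WORDS =
            tokens -> tokens.stream()
                    .filter(token -> !STOP_WORDS.contains(token))
                    .collect(Collectors.toList());

    // Hypothetical toy lexicon mapping tokens to made-up synset IDs; a real system would
    // query a lexical resource such as WordNet or BabelNet instead.
    static final Map<String, String> TOY_SYNSETS = Map.of(
            "big", "SYN_LARGE", "large", "SYN_LARGE",
            "data", "SYN_DATA", "fuel", "SYN_FUEL");

    // Replace each known token by its synset ID; tokens missing from the lexicon are dropped.
    static final Function<List<String>, List<String>> TO_SYNSETS =
            tokens -> tokens.stream()
                    .filter(TOY_SYNSETS::containsKey)
                    .map(TOY_SYNSETS::get)
                    .collect(Collectors.toList());

    // Bag-of-features counts; TreeMap keeps keys sorted so the output is deterministic.
    static final Function<List<String>, Map<String, Integer>> COUNT =
            features -> {
                Map<String, Integer> counts = new TreeMap<>();
                for (String feature : features) {
                    counts.merge(feature, 1, Integer::sum);
                }
                return counts;
            };

    // Shared prefix: the same cleaning and tokenization feeds both representations.
    static final Function<String, List<String>> CLEAN_TOKENS =
            STRIP_HTML.andThen(LOWERCASE).andThen(TOKENIZE).andThen(REMOVE_STOP_WORDS);

    // Token-based dataset row.
    static final Function<String, Map<String, Integer>> TOKEN_PIPELINE =
            CLEAN_TOKENS.andThen(COUNT);

    // Synset-based dataset row.
    static final Function<String, Map<String, Integer>> SYNSET_PIPELINE =
            CLEAN_TOKENS.andThen(TO_SYNSETS).andThen(COUNT);

    public static void main(String[] args) {
        String raw = "<p>Big data and large data fuel analysis.</p>";
        System.out.println(TOKEN_PIPELINE.apply(raw));  // prints {analysis=1, big=1, data=2, fuel=1, large=1}
        System.out.println(SYNSET_PIPELINE.apply(raw)); // prints {SYN_DATA=2, SYN_FUEL=1, SYN_LARGE=2}
    }
}
```

In the synset output, "big" and "large" collapse into one feature, which is the kind of semantic grouping a synset-based representation aims for; "analysis" disappears only because the toy lexicon does not cover it.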

Bibliographic details

Source: Scientific Programming, 2020, Issue 3, 13 pages
Authors: María Novo-Lourés; Reyes Pavón; Rosalía Laza; David Ruano-Ordas; Jose R. Méndez
Original format: PDF

Similar documents

1. Multi-text classification of Urdu/Roman using machine learning and natural language preprocessing techniques [J]. M Ameen Chhajro, Mansoor Ahmed Khuhro, Kamlesh Kumar. Indian Journal of Science and Technology, 2020, no. 19.

2. Preprocessing for PPM: Compressing Utf-8 Encoded Natural Language Text [J]. William J. Teahan, Khaled M. Alhawiti. International Journal of Computer Science & Information Technology (IJCSIT), 2015, no. 2.

3. Natural language compression on Edge-Guided text preprocessing [J]. Martínez-Prieto M.A., Adiego J., De La Fuente P. Information Sciences: An International Journal, 2011, no. 24.

4. A Public Health Surveillance Platform Exploiting Free-Text Sources via Natural Language Processing and Linked Data: Application in Adverse Drug Reaction Signal Detection Using PubMed and Twitter [C]. Pantelis Natsiavas, Nicos Maglaveras, Vassilis Koutkias. International workshop on process-oriented information systems in health-care; International workshop on knowledge representation for health care, 2017.

5. Text-to-Speech Synthesis Using Found Data for Low-Resource Languages [D]. Cooper, Erica, 2019.

6. Natural Language Processing and Automatic SNOMED-Encoding of Free Text: An Analysis of Free Text Data from a Routine Electronic Patient Record Application with a Parsing Tool Using the German SNOMED II [O]. Joerg H. Hohnloser, Matthias Holzer, Martin R.G. Fischer, 1996.

7. Using Natural Language Preprocessing Architecture (NLPA) for Big Data Text Sources [O]. María Novo-Lourés, Reyes Pavón, Rosalía Laza, 2020.