Highly Scalable Text Mining - Parallel Tagging Application


Abstract

There is an urgent need to develop new text mining solutions that use High Performance Computing (HPC) and grid environments to tackle the exponential growth in text data. Problem sizes increase daily as new text documents are added. Labelling sequence data, as in part-of-speech (POS) tagging, chunking (shallow parsing) and named entity recognition, is one of the most important tasks in text mining. Genia is a POS tagger specifically tuned for biomedical text; it is built with maximum entropy modelling and a state-of-the-art tagging algorithm. A parallel version of the Genia tagger has been implemented, and its performance has been compared on a number of different architectures, with a particular focus on the scalability of the application. Scaling to 512 processors has been achieved, and a method to scale to 10,000 processors is proposed for massively parallel text mining applications. The parallel implementation of the Genia tagger uses MPI to keep the code portable.


Bibliographic details

Source
International Conference on Soft Computing, Computing with Words and Perceptions in System Analysis, Decision and Control | 2009 | 4 pages
Authors
Firat Tekiner; Yoshimasa Tsuruoka; Junichi Tsujii; Sophia Ananiadou
Original format: PDF
Chinese Library Classification: TP301-53
Keywords
HPC; Text Mining; Parsing; Parallel Computing


