Published in: Concurrency and Computation: Practice and Experience

NLPHub: An e-Infrastructure-based text mining hub

Abstract

Text mining comprises a set of processes that analyze text to extract high-quality information. Its many applications include experiments that tackle big data challenges using complex system architectures. However, text mining approaches are neither easy for end-users to discover and use nor easy for them to combine. Furthermore, they should be contextualized within new approaches to science (e.g., Open Science) that ensure the longevity and reuse of methods and results. This article presents NLPHub, a distributed system that orchestrates and combines several state-of-the-art text mining services that recognize spatiotemporal events, keywords, and a large set of named entities. NLPHub adopts an Open Science approach, which fosters the reproducibility, repeatability, and reusability of methods and results, by using an e-Infrastructure that supports data-intensive science. NLPHub makes the connected services Open Science-compliant through the use of representational standards for services and computations. It also manages heterogeneous service access policies and provides collaboration and sharing facilities. This article reports a performance assessment on a corpus annotated with named entities, which demonstrates that NLPHub can outperform the individual integrated services by combining their outputs.
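The abstract does not say how NLPHub combines the outputs of its services. Purely as an illustration, one common way to merge named-entity annotations from several annotators is majority voting: keep an entity span only if enough services report it. The sketch below assumes each service returns `(start, end, label)` tuples; the span format, `merge_by_vote`, and `min_votes` are hypothetical and are neither NLPHub's API nor its actual combination method.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical span format: (start_offset, end_offset, entity_label).
Span = Tuple[int, int, str]


def merge_by_vote(outputs: Iterable[List[Span]], min_votes: int = 2) -> List[Span]:
    """Keep every span that at least `min_votes` services produced exactly.

    Illustrative majority-vote merge; not the method used by NLPHub.
    """
    votes: Counter = Counter()
    for spans in outputs:
        # Deduplicate per service so one service cannot vote twice for a span.
        votes.update(set(spans))
    # Return agreed spans in document order (sorted by offsets, then label).
    return sorted(span for span, n in votes.items() if n >= min_votes)


if __name__ == "__main__":
    # Hypothetical outputs of three NER services for the same text.
    service_a = [(0, 5, "PERSON"), (10, 15, "LOCATION")]
    service_b = [(0, 5, "PERSON"), (20, 24, "DATE")]
    service_c = [(0, 5, "PERSON"), (10, 15, "LOCATION"), (20, 24, "DATE")]
    print(merge_by_vote([service_a, service_b, service_c]))
```

Raising `min_votes` trades recall for precision: with `min_votes=3`, only spans that all three hypothetical services agree on survive.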