Conference: Nordic Conference of Computational Linguistics
Docria: Processing and Storing Linguistic Data with Wikipedia

Abstract

The availability of user-generated content has increased significantly over time. Wikipedia is one example of a corpus that spans a huge range of topics and is freely available. Storing and processing such corpora requires flexible document models, as they may contain malicious or incorrect data. Docria is a library that attempts to address this issue with a model using typed property hypergraphs. Docria can be used with small to large corpora, from laptops using Python interactively in a Jupyter notebook to clusters running map-reduce frameworks with optimized compiled code. Docria is available as open-source code at https://github.com/marcusklang/docria.
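
To make the phrase "typed property hypergraph" more concrete, the following is a minimal sketch of one way such a document model could look. It is not Docria's API: every name here (`Document`, `Layer`, `Node`, `add_layer`, `build_example`) is hypothetical and chosen for illustration only. The sketch assumes that nodes are grouped into layers, that each layer declares the types of its properties, and that a node may reference any number of other nodes, which is what lets it act as a hyperedge.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# All names below are hypothetical; this is an illustration, not Docria's API.
@dataclass
class Node:
    layer: str                                  # name of the layer this node belongs to
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Layer:
    name: str
    schema: Dict[str, type]                     # property name -> expected Python type
    nodes: List[Node] = field(default_factory=list)

    def add(self, **props: Any) -> Node:
        """Add a node after checking every property against the layer schema."""
        for key, value in props.items():
            if key not in self.schema:
                raise KeyError(f"unknown property {key!r} in layer {self.name!r}")
            if not isinstance(value, self.schema[key]):
                raise TypeError(f"property {key!r} must be {self.schema[key].__name__}")
        node = Node(self.name, props)
        self.nodes.append(node)
        return node


@dataclass
class Document:
    text: str
    layers: Dict[str, Layer] = field(default_factory=dict)

    def add_layer(self, name: str, **schema: type) -> Layer:
        layer = Layer(name, schema)
        self.layers[name] = layer
        return layer


def build_example() -> Document:
    doc = Document("Lund is in Sweden")
    tokens = doc.add_layer("token", start=int, stop=int)
    entities = doc.add_layer("entity", label=str, tokens=list)

    lund = tokens.add(start=0, stop=4)
    tokens.add(start=11, stop=17)
    # The entity node links to a list of token nodes, so one node can
    # connect arbitrarily many others (the hyperedge role).
    entities.add(label="LOC", tokens=[lund])
    return doc
```

Rejecting a property whose type does not match the schema is one way a typed model can guard against the incorrect data the abstract mentions; whether Docria validates in this manner is an assumption of the sketch.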