Docria: Processing and Storing Linguistic Data with Wikipedia

Abstract

The availability of user-generated content has increased significantly over time. Wikipedia is one example of such a corpus: it spans a huge range of topics and is freely available. Storing and processing such corpora requires flexible document models, as they may contain malicious or incorrect data. Docria is a library that attempts to address this issue with a model using typed property hypergraphs. Docria can be used with small to large corpora, from laptops using Python interactively in a Jupyter notebook to clusters running map-reduce frameworks with optimized compiled code. Docria is available as open-source code at https://github.com/marcusklang/docria.

Bibliographic details

Source: Nordic Conference of Computational Linguistics, 2019, pp. 363-368 (6 pages)
Conference location: Turku (FI)
Authors: Marcus Klang; Pierre Nugues
Author affiliation: Lund University, Department of Computer Science, S-221 00 Lund, Sweden
Original format: PDF
Indexed: 2022-08-26 14:42:12
