Docrep: A lightweight and efficient document representation framework

Abstract

Modelling linguistic phenomena requires highly structured and complex data representations. Document representation frameworks (DRFs) provide an interface to store and retrieve multiple annotation layers over a document. Researchers face a difficult choice: use a heavy-weight DRF or implement a custom DRF. Either way the cost is substantial: learning a new complex system, or continually adding features to a home-grown system that risks overrunning its original scope. We introduce docrep, a lightweight and efficient DRF, and compare it against existing DRFs. We discuss our design goals and implementations in C++, Python, and Java. We transform the OntoNotes 5 corpus using docrep and UIMA, providing a quantitative comparison as well as discussing modelling trade-offs. We conclude with qualitative feedback from researchers who have used docrep for their own projects. Ultimately, we hope docrep is useful for the busy researcher who wants the benefits of a DRF, but has better things to do than to write one.
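
To make "multiple annotation layers over a document" concrete, here is a minimal Python sketch of the general idea. It is not docrep's API: the `Span` and `Document` classes, their methods, and the layer names are hypothetical and exist only for illustration. The sketch assumes each layer is a list of character-offset spans stored under a name alongside the raw text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    """Hypothetical annotation: a half-open [start, stop) character range."""
    start: int
    stop: int
    label: str = ""


@dataclass
class Document:
    """Hypothetical container: raw text plus independently named layers."""
    text: str
    layers: Dict[str, List[Span]] = field(default_factory=dict)

    def add_layer(self, name: str, spans: List[Span]) -> None:
        # Store one annotation layer under its name.
        self.layers[name] = spans

    def get_layer(self, name: str) -> List[Span]:
        # Retrieve a layer by name; raises KeyError if it was never stored.
        return self.layers[name]

    def covered_text(self, span: Span) -> str:
        # Resolve a span back to the text it annotates.
        return self.text[span.start:span.stop]


doc = Document("Tim Dawborn wrote docrep.")

# Layer 1: tokens, as character offsets into the text.
doc.add_layer(
    "tokens",
    [Span(0, 3), Span(4, 11), Span(12, 17), Span(18, 24), Span(24, 25)],
)

# Layer 2: named entities, stored separately from the token layer.
doc.add_layer("entities", [Span(0, 11, "PERSON")])

tokens = [doc.covered_text(s) for s in doc.get_layer("tokens")]
entities = [(doc.covered_text(s), s.label) for s in doc.get_layer("entities")]
```

A home-grown structure like this is the kind of custom system the abstract warns about: serialisation, cross-layer references, and schema evolution all still have to be added by hand.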

Bibliographic details

Source
International Conference on Computational Linguistics | 2014 | 10 pages
Authors
Tim Dawborn; James R. Curran
Original format: PDF
Chinese Library Classification: Programming; software engineering

Similar literature

1. A document representation framework with interpretable features using pre-trained word embeddings [J]. Narendra Babu Unnam, P. Krishna Reddy. International Journal of Data Science and Analytics. 2020, No. 1
2. The Locally Weighted Bag of Words Framework for Document Representation [J]. Lebanon Guy, Mao Yi, Dillon Joshua. Journal of Machine Learning Research. 2007, October issue
3. An Efficient Type Codec for Point Data in Lightweight Applications Scene Representation (LASeR) [J]. YeSun Joung, Jihun Cha, Won-Sik Cheong, Young-kwon Lim. ETRI Journal. 2005, No. 6
4. Docrep: A lightweight and efficient document representation framework [C]. Tim Dawborn, James R. Curran. International Conference on Computational Linguistics. 2014
5. Incorporating semantic and syntactic information into document representation for document clustering [D]. Wang, Yong. 2005
6. Predict Alzheimer's disease using hippocampus MRI data: a lightweight 3D deep convolutional network model with visual and global shape representations [O]. Sreevani Katabathula, Qinyong Wang, Rong Xu. 2021
7. A framework for obtaining structurally complex condensed representations of document sets in the biomedical domain [O]. Berlanga Llavori Rafael, Ramírez Cruz Yunior, Gil García Reynaldo. 2012
