Published in: International Conference on Computational Linguistics

Docrep: A lightweight and efficient document representation framework

Abstract

Modelling linguistic phenomena requires highly structured and complex data representations. Document representation frameworks (DRFs) provide an interface to store and retrieve multiple annotation layers over a document. Researchers face a difficult choice: use a heavyweight DRF or implement a custom one. Either way the cost is substantial: learning a new, complex system, or continually adding features to a home-grown system that risks overrunning its original scope. We introduce docrep, a lightweight and efficient DRF, and compare it against existing DRFs. We discuss our design goals and our implementations in C++, Python, and Java. We transform the OntoNotes 5 corpus using docrep and UIMA, providing a quantitative comparison and discussing the modelling trade-offs. We conclude with qualitative feedback from researchers who have used docrep for their own projects. Ultimately, we hope docrep is useful for the busy researcher who wants the benefits of a DRF but has better things to do than write one.
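To make the idea of "multiple annotation layers over a document" concrete, the following is a minimal sketch of standoff annotation in Python. It is not docrep's API: the `Span` and `Document` classes, their methods, and the use of JSON for serialisation are all assumptions made purely for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Span:
    """Hypothetical standoff annotation: a half-open token range [start, stop) plus a label."""
    start: int
    stop: int
    label: str


@dataclass
class Document:
    """Hypothetical container: one token sequence shared by several named annotation layers."""
    tokens: list
    layers: dict = field(default_factory=dict)

    def add(self, layer, start, stop, label):
        # Each layer is an independent list of spans over the same tokens,
        # so adding a new layer never disturbs the existing ones.
        self.layers.setdefault(layer, []).append(Span(start, stop, label))

    def text_of(self, span):
        # Recover the surface text covered by a span.
        return " ".join(self.tokens[span.start:span.stop])

    def dumps(self):
        # asdict recurses into nested dataclasses, so Spans become plain dicts.
        # JSON is an illustrative choice of wire format, not a claim about docrep.
        return json.dumps(asdict(self))

    @classmethod
    def loads(cls, data):
        # Rebuild the document and its layers from the serialised form.
        raw = json.loads(data)
        doc = cls(tokens=raw["tokens"])
        for layer, spans in raw["layers"].items():
            for s in spans:
                doc.add(layer, s["start"], s["stop"], s["label"])
        return doc


# Store two layers over the same tokens, then round-trip the document.
doc = Document(tokens=["Sydney", "is", "sunny"])
doc.add("pos", 0, 1, "NNP")
doc.add("ner", 0, 1, "LOC")
restored = Document.loads(doc.dumps())
```

Keeping annotations as offsets into a shared token sequence, rather than embedding them in the text, is what lets layers such as part-of-speech tags and named entities coexist and be stored or retrieved independently.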
