Nucleic Acids Research

TeamTat: a collaborative text annotation tool

Abstract

Manually annotated data is key to developing text-mining and information-extraction algorithms. However, human annotation requires considerable time, effort and expertise. Given the rapid growth of the biomedical literature, it is paramount to build tools that speed up annotation while maintaining expert quality. While existing text annotation tools may provide user-friendly interfaces to domain experts, they offer limited support for figure display, project management and multi-user team annotation. In response, we developed TeamTat (https://www.teamtat.org), a web-based annotation tool (local setup available) equipped to manage team annotation projects engagingly and efficiently. TeamTat is a novel tool for managing multi-user, multi-label document annotation that covers the entire production life cycle. Project managers can specify annotation schemas for entities and relations, select annotators, and distribute documents anonymously to prevent bias. Documents can be input as plain text, PDF or BioC (uploaded locally or retrieved automatically from PubMed/PMC), and the output format is BioC with inline annotations. For the annotator's convenience, TeamTat displays figures from the full text. Multiple users can work on the same document independently in their own workspaces, and the team manager can track task completion. TeamTat provides corpus quality assessment via inter-annotator agreement statistics, as well as a user-friendly interface for reviewing annotations and resolving inter-annotator disagreements to improve corpus quality.
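The abstract does not say which inter-annotator agreement statistic TeamTat reports. As an illustration only, a common choice for entity annotation is exact-match F1 between two annotators: an annotation counts as agreed only if both annotators marked the same character span with the same label. The minimal sketch below assumes annotations are represented as hypothetical `(start, end, label)` tuples; the function name `pairwise_agreement` is likewise hypothetical and not part of TeamTat.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: (start_offset, end_offset, label).
Annotation = Tuple[int, int, str]


def pairwise_agreement(a: Iterable[Annotation], b: Iterable[Annotation]) -> float:
    """Exact-match F1 between two annotators' annotation sets (illustrative sketch).

    F1 = 2 * |A ∩ B| / (|A| + |B|); the measure is symmetric, so neither
    annotator needs to be treated as the gold standard.
    """
    set_a: Set[Annotation] = set(a)
    set_b: Set[Annotation] = set(b)
    if not set_a and not set_b:
        # Both annotators left the document unannotated: treat as full agreement.
        return 1.0
    matched = len(set_a & set_b)  # same span AND same label
    return 2 * matched / (len(set_a) + len(set_b))


if __name__ == "__main__":
    annotator_1 = [(0, 5, "Gene"), (10, 18, "Disease")]
    annotator_2 = [(0, 5, "Gene"), (10, 18, "Chemical"), (20, 25, "Gene")]
    # One shared annotation out of 2 + 3 total: 2 * 1 / 5 = 0.4
    print(pairwise_agreement(annotator_1, annotator_2))
```

Here the label disagreement on span (10, 18) and the extra annotation at (20, 25) both lower the score; these are exactly the cases a disagreement-resolution interface would surface for review. A relaxed variant that credits overlapping spans is also common, but it is not shown here.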