Journal: BMC Bioinformatics

QTLTableMiner++: semantic mining of QTL tables in scientific articles

Abstract

A quantitative trait locus (QTL) is a genomic region that correlates with a phenotype. Most of the experimental information from QTL mapping studies is reported in the tables of scientific publications, yet traditional text mining techniques aim to extract information from unstructured text rather than from tables. We present QTLTableMiner++ (QTM), a table mining tool that extracts and semantically annotates QTL information buried in (heterogeneous) tables of plant science literature. QTM is a command line tool written in the Java programming language. It takes scientific articles from the Europe PMC repository as input and extracts QTL tables using keyword matching and ontology-based concept identification. The tables are then normalized using rules derived from table properties such as captions, column headers and table footers. Table columns are further classified into three categories, namely column descriptors, properties and values, based on the column headers and the data types of the cell entries. Abbreviations found in the tables are expanded using the Schwartz and Hearst algorithm. Finally, the content of the QTL tables is semantically enriched with domain-specific ontologies (e.g. Crop Ontology, Plant Ontology and Trait Ontology) using the Apache Solr search platform, and the results are stored in a relational database and a text file. The performance of QTM was assessed by precision and recall on information retrieved from two manually annotated corpora of open access articles, i.e. QTL mapping studies in tomato (Solanum lycopersicum) and in potato (S. tuberosum). In summary, QTM detected QTL statements in tomato with 74.53% precision and 92.56% recall, and in potato with 82.82% precision and 98.94% recall. QTM is a unique tool that aids in providing QTL information in machine-readable and semantically interoperable formats.
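
The abstract names the Schwartz and Hearst algorithm for abbreviation expansion but does not show it. The following is a minimal Python sketch of that algorithm's core long-form matching step only, written from the published description of the method; it is not taken from QTM's Java source. The function name `find_best_long_form` and the example strings are illustrative assumptions, and the candidate-extraction and validity filters of the full algorithm are omitted.

```python
from typing import Optional


def find_best_long_form(short_form: str, long_form: str) -> Optional[str]:
    """Return the shortest suffix of `long_form` (starting at a word
    boundary) that matches `short_form`, or None if there is no match.

    Characters of the abbreviation are matched right to left against the
    candidate text. The first character of the abbreviation must match
    the first character of a word in the candidate text.
    """
    l_index = len(long_form) - 1
    for s_index in range(len(short_form) - 1, -1, -1):
        curr = short_form[s_index].lower()
        if not curr.isalnum():
            continue  # skip punctuation inside the abbreviation
        # Move left until the character matches; for the abbreviation's
        # first character, also require a word-initial position.
        while (l_index >= 0 and long_form[l_index].lower() != curr) or (
            s_index == 0 and l_index > 0 and long_form[l_index - 1].isalnum()
        ):
            l_index -= 1
        if l_index < 0:
            return None  # ran out of candidate text
        l_index -= 1
    # Start the long form at the beginning of the word that was matched last.
    start = long_form.rfind(" ", 0, l_index + 1) + 1
    return long_form[start:]


# Hypothetical inputs: an abbreviation and the text preceding it.
expanded = find_best_long_form("QTL", "a quantitative trait locus")
```

Here `expanded` is `"quantitative trait locus"`: the leading article is dropped because the match for `Q` has to begin a word. An abbreviation whose letters cannot be found in order, such as `"XYZ"` against the same text, yields `None`.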
