
Integrating information retrieval, summarization, natural language processing, and user interfacing for phenotype-genotype association.



Abstract

Because so many articles are published in the biomedical domain, researchers often spend a lot of time finding information relevant to their research. For this dissertation, we developed and integrated three applications that extract relevant information from text: BioRover, FigureSearch, and SimGenes. BioRover takes a gene or disease name as input and generates a list of related diseases or genes. Related genes and diseases are extracted by identifying gene and disease named entities in over 600 million sentences. A dictionary-based disease name tagger was developed to identify disease entities, and gene names were identified using a machine learning-based tagger. BioRover also allows users to filter results by sentence modality (negation and speculation) and by genetic and epigenetic factors such as mutation, methylation, and phosphorylation. Negation and speculation are identified using a machine learning-based tagger. FigureSearch is a search engine for published biomedical figures. It indexes over 16 million figures from over 5 million full-text articles. For each figure, a four-sentence summary is generated by extracting sentences from the full text of the article. The summary contains one sentence classified as Introduction, one as Methods, one as Results, and one as Discussion. A machine learning-based classifier was used to classify the sentences. SimGenes identifies genes that are semantically similar to a reference gene. Gene Ontology annotations are used to calculate the semantic similarity between genes. A web application was developed to give users access to these systems. We believe that these applications will help researchers access relevant information quickly and that, using SimGenes, researchers can generate new hypotheses.
All three systems are available online as free web applications: BioRover (http://biorover.askhermes.org/), FigureSearch (http://figuresearch.askhermes.org), and SimGenes (http://simgenes.askhermes.org).
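
The abstract does not say how BioRover ranks related entities, so the following is only a minimal sketch of one plausible reading: count sentence-level co-occurrences of a query gene with disease mentions, and skip sentences tagged as negated or speculative unless the user asks for them. The `TaggedSentence` type, the `related_diseases` function, and the toy data are hypothetical names introduced for illustration; the entity and modality tags are assumed to come from upstream taggers like those described in the abstract.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedSentence:
    """One sentence after entity and modality tagging (hypothetical structure)."""
    genes: frozenset
    diseases: frozenset
    negated: bool = False
    speculative: bool = False


def related_diseases(sentences, gene, include_negated=False,
                     include_speculative=False):
    """Rank diseases by how many retained sentences mention them with `gene`.

    Assumption: plain co-occurrence counting; the dissertation's actual
    scoring method is not stated in the abstract.
    """
    counts = Counter()
    for s in sentences:
        if gene not in s.genes:
            continue  # sentence does not mention the query gene
        if s.negated and not include_negated:
            continue  # modality filter: drop negated sentences
        if s.speculative and not include_speculative:
            continue  # modality filter: drop speculative sentences
        counts.update(s.diseases)
    # Sorted by count, highest first
    return counts.most_common()


# Toy, hand-made input for illustration only (not data from the dissertation).
toy_sentences = [
    TaggedSentence(frozenset({"BRCA1"}), frozenset({"breast cancer"})),
    TaggedSentence(frozenset({"BRCA1"}),
                   frozenset({"breast cancer", "ovarian cancer"})),
    TaggedSentence(frozenset({"BRCA1"}), frozenset({"asthma"}), negated=True),
    TaggedSentence(frozenset({"TP53"}), frozenset({"lung cancer"})),
]

if __name__ == "__main__":
    print(related_diseases(toy_sentences, "BRCA1"))
    print(related_diseases(toy_sentences, "BRCA1", include_negated=True))
```

With the default arguments, the negated sentence is dropped, so "asthma" appears for BRCA1 only when `include_negated=True` is passed. The same counting would work in the other direction (disease in, genes out) by swapping the two fields.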

Bibliographic record

  • Author

    Agarwal, Shashank

  • Author affiliation

    The University of Wisconsin - Milwaukee

  • Degree-granting institution: The University of Wisconsin - Milwaukee
  • Subject: Engineering, Biomedical; Computer Science; Biology, Bioinformatics
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 175 p.
  • Total pages: 175
  • Original format: PDF
  • Language of text: English
