Integrating high dimensional bi-directional parsing models for gene mention tagging.

Hsu CN; Chang YM; Kuo CJ; Lin YS; Huang HS; Chung IF

Abstract

MOTIVATION: Tagging gene and gene product mentions in scientific text is an important initial step of literature mining. In this article, we describe in detail the gene mention tagger that we entered in the BioCreative 2 challenge and analyze what contributes to its good performance. Our tagger is based on the conditional random fields (CRF) model, the most prevalent method for the gene mention tagging task in BioCreative 2. Our tagger is of interest because it achieved the highest F-score among CRF-based methods and the second highest overall. Moreover, we obtained our results mostly by applying open source packages, which makes them easy to reproduce.
RESULTS: We first describe in detail how we developed our CRF-based tagger. We designed a very high dimensional feature set that includes most of the information that may be relevant. We trained bi-directional CRF models with the same set of features, one applying forward parsing and the other backward parsing, and integrated the two models based on their output scores and dictionary filtering. One of the most prominent factors contributing to the good performance of our tagger is the integration of the additional backward parsing model. From the definition of CRF, however, it appears that a CRF model is symmetric, so that forward and backward parsing models should produce the same results. We show that, depending on the feature settings, a CRF model can be asymmetric, and that the feature setting of our tagger in BioCreative 2 not only produces different results but also gives backward parsing models a slight but consistent advantage over forward parsing models. To fully explore the potential of integrating bi-directional parsing models, we applied different asymmetric feature settings to generate many bi-directional parsing models and integrated them based on their output scores. Experimental results show that this integrated model can achieve an even higher F-score for gene mention tagging using the training corpus alone.
AVAILABILITY: Data sets, programs and an on-line service of our gene mention tagger can be accessed at http://aiia.iis.sinica.edu.tw/biocreative2.htm.
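
The two mechanisms named in the abstract, backward parsing and integration by output score, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Tagger` interface, the function names and the window offsets are hypothetical, the scores of different models are assumed to be directly comparable, and the dictionary filtering step is omitted.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = Sequence[str]
Tags = List[str]
# Hypothetical interface: a tagger maps tokens to (tags, output score).
Tagger = Callable[[Tokens], Tuple[Tags, float]]


def window_features(tokens: Tokens, i: int,
                    offsets: Sequence[int] = (-2, -1, 0, 1)) -> List[str]:
    """Context-word features for position i over an asymmetric window.

    The offsets are illustrative only; they reach two tokens back in
    reading order but only one token ahead.
    """
    feats = []
    for off in offsets:
        j = i + off
        word = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
        feats.append(f"w[{off}]={word}")
    return feats


def backward(tagger: Tagger) -> Tagger:
    """Wrap a tagger so that it reads each sentence right to left."""
    def run(tokens: Tokens) -> Tuple[Tags, float]:
        tags, score = tagger(list(reversed(tokens)))
        # Restore the original token order for the predicted tags.
        return list(reversed(tags)), score
    return run


def integrate(tokens: Tokens, taggers: Sequence[Tagger]) -> Tuple[Tags, float]:
    """Keep the candidate labeling with the highest output score."""
    candidates = [tagger(tokens) for tagger in taggers]
    return max(candidates, key=lambda cand: cand[1])
```

With this window, the features for "gene" in "the p53 gene is mutated" cover "the", "p53", "gene" and "is" when the sentence is read forward, but "mutated", "is", "gene" and "p53" when it is read backward. A forward and a backward model trained with the same asymmetric feature templates therefore see different evidence, which is one way the two directions can end up producing different labelings.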

Bibliographic details

Source
Bioinformatics | 2008, Issue 13 | 9 pages
Authors
Hsu CN; Chang YM; Kuo CJ; Lin YS; Huang HS; Chung IF
Original format: PDF
Text language: English
Chinese Library Classification: Biological sciences; Bioengineering (biotechnology)
