The Cinderella of Biological Data Integration: Addressing Some of the Challenges of Entity and Relationship Mining from Patent Sources

Abstract

Most of the global corpus of medicinal chemistry data is published only in patents. However, extracting these data from patent documents and integrating them with literature and database sources poses unique challenges. This work presents an investigation of an extensive full-text patent resource, including automated name-to-chemical-structure conversion, licensed by AstraZeneca through a consortium arrangement with IBM. Our initial focus was identifying protein targets in patent titles linked to extracted bioactive compounds. We benchmarked target recognition strategies against target-assay-compound relationships manually curated from patents by GVKBIO. By analysing word frequencies and protein names, we assessed both the false-negative problem of targets not specified in titles and the false positives arising from non-target proteins named in titles. We also examined the time signals for selected target and non-target names by year of patent publication. Our results exemplify problems with, and some solutions for, extracting data from this source.
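The abstract does not say how target names were matched in titles. Purely as an illustration, and not as the authors' method, the Python sketch below assumes a simple exact-synonym dictionary lookup and shows how such matches could be scored against curated (patent, target) pairs: false positives correspond to non-target proteins named in titles, false negatives to curated targets absent from titles. It also counts title mentions of a target by publication year, a crude version of a time signal. The dictionary, patent IDs, titles, years, and gold pairs are all hypothetical; none of them come from the IBM resource or from GVKBIO data.

```python
"""Hypothetical sketch: dictionary-based target-name matching in patent titles,
scored against curated (patent, target) pairs, plus a per-year mention count.
The synonym dictionary, patents and gold pairs are invented for illustration."""
import re
from collections import Counter

# Hypothetical synonym dictionary: canonical protein name -> lower-case variants.
SYNONYMS = {
    "DPP4": ["dipeptidyl peptidase iv", "dpp-iv", "dpp4"],
    "HIV protease": ["hiv protease", "hiv-1 protease"],
    "Serum albumin": ["serum albumin", "albumin"],  # a protein rarely the drug target
}


def find_targets(title, synonyms=SYNONYMS):
    """Return the canonical names whose variants occur as whole words in the title."""
    text = title.lower()
    hits = set()
    for canonical, variants in synonyms.items():
        if any(re.search(r"\b" + re.escape(v) + r"\b", text) for v in variants):
            hits.add(canonical)
    return hits


def score(predicted, gold):
    """Compare predicted and curated (patent_id, target) pairs."""
    tp = len(predicted & gold)  # target named in the title and curated
    fp = len(predicted - gold)  # protein named in the title but not the curated target
    fn = len(gold - predicted)  # curated target not recognised in the title
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def mentions_by_year(patents, target):
    """Count titles mentioning `target`, keyed by publication year."""
    counts = Counter(year for _, year, title in patents if target in find_targets(title))
    return dict(sorted(counts.items()))


# Invented example records: (patent_id, publication_year, title).
patents = [
    ("P1", 2005, "Novel inhibitors of dipeptidyl peptidase IV"),
    ("P2", 2007, "Heterocyclic compounds and their use in therapy"),
    ("P3", 2008, "Formulations of compounds with human serum albumin"),
]
# Invented "curated" gold standard of (patent_id, target) pairs.
gold = {("P1", "DPP4"), ("P2", "DPP4"), ("P3", "HIV protease")}

predicted = {(pid, t) for pid, _, title in patents for t in find_targets(title)}

if __name__ == "__main__":
    print(score(predicted, gold))  # tp=1, fp=1 (albumin in P3), fn=2 (P2 and P3 targets)
    print(mentions_by_year(patents, "DPP4"))
```

On these invented records the sketch finds one true positive (P1), one false positive (albumin named in the P3 title) and two false negatives (the unnamed target of P2 and the true target of P3), mirroring the two error types discussed in the abstract. Exact whole-word matching is used only for brevity; a real pipeline would need broader synonym coverage and disambiguation.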

Bibliographic record

Source
Data integration in the life sciences, 2010, pp. 106-121 (16 pages)
Conference location: Gothenburg, Sweden
Authors
Ithipol Suriyawongkul; Christopher Southan; Sorel Muresan
Affiliations

Chalmers University of Technology, Gothenburg, Sweden; Computational Chemistry Section, Global Compound Sciences, DECS, AstraZeneca R&D, Mölndal, Sweden

Computational Chemistry Section, Global Compound Sciences, DECS, AstraZeneca R&D, Mölndal, Sweden

Computational Chemistry Section, Global Compound Sciences, DECS, AstraZeneca R&D, Mölndal, Sweden

Original format: PDF
Text language: English
Chinese Library Classification: Bioengineering (biotechnology)
Keywords
biomedical text mining; patent information

Similar documents


1. Challenges in Integrating Biological Data Sources [J]. S. B. Davidson, C. Overton, P. Buneman. Journal of computational biology: A journal of computational molecular cell biology, 1995, no. 4.

2. Network motif-based identification of transcription factor-target gene relationships by integrating multi-source biological data [J]. Yuji Zhang, Jianhua Xuan, Benildo G de los Reyes, et al. BMC Bioinformatics, 2008, no. 1.

3. The Cinderella of Biological Data Integration: Addressing Some of the Challenges of Entity and Relationship Mining from Patent Sources [C]. Ithipol Suriyawongkul, Christopher Southan, Sorel Muresan. International Conference on Data Integration in the Life Sciences, 2010.

4. The Problem of Time: Addressing Challenges in Spatio-Temporal Data Integration [D]. Nicholas Robison. 2018.

5. Network motif-based identification of transcription factor-target gene relationships by integrating multi-source biological data [O]. Yuji Zhang, Jianhua Xuan, Benildo G de los Reyes, et al. 2008.

6. Challenges in Integrating Biological Data Sources [O]. Susan Davidson, Chris Overton, Peter Buneman. 1995.

7. United States Patent and Trademark Office: USPTO Needs Strong Office of Human Resources Management Capable of Addressing Current and Future Challenges. Report No. BTD-16432-4-0001 [R]. 2004.
