Enhancing Biomedical Named Entity Classification Using Terabyte Unlabeled Data

Abstract

This paper presents a semi-supervised learning method that enhances biomedical named entity classification using features generated from labeled data and terabyte-scale unlabeled data, called Feature Coupling Degree (FCD) features. Highly discriminative context words are obtained from labeled free text using the Chi-square method, and queries formed by combining each named entity with these context words are submitted to a search engine. The retrieved web page counts are then converted into binary features by discretization. We investigate the effect of this type of feature on a biomedical corpus generated from several online resources. A Support Vector Machine (SVM) is used as the classifier, and the performance of different features with various kernels and discretization methods is compared. The results show that the method enhances classification performance, especially for Out-of-Vocabulary (OOV) terms and relatively small training sets. In addition, using only FCD features with polynomial kernels, the performance is competitive with that of classical features.

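The pipeline described in the abstract (Chi-square selection of context words, entity plus context-word queries, and discretization of page counts into binary features) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `hit_count` callable (a hypothetical stand-in for a search-engine page-count lookup), the quoted-phrase query form, and the log-scaled equal-width binning are all illustrative choices. The paper compares several discretization methods whose details are not given in the abstract.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 word/class contingency table.

    a: examples of the target class whose context contains the word
    b: examples of other classes whose context contains the word
    c: examples of the target class whose context lacks the word
    d: examples of other classes whose context lacks the word
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:  # degenerate table, e.g. word in every example or in none
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def top_context_words(
    examples: Sequence[Tuple[str, Set[str]]], target: str, k: int
) -> List[str]:
    """Return the k context words with the highest chi-square score for `target`.

    Each example is (class_label, set_of_context_words). Ties are broken
    alphabetically so the ranking is deterministic.
    """
    n_class = sum(1 for label, _ in examples if label == target)
    n_other = len(examples) - n_class
    in_class: Counter = Counter()
    in_other: Counter = Counter()
    for label, words in examples:
        # Updating with a set counts each word at most once per example.
        (in_class if label == target else in_other).update(words)
    scores = {}
    for w in set(in_class) | set(in_other):
        a, b = in_class[w], in_other[w]
        scores[w] = chi_square(a, b, n_class - a, n_other - b)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


def fcd_binary_features(
    entity: str,
    context_words: Sequence[str],
    hit_count: Callable[[str], int],
    n_bins: int = 4,
    max_log: float = 8.0,
) -> Dict[str, int]:
    """Turn page counts into one-hot binary features, one active bin per word.

    `hit_count` is a hypothetical stand-in for a search-engine page-count lookup.
    Counts are binned by equal-width intervals of log10(count + 1) on
    [0, max_log]; larger counts fall into the last bin.
    """
    features: Dict[str, int] = {}
    for w in context_words:
        query = f'"{entity}" "{w}"'  # assumed query form: entity AND context word
        log_count = math.log10(hit_count(query) + 1)
        bin_id = min(int(log_count / max_log * n_bins), n_bins - 1)
        features[f"{w}=bin{bin_id}"] = 1
    return features


if __name__ == "__main__":
    # Toy, made-up labeled contexts: (entity class, context words).
    examples = [
        ("protein", {"binding", "expression"}),
        ("protein", {"binding", "domain"}),
        ("cell_type", {"cells", "expression"}),
        ("cell_type", {"cells", "lines"}),
    ]
    words = top_context_words(examples, "protein", k=2)
    print(words)  # → ['binding', 'cells']

    # Made-up page counts standing in for real search-engine results.
    fake_hits = {'"IL-2" "binding"': 12000, '"IL-2" "cells"': 50}
    print(fcd_binary_features("IL-2", words, lambda q: fake_hits.get(q, 0)))
    # → {'binding=bin2': 1, 'cells=bin0': 1}
```

Note that Chi-square scores a word highly whether it is associated with or against the target class, which is why "cells" ranks alongside "binding" for the protein class in the toy data. The resulting sparse feature dictionaries could then be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to an SVM with a polynomial kernel, in line with the classifier setup the abstract reports.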

Bibliographic details

Source
Information Retrieval Technology, 2008, pp. 605-612 (8 pages)
Authors
Yanpeng Li; Hongfei Lin; Zhihao Yang;
Original format: PDF
Chinese Library Classification: computer equipment security
Keywords
semi-supervised learning; biomedical named entity; classification; discretization; SVM; polynomial kernel;

Similar documents

1. Named Entity Recognition Using Appropriate Unlabeled Data, Post-processing and Voting [J]. A. Ekbal, S. Bandyopadhyay. Informatica: An International Journal of Computing and Informatics, 2010, no. 1
2. Named entity recognition and classification in biomedical text using classifier ensemble [J]. Sriparna Saha, Asif Ekbal, Utpal Kumar Sikdar. International Journal of Data Mining and Bioinformatics, 2015, no. 4
3. Knowledge-enhanced biomedical named entity recognition and normalization: application to proteins and genes [J]. Huiwei Zhou, Shixian Ning, Zhe Liu, et al. BMC Bioinformatics, 2020, no. 1
4. Enhancing Biomedical Named Entity Classification Using Terabyte Unlabeled Data [C]. Yanpeng Li, Hongfei Lin, Zhihao Yang. Asia Information Retrieval Symposium, 2008
5. Improving named entity recognition with co-training and unlabeled bilingual data [D]. Xiaoyi Ma. 2008
6. Knowledge-enhanced biomedical named entity recognition and normalization: application to proteins and genes [O]. Huiwei Zhou, Shixian Ning, Zhe Liu, et al. 2020
7. UniTrans: Unifying Model Transfer and Data Transfer for Cross-Lingual Named Entity Recognition with Unlabeled Data [O]. Qianhui Wu, Zijia Lin, Börje F. Karlsson, et al. 2020
