Protein-Protein Interaction Recognition Based on Word Frequency Statistics

蔡松成; 牛耘

Abstract

Distant-supervision approaches to protein-protein interaction (PPI) extraction generate large-scale training data by aligning entity pairs in a knowledge base with entity mentions in text, which effectively addresses the shortage of annotated data. Building on PPI recognition with the expectation-maximization (EM) algorithm, this paper proposes a PPI recognition method based on word frequency statistics. For the signature of each protein pair, the method extracts the words between the two target proteins, applies part-of-speech tagging, keeps only the nouns and verbs, and stems them, which yields the word frequency statistics of that signature. A threshold on these frequencies selects the high-frequency words of each signature, and these words are used to improve the initialization step of the EM algorithm. Experiments show that using high-frequency word information to obtain sentence categories as initial values gives higher and more balanced precision and recall than the original EM-based model, a clear improvement over current distant-supervision methods for PPI recognition.
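The preprocessing steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a signature is assumed to be the list of sentences that mention a protein pair, the sentences are assumed to arrive already POS-tagged with Penn Treebank style tags, the two target proteins are assumed to be replaced by the hypothetical markers `PROT1` and `PROT2`, and `crude_stem` is a toy stand-in for a real stemmer. The rule in `initial_labels` (a sentence is initially positive if it contains a high-frequency word) is likewise an assumption; the abstract only states that high-frequency words are used to obtain the initial sentence categories.

```python
from collections import Counter

# Hypothetical markers standing in for the two target proteins.
P1, P2 = "PROT1", "PROT2"


def words_between(tagged_sentence):
    """Return the (word, tag) pairs strictly between the two protein markers."""
    words = [w for w, _ in tagged_sentence]
    if P1 not in words or P2 not in words:
        return []
    i, j = sorted((words.index(P1), words.index(P2)))
    return tagged_sentence[i + 1:j]


def crude_stem(word):
    """Toy suffix stripper; a stand-in for a real stemmer (assumption)."""
    word = word.lower()
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def kept_stems(tagged_sentence):
    """Stems of the nouns and verbs located between the two proteins."""
    return [
        crude_stem(word)
        for word, tag in words_between(tagged_sentence)
        if tag.startswith(("NN", "VB"))  # keep nouns and verbs only
    ]


def signature_word_counts(signature):
    """Word frequency statistics for one protein-pair signature."""
    counts = Counter()
    for sentence in signature:
        counts.update(kept_stems(sentence))
    return counts


def high_frequency_words(counts, threshold):
    """Stems whose count reaches the chosen threshold."""
    return {w for w, c in counts.items() if c >= threshold}


def initial_labels(signature, keywords):
    """Assumed rule: 1 if a sentence contains a high-frequency word, else 0."""
    return [1 if set(kept_stems(s)) & keywords else 0 for s in signature]


# Made-up example signature for illustration only.
signature = [
    [("PROT1", "NN"), ("binds", "VBZ"), ("to", "TO"), ("PROT2", "NN")],
    [("PROT1", "NN"), ("binding", "NN"), ("with", "IN"), ("PROT2", "NN")],
    [("PROT2", "NN"), ("and", "CC"), ("PROT1", "NN")],
    [("the", "DT"), ("PROT1", "NN"), ("strongly", "RB"),
     ("activates", "VBZ"), ("PROT2", "NN")],
]

counts = signature_word_counts(signature)
keywords = high_frequency_words(counts, threshold=2)
labels = initial_labels(signature, keywords)
```

The resulting `labels` would serve as the starting point for the EM iterations in place of a uniform or random initialization; the threshold value of 2 is arbitrary here and would need to be tuned on real data.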

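Bibliographic record

Source: 《计算机技术与发展》 (Computer Technology and Development) | 2019, Issue 2 | pp. 65-68, 72 | 5 pages
Authors: 蔡松成; 牛耘
Affiliation: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106 (both authors)
Original format: PDF
Language: Chinese
CLC classification: Information processing
Keywords: distant supervision; protein interaction; expectation-maximization algorithm; word frequency statistics
Indexed: 2023-07-24 21:47:18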

