Journal: BioMed Research International

Improving Classification of Protein Interaction Articles Using Context Similarity-Based Feature Selection


Abstract

Protein interaction article classification is a text classification task in the biological domain that determines which articles describe protein-protein interactions. Because the feature space in text classification is high-dimensional, feature selection is widely used to reduce the dimensionality of the features and speed up computation without sacrificing classification performance. Many existing feature selection methods are based on the statistical measures of document frequency and term frequency. One potential drawback of these methods is that they treat features separately. We therefore first design a similarity measure between context information that takes word co-occurrences and phrase chunks around the features into account. We then introduce this context similarity into the importance measure of the features, where it substitutes for the document and term frequency, yielding new context similarity-based feature selection methods. Their performance is evaluated on two protein interaction article collections and compared against the frequency-based methods. The experimental results reveal that the context similarity-based methods perform better in terms of the F1 measure and the dimension reduction rate. Benefiting from the context information surrounding the features, the proposed methods can effectively select distinctive features for protein interaction article classification.
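The contrast between frequency-based and context-based scoring can be illustrated with a toy sketch. The abstract does not give the paper's actual similarity measure or importance formula, so everything below is an assumption made for illustration: the function names, the fixed word window standing in for "context", cosine similarity between bag-of-words contexts, and the score defined as within-class context similarity minus cross-class context similarity. It is not the authors' method.

```python
from collections import Counter
from math import sqrt


def document_frequency(docs, term):
    """Frequency-based baseline: number of documents containing `term`."""
    return sum(term in tokens for tokens in docs)


def context_vectors(docs, term, window=2):
    """One bag-of-words Counter per occurrence of `term` (assumed context definition)."""
    contexts = []
    for tokens in docs:
        for i, tok in enumerate(tokens):
            if tok == term:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                contexts.append(Counter(left + right))
    return contexts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_similarity(xs, ys):
    """Average cosine over all pairs, skipping a context paired with itself."""
    pairs = [(x, y) for x in xs for y in ys if x is not y]
    return sum(cosine(x, y) for x, y in pairs) / len(pairs) if pairs else 0.0


def context_score(pos_docs, neg_docs, term, window=2):
    """Hypothetical importance score: a term ranks high when its contexts agree
    within the positive class and differ from its contexts in the negative class."""
    pos = context_vectors(pos_docs, term, window)
    neg = context_vectors(neg_docs, term, window)
    return mean_similarity(pos, pos) - mean_similarity(pos, neg)


# Toy tokenized documents (invented for this example).
pos_docs = [["protein", "binds", "protein"], ["protein", "binds", "receptor"]]
neg_docs = [["journal", "binds", "pages"]]

df_pos = document_frequency(pos_docs, "binds")                 # 2
df_neg = document_frequency(neg_docs, "binds")                 # 1
score = context_score(pos_docs, neg_docs, "binds", window=1)   # about 0.707
print(df_pos, df_neg, round(score, 3))
```

In this toy data, document frequency only reports that "binds" appears in both classes, while the context-based score also reflects that its neighbouring words are consistent in the positive documents and unrelated in the negative one. Features would then be ranked by such a score and the top-ranked ones kept.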
