Conference: International Conference on Signal-Image Technology and Internet-Based Systems
A Hierarchical n-Grams Extraction Approach for Classification Problem

Abstract

We are interested in classifying proteins based on their primary structures. The goal is to automatically assign protein sequences to their families. This task requires extracting a set of descriptors, which are then presented to supervised learning algorithms. Many types of descriptors are used in the literature; the most popular is the n-gram, a sequence of n consecutive characters. The standard n-gram approach first fixes the parameter n, extracts the corresponding n-gram descriptors, and then works with this single value throughout the data mining process. In this paper, we propose a hierarchical approach to n-gram construction. The goal is to obtain descriptors of varying length that better characterize the protein families. This approach aims to reflect the domain knowledge of biologists: the patterns that characterize a protein family most often vary in length. Our idea is to transpose the principle of frequent itemset extraction, mainly used in association rule mining, to n-gram extraction for protein classification. The experiments show that the new approach is consistent with biological reality and achieves the same accuracy as the standard approach.
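The abstract does not give the algorithm's details, so the following is only a minimal sketch of how the frequent itemset principle might be transposed to n-grams, not the authors' implementation. It assumes that the support of an n-gram is the number of sequences containing it, and that an n-gram is counted only if both of its (n-1)-length sub-grams were frequent at the previous level. The names `hierarchical_ngrams`, `to_features`, `min_support`, and `max_n`, as well as the toy sequences, are hypothetical.

```python
from collections import Counter


def hierarchical_ngrams(sequences, min_support, max_n):
    """Level-wise (Apriori-style) extraction of frequent n-grams.

    Assumption: support = number of sequences containing the n-gram.
    Returns a dict mapping each frequent n-gram (n = 1..max_n) to its support.
    """
    frequent = {}
    previous = None  # frequent (n-1)-grams from the previous level
    for n in range(1, max_n + 1):
        support = Counter()
        for seq in sequences:
            seen = set()
            for i in range(len(seq) - n + 1):
                gram = seq[i:i + n]
                # Pruning: count only n-grams whose prefix and suffix
                # of length n-1 were both frequent at the previous level.
                if previous is None or (gram[:-1] in previous and gram[1:] in previous):
                    seen.add(gram)
            support.update(seen)  # each sequence contributes at most 1
        level = {g: c for g, c in support.items() if c >= min_support}
        if not level:
            break  # no frequent n-gram at this length, so none at longer ones
        frequent.update(level)
        previous = level
    return frequent


def to_features(sequences, descriptors):
    """Binary presence/absence vectors, one row per sequence."""
    return [[int(d in seq) for d in descriptors] for seq in sequences]


# Toy sequences, made up for illustration only.
toy = ["MKVLA", "MKVLG", "AKVLT"]
grams = hierarchical_ngrams(toy, min_support=3, max_n=4)
descriptors = sorted(grams, key=lambda g: (len(g), g))
features = to_features(toy, descriptors)
```

The resulting descriptor set mixes lengths, and the feature rows could then be passed to any supervised learner. In contrast, the standard approach described above would correspond to a single pass with one fixed value of n.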
