Journal: Computational Intelligence and Neuroscience

An Intelligent Classification System for Cancer Detection Based on DNA Methylation Using ML and Semantic Knowledge in Healthcare


Abstract

Wearable sensing devices are being used to consistently assess a patient's internal and external wellness and to diagnose chronic conditions such as cancer, Alzheimer's disease, and cardiovascular disease. Wearable technologies and networking websites have become very common in the medical sector in recent years. A patient's health can be influenced by a number of factors, including psychological response, emotional stability, and anxiety levels. These factors can be evaluated using social network analysis (SNA), a set of graph theory-based techniques for studying relationship phenomena. SNA therefore has numerous possible uses in health research, ranging from social science to exact science. For example, it can be used to study cooperative networks of healthcare providers, hazard-prone behaviors, infectious disease transmission, and the spread of health promotion and prevention initiatives. Recently, a number of machine learning-based healthcare solutions have been proposed to track chronic illnesses using data from social networks and wearable monitoring devices. In our proposed approach, we use an intelligent system, assisted by wearable sensors, to classify cancer based on DNA methylation, an important epigenetic process in the human genome that controls gene expression and has been linked to a number of health issues. With the help of biomedical sensors, a mixed-sampling imbalanced data ensemble classification technique is developed to address the problems of class imbalance and high dimensionality in The Cancer Genome Atlas (TCGA) massive data. This technique is based on the Intelligent Synthetic Minority Oversampling (SMOTE) algorithm. Because class imbalance significantly raises the false-negative rate, new minority class samples are first generated to give a larger data set.
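The oversampling step can be illustrated with a minimal sketch of plain SMOTE-style interpolation. This is not the authors' "Intelligent" variant, whose details the abstract does not give. The function name `smote`, its parameters, and the toy two-feature vectors are assumptions made for illustration only.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (illustrative helper, not the paper's code).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy stand-in for minority-class feature vectors (hypothetical values)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote(minority, n_new=6, k=2)
print(len(minority) + len(synthetic), "minority samples after oversampling")
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the region the minority class already occupies. Interpolation can still place points close to majority samples, which is the kind of noise the abstract goes on to address.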
Noise introduced during sample expansion is any data that has been acquired, stored, or altered in a way that prevents the system that originally produced it from reading or using it. Noisy data inflates storage requirements and can severely distort the results of any data analysis. It can also affect the sample sets of either class, contributing to class imbalance, a common problem in ML data sets. The Tomek Link method is then used to remove this noise, producing a reasonably balanced data set. Using the cascade forest structure of the deep forest (GC-Forest) algorithm, each layer employs two random forest structures to increase the generalization ability of the model and build the final classification model. Experiments on DNA methylation data collected with biosensors from six tumor patients show that the mixed-sampling imbalanced data ensemble classification technique can increase sensitivity to the minority class while maintaining classification accuracy on the majority class.
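The Tomek Link cleaning step can be sketched as follows. Two samples form a Tomek link when they belong to different classes and each is the other's nearest neighbour. The abstract does not say whether one or both members of a link are discarded, so this sketch assumes both are removed. The function names and the one-dimensional toy data are hypothetical.

```python
import math

def nearest(i, X):
    """Index of the nearest neighbour of sample i (excluding i itself)."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))

def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours with different class labels."""
    links = []
    for i in range(len(X)):
        j = nearest(i, X)
        if i < j and y[i] != y[j] and nearest(j, X) == i:
            links.append((i, j))
    return links

def remove_tomek_links(X, y):
    """Drop both members of every Tomek link (an assumption of this sketch)."""
    drop = {idx for pair in tomek_links(X, y) for idx in pair}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data: samples 2 and 3 sit on the class boundary and form a link
X = [[0.0], [0.1], [1.0], [1.05], [2.0], [2.1]]
y = [0, 0, 0, 1, 1, 1]
X_clean, y_clean = remove_tomek_links(X, y)
print(tomek_links(X, y), y_clean)
```

In a mixed-sampling pipeline this cleaning would run on the data set after oversampling, so that borderline or overlapping samples are removed before the classifier is trained.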
