SMOTE-ENC: A Novel SMOTE-Based Method to Generate Synthetic Data for Nominal and Continuous Features

Mukherjee Mimi; Khushi Matloob

Abstract

Real-world datasets are often heavily skewed, with some classes significantly outnumbered by others. In these situations, machine learning algorithms fail to achieve substantial efficacy when predicting the underrepresented instances. To solve this problem, many variations of the synthetic minority oversampling technique (SMOTE) have been proposed to balance datasets that have continuous features. However, for datasets with both nominal and continuous features, SMOTE-NC is the only SMOTE-based oversampling technique available to balance the data. In this paper, we present a novel minority oversampling method, SMOTE-ENC (SMOTE-Encoded Nominal and Continuous), in which nominal features are encoded as numeric values and the difference between two such numeric values reflects the amount of change of association with the minority class. Our experiments show that classification models using SMOTE-ENC offer better prediction than models using SMOTE-NC when the dataset has a substantial number of nominal features and when there is some association between the categorical features and the target class. Additionally, our proposed method addresses one of the major limitations of the SMOTE-NC algorithm: SMOTE-NC can be applied only to mixed datasets that contain both continuous and nominal features, and it cannot function if all the features of the dataset are nominal. Our method has been generalized so that it can be applied to both mixed datasets and nominal-only datasets.
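The abstract does not give the encoding formula, so the following is only a minimal sketch of the idea of encoding nominal labels by their association with the minority class. It assumes one plausible form: for each label, the observed number of minority instances is compared with the number expected if the label were independent of the class, and the relative deviation (observed - expected) / expected is used as the numeric value. The function name encode_nominal and the scale parameter are hypothetical and introduced here purely for illustration; they are not taken from the paper.

```python
from collections import Counter


def encode_nominal(labels, targets, minority_class, scale=1.0):
    """Map each nominal label to a number reflecting its association
    with the minority class.

    Assumed form (an illustration, not the paper's stated formula):
        scale * (observed - expected) / expected
    where `expected` is the minority count the label would have if it
    were independent of the class.
    """
    n = len(labels)
    if n == 0 or n != len(targets):
        raise ValueError("labels and targets must be non-empty and equal in length")

    minority_total = sum(1 for t in targets if t == minority_class)
    if minority_total == 0:
        raise ValueError("minority_class does not occur in targets")

    # Fraction of all instances that belong to the minority class.
    minority_fraction = minority_total / n

    label_counts = Counter(labels)
    minority_counts = Counter(
        label for label, t in zip(labels, targets) if t == minority_class
    )

    encoding = {}
    for label, count in label_counts.items():
        expected = count * minority_fraction
        observed = minority_counts.get(label, 0)
        encoding[label] = scale * (observed - expected) / expected
    return encoding


# Toy data: label "a" carries all minority instances, label "b" carries none.
labels = ["a"] * 4 + ["b"] * 6
targets = [1, 1, 0, 0] + [0] * 6
encoding = encode_nominal(labels, targets, minority_class=1)
print(encoding)
```

Under this assumption, a label that is over-represented in the minority class receives a positive value and a label that never occurs with the minority class receives -1, so the distance between two encoded labels grows with the difference in their association with the minority class. The scale argument is a placeholder for whatever factor would make these values comparable to the continuous features.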

Bibliographic details

Source: Applied System Innovation, 2021, Issue 1, 12 pages
Authors: Mukherjee Mimi; Khushi Matloob
Original format: PDF
Chinese Library Classification: General industrial technology
Keywords: SMOTE; nominal feature; continuous feature; class imbalance; precision; recall; area under receiver operating characteristic curve (ROC-AUC); area under precision-recall curve (PR-AUC)
