Conference: IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology

Observation Imbalanced Data Text to Predict Users Selling Products on Female Daily with SMOTE, Tomek, and SMOTE-Tomek



Abstract

Female Daily is a beauty platform with a social media application where users share their beauty experiences by posting images and text. Its terms and conditions prohibit using posts to sell products, yet some users still use the platform to sell beauty products. User posts are recorded in the Female Daily database, and these data are imbalanced: posts that were banned for selling (minority class) are outnumbered by posts that admins did not ban because they contain no selling (majority class). SMOTE and Tomek links are techniques for handling imbalanced data by over-sampling and under-sampling, respectively, and they can be combined (SMOTE-Tomek) to bring the classes toward balance. In this study, we evaluate the imbalanced text data from Female Daily using SMOTE, Tomek, and SMOTE-Tomek. The classifiers are Support Vector Machine (SVM) and Logistic Regression (LR), both trained on TF-IDF vectors and compared to find the best method for predicting users who sell products on Female Daily. The results show that for posts not selling products (majority class), the effect of SMOTE, Tomek, and SMOTE-Tomek on precision and recall is small and tends to reduce them, whereas for posts selling products (minority class) the effect is a positive improvement. The best combination for each metric is: for G-Mean, SMOTE-Tomek with SVM; for minority-class precision, SMOTE with SVM; for minority-class recall, SMOTE with LR. The experimental results of this study indicate the usefulness of the SMOTE or SMOTE-Tomek approach.
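The resampling ideas named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function names, the toy 2-D points standing in for TF-IDF vectors, and the choice to drop only the majority member of each Tomek link are assumptions made for illustration (some variants drop both members). SMOTE is shown as interpolation between a minority point and one of its nearest minority neighbours, a Tomek link as a pair of mutually nearest points with different labels, and G-Mean as the geometric mean of sensitivity and specificity.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours.
    Assumes at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: distance(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are each other's nearest
    neighbour and carry different labels."""
    def nearest(i):
        return min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: distance(X[i], X[j]),
        )
    nn = [nearest(i) for i in range(len(X))]
    return [
        (i, j) for i, j in enumerate(nn)
        if i < j and nn[j] == i and y[i] != y[j]
    ]


def remove_tomek_majority(X, y, majority_label):
    """Under-sample by dropping the majority member of every Tomek link
    (an illustrative choice; other variants drop both members)."""
    drop = {i if y[i] == majority_label else j for i, j in tomek_links(X, y)}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity.
    Assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


# Hypothetical toy data: 1 = post banned for selling (minority), 0 = regular post.
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
y = [0, 1, 0, 0]
print(tomek_links(X, y))  # [(0, 1)]
```

A SMOTE-Tomek style pipeline would, under the same assumptions, first extend the minority class with `smote` and then clean the class boundary with `remove_tomek_majority` before training a classifier on the resampled data.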
