Observation Imbalanced Data Text to Predict Users Selling Products on Female Daily with SMOTE, Tomek, and SMOTE-Tomek

Abstract

Female Daily is a beauty platform with a social media application where users share their beauty experiences by posting images and text. Female Daily's terms and conditions prohibit using posts on the platform to sell products; nevertheless, some users do post to sell beauty products. Users' posts are recorded in the Female Daily database. In that data, the classes are imbalanced. Posts banned for selling products form the minority class. Posts the admin does not ban, because they contain no selling, form the majority class. SMOTE over-samples the minority class and Tomek links under-sample the majority class; both are techniques for handling imbalanced data by bringing the class distribution closer to balance. In this study, we evaluate the imbalanced text data from Female Daily using SMOTE, Tomek, and SMOTE-Tomek. The prediction algorithms are Support Vector Machine (SVM) and Logistic Regression (LR), applied to TF-IDF vectors, and we evaluate which combination best predicts users selling products on Female Daily. The results show that SMOTE, Tomek, and SMOTE-Tomek have little effect on precision and recall for users not selling products (majority class) and can even reduce them. For users selling products (minority class), they yield a positive improvement. The best combination for each metric is as follows: for G-Mean, SMOTE-Tomek with SVM; for minority-class precision, SMOTE with SVM; and for minority-class recall, SMOTE with LR. The experimental results indicate that the SMOTE or SMOTE-Tomek approach is useful.
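The abstract states that posts are converted to TF-IDF vectors before classification, but it does not give the tokenizer or weighting variant. The following is a minimal pure-Python sketch under these assumptions:

- whitespace tokenization;
- raw term counts as TF;
- the smoothed IDF idf(t) = ln((1 + n) / (1 + df(t))) + 1;
- L2-normalized rows.

This weighting resembles scikit-learn's default `TfidfVectorizer` settings, but whether the authors used it is an assumption. The function name `tfidf` and the example posts are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) where each row is an L2-normalised TF-IDF vector.

    Assumptions (not from the paper): lower-cased whitespace tokens, raw term
    counts as TF, and the smoothed IDF  idf(t) = ln((1 + n) / (1 + df(t))) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # document frequency: number of documents containing each token
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # avoid dividing by 0
        rows.append([v / norm for v in row])
    return vocab, rows


if __name__ == "__main__":
    posts = ["sell lipstick", "review lipstick", "sell serum"]  # hypothetical posts
    vocab, rows = tfidf(posts)
    print(vocab)  # ['lipstick', 'review', 'sell', 'serum']
```

Tokens that appear in fewer posts receive a larger IDF. In the second example post, "review" therefore outweighs the more common "lipstick".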
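The abstract names SMOTE, Tomek links, and their combination without specifying their settings. As an illustration only, the pure-Python sketch below assumes dense numeric feature vectors (such as TF-IDF rows), Euclidean distance, and k nearest minority neighbours for SMOTE. `smote_tomek` first over-samples the minority class up to the majority count, then removes the majority-class member of each Tomek link.

All helper names are hypothetical. In practice a library such as imbalanced-learn (`SMOTE`, `TomekLinks`, `SMOTETomek`) would normally be used. Its defaults may differ from this sketch; for example, some configurations remove both members of a Tomek link.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority vectors (SMOTE-style over-sampling).

    Each synthetic vector lies on the segment between a randomly chosen
    minority vector and one of its k nearest minority neighbours.
    """
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda p: euclidean(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position in [0, 1) along base -> nb
        synthetic.append([bv + gap * (nv - bv) for bv, nv in zip(base, nb)])
    return synthetic


def tomek_links(X, y):
    """Index pairs (i, j), i < j, with different labels that are each
    other's nearest neighbour."""
    nn = []
    for i in range(len(X)):
        nn.append(min((j for j in range(len(X)) if j != i),
                      key=lambda j: euclidean(X[i], X[j])))
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_tomek(X, y, majority_label):
    """Tomek-link under-sampling: drop the majority member of each link."""
    drop = {i if y[i] == majority_label else j for i, j in tomek_links(X, y)}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority_label, majority_label, k=5, seed=0):
    """SMOTE the minority class up to the majority count, then clean Tomek links."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_majority = sum(1 for lab in y if lab == majority_label)
    new = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return remove_tomek(X + new, y + [minority_label] * len(new), majority_label)
```

The two steps are complementary. SMOTE adds minority points, and Tomek cleaning removes majority points that sit right against the class boundary, which tends to sharpen the boundary the classifier learns.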
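Both classifiers named in the abstract are linear models if the SVM uses a linear kernel. The paper's kernel and hyperparameters are not stated here, so the linear kernel is an assumption. The sketch below trains either model with full-batch (sub)gradient descent:

- logistic regression minimizes the mean log-loss;
- the linear soft-margin SVM minimizes the mean hinge loss.

Each objective adds an L2 penalty on the weights. `train_linear`, `predict`, and the default learning rate, epoch count, and penalty are hypothetical illustration choices, not the authors' setup.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear(X, y, loss="log", lr=0.1, epochs=200, l2=0.01):
    """Full-batch (sub)gradient descent for a linear classifier on labels 0/1.

    loss="log"   -> logistic regression (mean log-loss)
    loss="hinge" -> linear soft-margin SVM (mean hinge loss, labels mapped to -1/+1)
    Both objectives add (l2 / 2) * ||w||^2; the bias b is not penalised.
    """
    if loss not in ("log", "hinge"):
        raise ValueError("loss must be 'log' or 'hinge'")
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            score = dot(w, xi) + b
            if loss == "log":
                g = sigmoid(score) - yi          # d(log-loss)/d(score)
            else:
                s = 1 if yi == 1 else -1
                g = -s if s * score < 1 else 0   # subgradient of max(0, 1 - s*score)
            for j in range(d):
                gw[j] += g * xi[j]
            gb += g
        w = [wj - lr * (gwj / n + l2 * wj) for wj, gwj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(X, w, b):
    """Label 1 when the score is non-negative (probability >= 0.5 for LR)."""
    return [1 if dot(w, xi) + b >= 0 else 0 for xi in X]
```

The only difference between the two models here is the per-sample loss derivative `g`. Logistic regression still receives a gradient from correctly classified points. The SVM ignores points already outside the margin, so only points near the boundary shape its weights.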
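The reported metrics can be computed from a binary confusion matrix with the minority class (selling posts) as the positive class. The sketch assumes the common definition G-Mean = sqrt(recall × specificity); the paper may compute it differently. The function names and toy labels are hypothetical.

```python
import math


def binary_confusion(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn) treating `positive` as the minority class."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def minority_metrics(y_true, y_pred, positive=1):
    """Precision and recall for the positive (minority) class, plus G-Mean."""
    tp, fn, fp, tn = binary_confusion(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0           # sensitivity / TPR
    specificity = tn / (tn + fp) if tn + fp else 0.0      # TNR
    g_mean = math.sqrt(recall * specificity)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "g_mean": g_mean}


if __name__ == "__main__":
    y_true = [0] * 8 + [1] * 2          # 8 non-selling posts, 2 selling posts
    never_selling = [0] * 10            # classifier that always predicts "not selling"
    print(minority_metrics(y_true, never_selling))
    # {'precision': 0.0, 'recall': 0.0, 'specificity': 1.0, 'g_mean': 0.0}
```

The toy example shows why accuracy alone is misleading on imbalanced data. A classifier that never flags a selling post is 80% accurate on these labels, yet its minority recall and G-Mean are both zero.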

Bibliographic Record

Source
IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology | 2020 | pp. 81-85 | 5 pages
Authors
Bern Jonathan; Panca Hadi Putra; Yova Ruldeviyani
Original format: PDF
Keywords
Imbalanced Data; Natural Language Processing; SMOTE-Tomek; G-Mean; Precision Recall

Similar Literature

1. Classification of Imbalance Data using Tomek Link (T-Link) Combined with Random Under-sampling (RUS) as a Data Reduction Method [J]. Elhassan AT, Aljourf M, Al-Mohanna F. Global Journal of Technology and Optimization, 2016, No. 1.

2. Predicting Extreme Financial Risks on Imbalanced Dataset: A Combined Kernel FCM and Kernel SMOTE Based SVM Classifier [J]. Huang Xun, Zhang Cheng-Zhao, Yuan Jia. Computational Economics, 2020, No. 1.

3. HCAB-SMOTE: A Hybrid Clustered Affinitive Borderline SMOTE Approach for Imbalanced Data Binary Classification [J]. Hisham Al Majzoub, Islam Elgedawy, Öykü Akaydin. Arabian Journal for Science and Engineering, Section A: Sciences, 2020, No. 4.

4. Effective prediction of three common diseases by combining SMOTE with Tomek links technique for imbalanced medical data [C]. Min Zeng, Beiji Zou, Faran Wei. Proceedings of 2016 IEEE International Conference of Online Analysis and Computing Science, 2016.

5. The influence of psychological predictors and cognitive behavioral stress management intervention on antiretroviral therapy (ART) adherence among HIV-positive female Haitian alcohol users [D]. Jean, Pascale Cecile, 2015.

6. DBCSMOTE: a clustering-based oversampling technique for data-imbalanced warfarin dose prediction [O]. Yanyun Tao, Yuzhen Zhang, Bin Jiang, 2020.

7. Disease Diagnostic System: Abnormalities in Human Nail [O]. Pranav S. Wazarkar, 2020.
