Conference: International Conference on Information and Communication Technology

Visiting Time Prediction Using Machine Learning Regression Algorithm


Abstract

Smart tourists are inseparable from mobile technology. With a mobile device, tourists can find information about a destination as well as supporting information such as transportation, hotels, weather, and exchange rates. To plan a journey they need predictions of both travel time and visiting time. Travel time is already predicted accurately by Google Maps using its location feature, but visiting time is a different problem. At present, Google derives visiting time from crowdsourced position data gathered from customer visits to a specific location over the last several weeks. This method undeniably gives tourists valid information; however, because it needs a large amount of data, many destinations have no visiting-time information. In our case study of 626 destinations in East Java, Indonesia, only 224 destinations (35.78%) have a visiting time. To fill this gap and help tourists, this research developed a prediction model for visiting time.

The data were first tested statistically to make sure the model was developed with an appropriate method. Multiple linear regression was the natural candidate, because six factors influence visiting time: access, government, rating, number of reviews, number of pictures, and other information. These factors are the independent variables used to predict the dependent variable, visiting time. In the normality test, which is a requirement for linear regression, the significance value fell below the required level, meaning the data did not pass the test, even after we transformed the data according to their skewness. Because three of the variables are ordinal and the others are interval, we tried both excluding the ordinal variables and including them after transforming them to interval data. We also tried Ordinal Logistic Regression, converting the interval-scaled dependent variable into ordinal data with Expectation Maximization, a clustering algorithm from machine learning, but the model still did not fit even though we tried five functions. We then applied five widely used machine learning algorithms: Linear Regression, k-Nearest Neighbors, Decision Tree, Support Vector Machines, and Multi-Layer Perceptron. Judged by maximum correlation coefficient and minimum root mean square error, Linear Regression with six independent variables gave the best result, with a correlation coefficient of 20.41% and a root mean square error of 48.46%. We also compared this with models using three independent variables; the best algorithm was the same, but its performance was lower. Finally, the model was used to predict the visiting time for the other 402 destinations.
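
The selection criterion described in the abstract (highest correlation coefficient, lowest root mean square error) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the tie-breaking rule (correlation first, then RMSE), and the toy visiting times in minutes are all assumptions made purely for illustration.

```python
import math

def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        return 0.0  # correlation is undefined for a constant series; treated as 0 here
    return cov / math.sqrt(var_a * var_p)

def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def pick_best(actual, predictions_by_model):
    """Score each model and return (best_name, scores).

    Assumed ranking rule: highest correlation wins, lower RMSE breaks ties.
    """
    scores = {
        name: (pearson_r(actual, pred), rmse(actual, pred))
        for name, pred in predictions_by_model.items()
    }
    best = max(scores, key=lambda name: (scores[name][0], -scores[name][1]))
    return best, scores

# Hypothetical visiting times in minutes (made-up values, not from the paper).
actual = [60, 90, 120, 45, 150]
predictions = {
    "model_a": [70, 85, 110, 60, 140],     # tracks the observations
    "model_b": [100, 100, 100, 100, 100],  # constant baseline
}

best, scores = pick_best(actual, predictions)
for name, (r, err) in scores.items():
    print(f"{name}: r={r:.3f}, RMSE={err:.2f}")
print("best:", best)
```

In practice the predictions for each candidate would come from models trained on the 224 destinations that already have a visiting time, evaluated on held-out data; how the paper splits or cross-validates the data is not stated in the abstract.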
