Conference: International Conference on Data and Software Engineering

Information extractor for small medium enterprise aggregator



Abstract

Indonesia has a very large number of SMEs, but their revenue is very low. One way to increase revenue is to use the internet. Some SMEs have already developed websites, but the sites do not share a common navigation scheme, which confuses potential buyers. A website aggregator is therefore essential. The aggregator is built so that SME owners do not need to register their websites: it automatically shows content from websites that already exist. This requires two stages. The first is to find relevant SME websites, and the second is to extract information from them automatically. This paper focuses on the information extractor, which extracts information from SME e-commerce websites with or without a shopping cart feature and is used to build an automatic SME aggregator and a prototype database. Learning algorithms are needed to recognize the information to be extracted. The research addresses how to preprocess website pages and which algorithm is best for automatic information extraction. The system compares three algorithms: Naïve Bayes, Decision Tree, and Support Vector Machine. The algorithm with the best accuracy is used for the system's model. Support Vector Machine is the best algorithm. SMOTE, a method that addresses the imbalanced data set problem by oversampling the minority class, is the best filter for the system's training model. The system extracts information with the best performance from SME e-commerce websites that have a shopping cart feature.
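To make the oversampling step concrete, the following is a minimal sketch of the SMOTE idea using only the Python standard library. It is not the paper's implementation: the function names `smote` and `balance`, the two-dimensional feature vectors, and the labels `"price"` and `"other"` are all hypothetical, and the abstract does not say which features or toolkit the authors used. The sketch also assumes a two-class problem.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and one of their k nearest minority-class neighbours."""
    if n_synthetic <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(features, labels, minority_label, k=5, seed=0):
    """Return new lists in which the minority class matches the size of the
    rest of the data (assumption: only two classes are present)."""
    minority = [x for x, y in zip(features, labels) if y == minority_label]
    majority_count = len(labels) - len(minority)
    extra = smote(minority, majority_count - len(minority), k, seed)
    return features + extra, labels + [minority_label] * len(extra)


# Hypothetical toy data: six "other" page elements and two "price" elements.
features = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0],
            [0.2, 0.3], [1.0, 1.0], [1.2, 0.9]]
labels = ["other"] * 6 + ["price"] * 2
balanced_features, balanced_labels = balance(features, labels, "price", k=1)
```

In this toy run the two `"price"` samples are supplemented with four synthetic ones, so both classes end up with six samples. Each synthetic point lies on the line segment between two real minority samples, which is what distinguishes SMOTE from simply duplicating minority rows. The balanced set would then be passed to whichever classifier is being trained.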
