Conference: 2015 2nd International Conference on Computing Technology and Information Management

A comparative study of web pages classification methods applied to health consumer web pages



Abstract

The Internet is growing at an exponential rate and covers almost any information a user might need. However, the sheer number of web pages makes it difficult for users to find the information they are looking for. An efficient method for classifying this large volume of data is therefore essential if web pages are to be exploited to their full potential. Many approaches to automatic web page classification have been tried, using machine learning algorithms including Support Vector Machines (SVM), Naïve Bayes, Decision Trees, K-Nearest Neighbor (K-NN) and Neural Networks. However, few comparisons of these algorithms exist that aim to find a better framework for classifying and analyzing health-related web pages. In this study, we compare two commonly used supervised machine learning algorithms, SVM and Naïve Bayes, for classifying web pages that provide drug-related information for patients, for example side effects, patient actions and follow-up information. We use the Unified Medical Language System (UMLS) to annotate health-related concepts in web pages, and we train SVM and Naïve Bayes classifiers in General Architecture for Text Engineering (GATE) to distinguish health-related from non-health-related web pages. The evaluation used K-fold cross-validation with four runs on a data set of fifty web pages. SVM performed better than Naïve Bayes at classifying health-related and non-health-related pages in terms of precision, recall and F-measure.
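The evaluation protocol described in the abstract can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' GATE pipeline: it omits UMLS annotation and the SVM, uses a small multinomial Naïve Bayes as a stand-in classifier, and reads "four runs" as four folds. The function names (`k_fold_indices`, `precision_recall_f1`, `train_naive_bayes`, `cross_validate`) and the label strings `"health"` and `"other"` are hypothetical.

```python
import math
from collections import Counter

def k_fold_indices(n, k):
    """Split range(n) into k interleaved folds whose sizes differ by at most one."""
    return [list(range(i, n, k)) for i in range(k)]

def precision_recall_f1(y_true, y_pred, positive="health"):
    """Precision, recall and F-measure for the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def train_naive_bayes(docs, labels):
    """Multinomial Naive Bayes with add-one smoothing; docs are token lists."""
    class_counts = Counter(labels)
    token_counts = {c: Counter() for c in class_counts}
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
    vocab = {t for counts in token_counts.values() for t in counts}

    def predict(tokens):
        best, best_score = None, -math.inf
        for c in class_counts:
            total = sum(token_counts[c].values())
            score = math.log(class_counts[c] / len(labels))  # log prior
            for t in tokens:
                if t in vocab:  # tokens unseen in training are ignored
                    score += math.log((token_counts[c][t] + 1) / (total + len(vocab)))
            if score > best_score:
                best, best_score = c, score
        return best

    return predict

def cross_validate(docs, labels, k, train_fn):
    """Train on k-1 folds, predict the held-out fold, and pool all predictions."""
    y_true, y_pred = [], []
    for fold in k_fold_indices(len(docs), k):
        held_out = set(fold)
        train_docs = [d for i, d in enumerate(docs) if i not in held_out]
        train_labels = [l for i, l in enumerate(labels) if i not in held_out]
        predict = train_fn(train_docs, train_labels)
        for i in fold:
            y_true.append(labels[i])
            y_pred.append(predict(docs[i]))
    return precision_recall_f1(y_true, y_pred)

# Toy data (made up for illustration, not from the study).
toy_docs = [["drug", "dose"], ["football", "score"]] * 4
toy_labels = ["health", "other"] * 4
toy_scores = cross_validate(toy_docs, toy_labels, 4, train_naive_bayes)
```

With fifty pages and four folds, `k_fold_indices(50, 4)` yields folds of 13, 13, 12 and 12 pages. Any other classifier, such as an SVM, could be compared under the same protocol by passing a different `train_fn` that returns a `predict` function.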
