A comparative study of web pages classification methods applied to health consumer web pages


Abstract

The Internet is growing at an exponential rate and now covers almost any information a user might need. However, the sheer number of web pages makes it harder for users to find the information they are looking for. An efficient method for classifying this large volume of data is therefore essential if web pages are to be exploited to their full potential. In automatic web page classification, many approaches have been tried using different machine learning algorithms, including Support Vector Machines (SVM), Naïve Bayes, Decision Trees, K-Nearest Neighbor (K-NN) and Neural Networks. However, these algorithms have rarely been compared with the aim of finding a better framework for classifying and analyzing health-related web pages. In this study, we compare two commonly used supervised machine learning algorithms, SVM and Naïve Bayes, for classifying web pages that provide drug-related information for patients, for example side effects, patient actions and follow-up information. We use the Unified Medical Language System (UMLS) to annotate health-related concepts in web pages, and train SVM and Naïve Bayes classifiers in the General Architecture for Text Engineering (GATE) to distinguish health-related from non-health-related web pages. The evaluation used K-fold cross-validation with four runs on a data set of fifty web pages. SVM performed better than Naïve Bayes at classifying health-related and non-health-related pages in terms of precision, recall and F-measure.
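The evaluation protocol named in the abstract (K-fold cross-validation scored by precision, recall and F-measure) can be illustrated with a minimal Python sketch. This is not the authors' GATE pipeline. It assumes that "four runs" means four folds, that folds are contiguous and unshuffled, and that the positive class is labelled "health". The function names `k_fold_indices` and `precision_recall_f1` are hypothetical.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_items: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_items) into k contiguous folds.

    Returns one (train_indices, test_indices) pair per fold.
    Assumption: no shuffling, which keeps the sketch deterministic.
    """
    base, extra = divmod(n_items, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more item
        folds.append(list(range(start, start + size)))
        start += size

    splits = []
    for i, test in enumerate(folds):
        # Training set is every index that is not in the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def precision_recall_f1(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "health"
) -> Tuple[float, float, float]:
    """Precision, recall and F-measure (F1) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Fifty pages and four folds, matching the data set size in the abstract.
    for train, test in k_fold_indices(50, 4):
        print(len(train), "training pages,", len(test), "test pages")
```

In a full comparison, each classifier would be trained on the training indices of every fold, its predictions on the held-out fold would be passed to `precision_recall_f1`, and the per-fold scores would be averaged per classifier.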


Bibliographic record

Source
2015 2nd International Conference on Computing Technology and Information Management | 2015 | pp. 43-48 | 6 pages
Conference location: Johor (MY)
Authors
Siddiqui Aneeta; Adnan Mehnaz; Siddiqui Rizwan Alam; Mubeen Tauseef
Author affiliation
Sir Syed Univ. of Eng. Technol., Karachi, Pakistan
Original format: PDF
Text language: English
Keywords
Machine Learning; Unified Medical Language System (UMLS); Web Mining; Web Page Classification


