Venue: IEEE International Conference on Future Internet of Things and Cloud Workshops

Social Networks Benchmark Dataset for Diseases Classification

Abstract

Social network analysis has become an important field of research that studies users' data and their contributions on social network media. The goal of this study is to build relations between people in the disease field and to analyze certain knowledge or activities. To accomplish these goals, investigators have become very interested in social network analysis as a way to infer behavior or make predictions from the various data in social networks. Sociologists expect that relationships between people and their real-life lifestyles are mirrored in social networks. However, manually classifying unstructured data from social networks is almost impossible, so an automatic classification method is required to structure this data and make it more convenient and accessible. In this paper we study disease data from Facebook pages. The data fall into categories of popular diseases such as Ebola, malaria, and HIV/AIDS. We address classification as a supervised learning task and create a new dataset named Benchmark Dataset for Diseases Classification (BDDC). BDDC is a well-documented dataset whose file formats are compatible with recognized text mining tools, so that other researchers can use it in comparative experiments. Three commonly used classifiers are applied to two versions of BDDC. The performance results show that BDDC with stemming performs better than the version without stemming, because of the stop-word filtering and the Porter stemmer.
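
The pipeline the abstract outlines (stop-word filtering, optional stemming, then supervised classification) can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the abstract does not name its three classifiers, so a small multinomial Naive Bayes stands in; `crude_stem` is a simple suffix stripper rather than the Porter algorithm; and the stop-word list and the toy posts are invented for the example and are not taken from BDDC.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real experiment would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "in", "and", "to", "by", "for"}


def crude_stem(token):
    """Strip a few common suffixes. A stand-in, NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, stem=True):
    """Lowercase, tokenize, drop stop words, and optionally stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [crude_stem(t) for t in tokens] if stem else tokens


def train_naive_bayes(samples, stem=True):
    """Count tokens per label from a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in samples:
        tokens = preprocess(text, stem)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text, stem=True):
    """Return the label with the highest Laplace-smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for token in preprocess(text, stem):
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy posts, only to make the sketch runnable.
TOY_POSTS = [
    ("Mosquito nets reduce malaria infections", "Malaria"),
    ("Malaria is spread by infected mosquitoes", "Malaria"),
    ("Ebola outbreak reported in the region", "Ebola"),
    ("Health workers treating Ebola patients", "Ebola"),
    ("HIV testing and antiretroviral treatment", "HIV/AIDS"),
    ("Living with HIV and AIDS", "HIV/AIDS"),
]

model = train_naive_bayes(TOY_POSTS, stem=True)
print(preprocess("Infected mosquitoes are spreading malaria", stem=True))
print(predict(model, "mosquito bite malaria", stem=True))
```

Passing `stem=False` to both `train_naive_bayes` and `predict` gives the unstemmed variant, which corresponds loosely to the two dataset versions the abstract compares. With stemming, surface forms such as "mosquitoes" and "mosquito" collapse into one feature, which is one plausible reason a stemmed version could score higher.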