Computers in Biology and Medicine

Studies in the use of data mining, prediction algorithms, and a universal exchange and inference language in the analysis of socioeconomic health data


Abstract

While clinical and biomedical information in digital form has been escalating, it is socioeconomic factors that are important determinants of health on the national and global scale. We show how collective use of data mining and prediction algorithms to analyze socioeconomic population health data can stand beside classical correlation analysis in routine data analysis. The underlying theoretical basis is the Dirac notation and algebra, a scientific standard that is unusual outside the physical sciences, combined with a theory of expected information first developed for analyzing sparse data but still largely confined to bioinformatics. The latter was important here because the records analyzed (which are for US counties and equivalents, not patients) are very few by contemporary data mining standards. The approach is very unlikely to be familiar to socioeconomic researchers, so the theory and the advantages of our inference nets over the Bayes Net are reviewed here, mostly using socioeconomic examples. While our expertise and focus lie in novel analytical methods rather than socioeconomics per se, a significant negative (countertrending) relationship between population health and equity was initially surprising, at least to the present authors. This encouraged deeper exploration, including of the relationship between our data mining methods and traditional Pearson's correlation. The latter is susceptible to giving wrong conclusions if a phenomenon called Simpson's paradox applies, so this is also investigated. Also discussed is that, even for very few records, associative data mining can still demand significant computational resources due to a combinatorial explosion.
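To make the point about Simpson's paradox concrete, the following minimal Python sketch shows how Pearson's correlation can change sign when strata are pooled. The numbers are synthetic and the names (`pearson`, `strata`, `stratum_a`, `stratum_b`) are hypothetical, chosen only for illustration; nothing here is taken from the paper's data or from its inference-net method.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic (x, y) values for two hypothetical strata; NOT data from the paper.
strata = {
    "stratum_a": ([1, 2, 3, 4], [4, 3, 2, 1]),
    "stratum_b": ([6, 7, 8, 9], [9, 8, 7, 6]),
}

# Correlation computed separately inside each stratum.
within = {name: pearson(xs, ys) for name, (xs, ys) in strata.items()}

# Correlation computed after pooling all records and ignoring the stratum label.
pooled_x = [x for xs, _ in strata.values() for x in xs]
pooled_y = [y for _, ys in strata.values() for y in ys]
pooled = pearson(pooled_x, pooled_y)

print("within strata:", within)
print("pooled:", pooled)
```

In this toy case the trend inside each stratum is negative while the pooled coefficient is positive, so a conclusion drawn from the pooled value alone would point in the wrong direction.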
