Journal: Methods of Information in Medicine

A complementary graphical method for reducing and analyzing large data sets: Case studies demonstrating thresholds setting and selection


Abstract

Objectives: Graphical displays can make data more understandable; however, large graphs can challenge human comprehension. We have previously described a filtering method that provides high-level summary views of large data sets. In this paper we demonstrate our method for setting and selecting thresholds that limit graph size while retaining important information, applying it to large single and paired data sets taken from patient and bibliographic databases.

Methods: Four case studies illustrate our method. The data are either patient discharge diagnoses (coded using the International Classification of Diseases, Ninth Revision, Clinical Modification [ICD9-CM]) or Medline citations (coded using the Medical Subject Headings [MeSH]). We use combinations of different thresholds to obtain filtered graphs for detailed analysis. The case studies demonstrate in detail how thresholds are set and selected, including thresholds for node counts, class counts, ratio values, p values (for diff data sets), and percentiles of selected class count thresholds. The main steps are data preparation, data manipulation, computation, and threshold selection and visualization. We also describe the data models for the different types of thresholds and the considerations for threshold selection.

Results: The filtered graphs are 1%-3% of the size of the original graphs. For our case studies, the graphs provide 1) the most heavily used ICD9-CM codes, 2) the codes with the most patients in a research hospital in 2011, 3) a profile of publications on "heavily represented topics" in MEDLINE in 2011, and 4) validated knowledge about adverse effects of the medication rosiglitazone, and new areas of interest in the ICD9-CM hierarchy associated with patients taking the medication pioglitazone.

Conclusions: Our filtering method reduces large graphs to a manageable size by removing relatively unimportant nodes. The graphical method provides summary views based on computation of usage frequency and the semantic context of a hierarchical terminology. The method is applicable to large data sets (such as a hundred thousand records or more) and can be used to generate new hypotheses from data sets coded with hierarchical terminologies.
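The abstract does not define its thresholds precisely, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a "node count" is the number of records coded directly with a term, that a "class count" is that number plus the counts of all descendant terms, and that the hierarchy is a tree with one parent per term (MeSH, for instance, allows multiple parents). The function names, the toy hierarchy, and the counts are all hypothetical.

```python
from collections import defaultdict


def class_counts(parents, node_counts):
    """Return each term's own count plus the counts of all its descendants.

    parents: dict mapping term -> parent term (None for the root).
    node_counts: dict mapping term -> number of records coded directly with it.
    Assumption: the hierarchy is a tree (single parent per term).
    """
    children = defaultdict(list)
    for child, parent in parents.items():
        if parent is not None:
            children[parent].append(child)

    totals = {}

    def total(term):
        # Memoized post-order sum over the subtree rooted at `term`.
        if term not in totals:
            totals[term] = node_counts.get(term, 0) + sum(
                total(c) for c in children[term]
            )
        return totals[term]

    for term in parents:
        total(term)
    return totals


def filter_by_class_count(parents, node_counts, min_class_count):
    """Keep only terms whose class count reaches the threshold (assumed rule)."""
    totals = class_counts(parents, node_counts)
    return {term for term in parents if totals[term] >= min_class_count}


# Hypothetical toy hierarchy and usage counts, for illustration only.
parents = {"root": None, "A": "root", "A1": "A", "A2": "A", "B": "root", "B1": "B"}
node_counts = {"A": 5, "A1": 40, "A2": 1, "B": 0, "B1": 2}

kept = filter_by_class_count(parents, node_counts, min_class_count=10)
print(sorted(kept))  # ['A', 'A1', 'root']
```

Raising `min_class_count` prunes sparsely used branches (here `B`, `B1`, and `A2`) while keeping their heavily used relatives, which is the kind of size reduction the abstract describes. The other threshold types it mentions, such as ratio values and p values, would need additional definitions that the abstract does not give.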
