Journal: Methods of Information in Medicine

A complementary graphical method for reducing and analyzing large data sets: Case studies demonstrating thresholds setting and selection



Abstract

Objectives: Graphical displays can make data more understandable; however, large graphs can exceed what a human reader can comprehend. We have previously described a filtering method that provides high-level summary views of large data sets. In this paper we demonstrate how to set and select thresholds that limit graph size while retaining important information, applying the method to large single and paired data sets taken from patient and bibliographic databases.

Methods: Four case studies illustrate the method. The data are either patient discharge diagnoses (coded using the International Classification of Diseases, Clinical Modification [ICD9-CM]) or MEDLINE citations (coded using the Medical Subject Headings [MeSH]). We use combinations of different thresholds to obtain filtered graphs for detailed analysis. The case studies demonstrate in detail how to set and select thresholds for node counts, class counts, ratio values, p values (for diff data sets), and percentiles of selected class count thresholds. The main steps are data preparation, data manipulation, computation, and threshold selection and visualization. We also describe the data models for the different types of thresholds and the considerations for threshold selection.

Results: The filtered graphs are 1%-3% of the size of the original graphs. In our case studies, the graphs show 1) the most heavily used ICD9-CM codes, 2) the codes with the most patients in a research hospital in 2011, 3) a profile of publications on "heavily represented topics" in MEDLINE in 2011, and 4) validated knowledge about adverse effects of rosiglitazone, together with new areas of interest in the ICD9-CM hierarchy associated with patients taking pioglitazone.

Conclusions: Our filtering method reduces large graphs to a manageable size by removing relatively unimportant nodes. The graphical method provides summary views based on usage frequency and the semantic context of a hierarchical terminology. The method is applicable to large data sets (such as a hundred thousand records or more) and can be used to generate new hypotheses from data sets coded with hierarchical terminologies.
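
To make the idea of filtering a hierarchy by count thresholds concrete, the following is a minimal sketch and not the authors' implementation. It rests on assumptions the abstract does not spell out: that a node count is the number of records coded directly with a term, that a class count is the node count of a term plus the node counts of all its descendants, that the hierarchy is a tree with one parent per term, and that a node is kept when it meets either threshold. The function names, term names, counts, and threshold values are all hypothetical toy choices.

```python
from collections import defaultdict


def class_counts(parent, node_count):
    """Return {term: class count}.

    Assumption: class count = the term's own node count plus the node
    counts of all of its descendants in a single-parent hierarchy.
    """
    children = defaultdict(list)
    for child, par in parent.items():
        if par is not None:
            children[par].append(child)

    totals = {}

    def total(term):
        # Memoised recursive sum over the subtree rooted at `term`.
        if term not in totals:
            totals[term] = node_count.get(term, 0) + sum(
                total(c) for c in children[term]
            )
        return totals[term]

    for term in parent:
        total(term)
    return totals


def filter_nodes(parent, node_count, min_node_count, min_class_count):
    """Keep terms that meet either threshold (the OR rule is an assumption)."""
    cc = class_counts(parent, node_count)
    return {
        term
        for term in parent
        if node_count.get(term, 0) >= min_node_count
        or cc[term] >= min_class_count
    }


# Hypothetical toy hierarchy: child term -> parent term (None marks the root).
parent = {"root": None, "A": "root", "B": "root",
          "A1": "A", "A2": "A", "B1": "B"}
# Made-up usage counts per term; terms that are absent count as 0.
node_count = {"A": 1, "A1": 40, "A2": 5, "B1": 2}

kept = filter_nodes(parent, node_count, min_node_count=10, min_class_count=45)
print(sorted(kept))  # ['A', 'A1', 'root']
```

In this toy example the rarely used branch under `B` is dropped, while `A` survives through its class count even though its own node count is low. A terminology in which a term can have several parents, as in MeSH, would need a graph traversal that avoids counting shared descendants twice.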
