Conference: International Conference on Advances in Databases, Knowledge, and Data Applications

A Multidimensional Data Modeling of the SEER Database from the USA National Cancer Institute


Abstract

Nowadays, one of the main challenges in computer science is processing the large amounts of data available in diverse sources, such as databases or files, in order to find useful information. This requires specialized tools that process raw data intelligently to discover knowledge. In this paper, we present the design of a data warehouse and a tool called TDR (Tool Drill-Roll) that support knowledge discovery from the SEER (Surveillance, Epidemiology, and End Results) database of the National Cancer Institute in the United States of America, which contains more than five million records. The data warehouse is designed using a multidimensional approach, and the TDR tool lets users explore SEER using drill-down and roll-up, two operators of Online Analytical Processing (OLAP). The data warehouse can be viewed at many levels of granularity. The TDR tool reports statistics on the incidence, mortality, and survival of cancer patients over the years and extracts information about the disease that could be used to relate certain characteristics of patients who have a specific type of cancer. The knowledge discovered by the TDR tool could be of interest to governments, health care institutes, or the research community for decision making. The main contribution of this paper is the discovery of new knowledge from the SEER database. The methodology used to design the data warehouse and the TDR tool could be applied to other domains with minimal changes.
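The two OLAP operators named in the abstract can be illustrated with a minimal sketch. This is not the TDR tool or the authors' warehouse schema. The dimension names (`year`, `site`, `sex`), the helper functions, and the records are all hypothetical, and the counts are invented toy values rather than SEER data. Roll-up is shown here in its simplest form, dropping a dimension to get a coarser view. Drill-down adds a dimension to get a finer one.

```python
from collections import defaultdict

# Invented toy records for illustration only; these are NOT SEER data.
# Each tuple is (year, site, sex, case_count).
RECORDS = [
    (2000, "lung", "F", 3),
    (2000, "lung", "M", 5),
    (2000, "breast", "F", 7),
    (2001, "lung", "F", 4),
    (2001, "breast", "F", 6),
]

# Hypothetical dimension names, in the same order as the tuple fields.
DIMENSIONS = ("year", "site", "sex")


def aggregate(records, dims):
    """Sum case counts grouped by the requested dimensions."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[p] for p in positions)
        totals[key] += record[-1]  # last field is the measure
    return dict(totals)


def roll_up(dims, dim):
    """Return a coarser view by removing one dimension."""
    return tuple(d for d in dims if d != dim)


def drill_down(dims, dim):
    """Return a finer view by adding one dimension."""
    return dims if dim in dims else dims + (dim,)


view = ("year", "site")
by_year_site = aggregate(RECORDS, view)
by_year = aggregate(RECORDS, roll_up(view, "site"))
by_year_site_sex = aggregate(RECORDS, drill_down(view, "sex"))
```

With these toy records, `by_year` holds one total per year, while `by_year_site_sex` splits each year and site further by sex. A real warehouse would typically also roll up along a hierarchy within a dimension (for example, from a detailed site code to a broader site group), which this sketch does not model.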
