Mathematical Modelling and Artificial Intelligence in Luxembourg: Twenty PhD Students to be Trained in Data-Driven Modelling

Stéphane P.A. Bordas; Sundararajan Natarajan

Abstract

Why Computational and Data Sciences: opportunities and challenges

By enabling virtual experiments and computer simulations, scientific computing has become the third pillar of scientific investigation and is central to innovation in most domains of our lives (Rüde et al., 2018). It underpins the majority of today's technological, economic and societal feats. An upcoming challenge is to harvest the fruits borne by computational sciences in research fields which have not yet benefited from their full potential, e.g. biology, health, the social and behavioural sciences as well as art. Our strategy to achieve this is to leverage the mathematical, procedural and algorithmic commonality between apparently disparate research fields. Achieving this aim will require ever-increasing amounts of data to be harnessed (Ley and Bordas, 2017).

Data-Driven Discovery

Understanding the world, which generates data at an increasing rate, relies on the ability to construct models. To be predictive, or even descriptive, these models must be able to adapt to new information (scientific model selection, aggregation and adaptation). We are therefore experiencing a change in paradigm from traditional, hypothesis-driven mathematical models to adaptive data-driven models, which are inherent to the concept of digital twins. Data-driven discovery will become the fourth pillar of science, and the integration of hypothesis-driven and data-driven science concepts is the future of scientific discovery and knowledge generation, improving decision-making at all levels. Discovery through data requires integrated data mining, data exploration (interrogation and association), predictive modelling, sensitivity and uncertainty quantification, and incorporation of feedback from new or higher-quality data.

Methods

Machine learning and statistical analysis are fundamental to addressing the above issues: they enable mapping input to output (supervised learning) and discovering the structure of input data (unsupervised learning). Concepts like support vector machines and random forests can be used for pattern recognition. Clustering methods such as k-means, mean-shift or spectral clustering automatically group similar objects into sets, which helps with patient cohort segmentation or with grouping experimental results. Continuous-valued attributes associated with an object or person can be predicted with regression algorithms like lasso, ridge regression or Gaussian processes. Model selection methods improve predictive models by enabling the comparison, validation and selection of models and parameters using grid search or cross-validation. Dimensionality reduction methods such as principal component analysis, manifold learning or feature selection help to unveil characteristic information hidden in large and/or high-dimensional data sets. Deep learning learns hierarchical representations of data that capture several levels of abstraction.

Scientific Outcomes and Orientation of the Project

DRIVEN's scientific outcomes are expected to lead to strong novel results in each application domain. Yet, the most exciting scientific achievement brought forward by DRIVEN will be the design of novel multi-disciplinary methodologies. Moreover, the trained research students will become the first interdisciplinary translators able to fuel the third industrial revolution.

Challenges Ahead and Future Activities

The variety and power of machine learning and artificial intelligence techniques are steadily growing. While their ability to describe reality and discover unforeseen patterns in data is clear, our ability to critically evaluate the uncertainty and limitations of these approaches lags behind. More often than not, these techniques are used as black boxes, delivering answers that cannot be easily understood. DRIVEN will investigate these issues by focusing on a few well-chosen fundamental problems in data classification, regression, model reduction and selection.

Ethics

DRIVEN's research directions take into account not only the shared technical or methodological similarities of research activities but also the ethical and philosophical issues implied by the large-scale utilisation of data science techniques. This adds the important ethics co-dimension to the treatment of the research questions addressed and helps provoke intellectual discussion between researchers across fields by questioning and reflecting on the role of artificial intelligence in law, human labour and social equity.

Partners and ERCIM Collaborators

DRIVEN is led by Computational Engineering and Sciences (Zilian) at the University of Luxembourg in collaboration with Interdisciplinary Centres and the Luxembourgish research centres (LIST and LISER). The team will receive supplementary training from leading partners including U. Ghent, Inria (France) and ICES (University of Texas at Austin), through a Horizon 2020 TWINNING project (DRIVEN TWINNING). DRIVEN reinforces complementar
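
As an illustration of the regression and model-selection methods named in the Methods passage, the following minimal sketch fits a one-feature ridge regression and chooses its regularisation strength by grid search with k-fold cross-validation. It is not taken from the DRIVEN project: the function names (`fit_ridge_1d`, `cross_val_mse`, `grid_search`), the synthetic data and the candidate grid are all assumptions made for the example, and only the Python standard library is used.

```python
import random


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit of y ~ w * x + b on one feature.

    The slope w is penalised by alpha; the intercept b is not.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / (sxx + alpha)
    b = y_mean - w * x_mean
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the line w * x + b on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_val_mse(xs, ys, alpha, k=5):
    """Average validation error over k folds (fold i holds out every k-th point)."""
    errors = []
    for fold in range(k):
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k != fold]
        valid = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k == fold]
        train_x, train_y = zip(*train)
        valid_x, valid_y = zip(*valid)
        w, b = fit_ridge_1d(train_x, train_y, alpha)
        errors.append(mse(valid_x, valid_y, w, b))
    return sum(errors) / k


def grid_search(xs, ys, alphas, k=5):
    """Score every candidate alpha by cross-validation and return the best one."""
    scores = {alpha: cross_val_mse(xs, ys, alpha, k) for alpha in alphas}
    best_alpha = min(scores, key=scores.get)
    return best_alpha, scores


# Hypothetical synthetic data: a noisy line with slope 2.0 and intercept 0.5.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(60)]
ys = [2.0 * x + 0.5 + rng.gauss(0.0, 0.1) for x in xs]

alphas = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]  # assumed candidate grid
best_alpha, scores = grid_search(xs, ys, alphas)
w, b = fit_ridge_1d(xs, ys, best_alpha)
print("best alpha:", best_alpha, "slope:", round(w, 2), "intercept:", round(b, 2))
```

The same pattern (score each candidate on held-out folds, keep the best, refit on all data) carries over to other models and parameters; in practice a library implementation would usually replace these hand-written helpers.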

