NG41A-02 ClimSim-Online: A Large Multi-scale Dataset and Framework for Hybrid ML-physics Climate Emulation
Subramaniam, A., . . . , B. Lütjens, et al.
(2024)
American Geophysical Union Fall Meeting, NG41A-02
Abstract / Summary:
Abstract
Modern climate projections lack adequate spatial and temporal resolution due to computational constraints, leading to inaccuracies in representing critical processes, such as thunderstorms, that occur below the model's grid resolution. Hybrid methods combining physics with machine learning (ML) offer faster, higher-fidelity climate simulations by outsourcing compute-hungry, high-resolution simulations to ML emulators. However, these hybrid ML-physics simulations require domain-specific data and workflows that have been inaccessible to many ML experts.
As an extension of the ClimSim dataset (Yu et al., NeurIPS 2024), we present ClimSim-Online, which also includes an end-to-end workflow for developing hybrid ML-physics simulators. The ClimSim dataset includes 5.7 billion pairs of multivariate input/output vectors, capturing the influence of high-resolution, high-fidelity physics on a host climate simulator's macro-scale state. The dataset is global and spans ten years at a high sampling frequency, which makes it well suited to training sub-grid parameterization models.
Coupling a trained model with the host simulation code and achieving stable rollouts, however, poses two challenges: 1) running inference with an arbitrarily complex ML model from Fortran code in a performant manner, and 2) evaluating online pathologies caused by distribution shift and mitigating the resulting instabilities. To address these challenges, we provide a cross-platform, containerized pipeline to integrate ML models into an operational climate simulator (E3SM) for hybrid testing. We also implement various ML baselines, alongside a hybrid baseline simulator, to highlight the ML challenges of building stable, skillful emulators. The data and code (https://leap-stc.github.io/ClimSim and https://github.com/leap-stc/climsim-online) are publicly released to support the development of hybrid ML-physics and high-fidelity climate simulations.
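The gap between offline skill and online behavior can be illustrated with a toy sketch. This is not the ClimSim-Online pipeline and does not involve E3SM: the one-dimensional `true_tendency`, the linear emulator, the forward-Euler `rollout`, and all numeric settings below are hypothetical stand-ins chosen only to show how an emulator that scores well on held-out samples from its training distribution can be much less accurate on the states a coupled run actually visits.

```python
import numpy as np

def true_tendency(x):
    """Hypothetical stand-in for expensive high-resolution physics."""
    return -x + 0.5 * np.tanh(x)

def fit_linear_emulator(x, y):
    """Least-squares fit y ~ a*x + b; returns (a, b)."""
    design = np.column_stack([x, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs[0], coeffs[1]

def rollout(tendency, x0, n_steps, dt):
    """Forward-Euler rollout; returns the n_steps + 1 visited states."""
    states = np.empty(n_steps + 1)
    states[0] = x0
    for t in range(n_steps):
        states[t + 1] = states[t] + dt * tendency(states[t])
    return states

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Offline stage: train and test on states drawn from the same narrow range.
rng = np.random.default_rng(0)
x_train = rng.uniform(-0.5, 0.5, size=2000)
a, b = fit_linear_emulator(x_train, true_tendency(x_train))

def emulator(x):
    return a * x + b

x_test = rng.uniform(-0.5, 0.5, size=2000)
offline_rmse = rmse(emulator(x_test), true_tendency(x_test))

# Online stage: the emulator drives the rollout, starting from a state
# outside the training range, and is scored on the states it visits.
hybrid_states = rollout(emulator, x0=2.0, n_steps=50, dt=0.1)
online_rmse = rmse(emulator(hybrid_states), true_tendency(hybrid_states))

print(f"offline RMSE: {offline_rmse:.4f}")
print(f"online RMSE:  {online_rmse:.4f}")
```

In this toy the dynamics are stable, so the rollout drifts back toward the training range; in a real hybrid simulation the emulator's errors feed back into the host model's state and can instead grow, which is why online evaluation is treated as a separate step from offline scoring.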
Plain-language Summary
Traditional climate simulation methods are unable to adequately capture the dynamics of small-scale phenomena, like thunderstorms, due to resolution demands and high compute requirements. One promising approach to this challenge is to use machine learning (ML) to model the impact of these small-scale phenomena on a lower-resolution host simulator. However, integrating an ML model into simulation code is difficult because the two domains typically use different programming paradigms. Here, we propose a solution to this challenge by providing a containerized workflow for integrating ML models into an operational climate simulator in a performant manner. This makes it possible to evaluate the online performance of the hybrid simulation with the embedded ML model, assess online pathologies, and mitigate instabilities that grow out of the coupling. To support development of hybrid ML-physics climate simulations, code and data for this workflow are made available publicly on GitHub.
Citation:
Subramaniam, A., . . . , B. Lütjens, et al. (2024): NG41A-02 ClimSim-Online: A Large Multi-scale Dataset and Framework for Hybrid ML-physics Climate Emulation. American Geophysical Union Fall Meeting, NG41A-02 (https://agu.confex.com/agu/agu24/meetingapp.cgi/Paper/1713552)