Scalability of FESOM2

Figure: FESOM2 scalability on DKRZ and JSC. Scaling results for the STORM mesh on the Mistral (DKRZ Hamburg, a) and JUWELS (JSC Jülich, b) compute clusters. The black line indicates linear scaling; the orange line (and the number labels) gives the mean total computing time over the parallel partitions.

One important factor hampering the throughput of ocean models is their limited parallel scalability: existing models struggle to make full use of the new generation of massively parallel high-performance computing (HPC) systems. Scalability bottlenecks often arise when parallel communication saturates, which happens once mesh partitions shrink below a certain number of surface mesh vertices per compute core. This threshold depends on the model and on the hardware employed.
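Scaling behaviour of this kind is commonly summarized as speedup and parallel efficiency relative to the smallest run, together with the number of surface vertices per core. The Python sketch below shows that bookkeeping. The function names, the mesh size, and all timings are hypothetical illustrations, not FESOM2 measurements.

```python
def scaling_metrics(cores, runtimes):
    """Speedup and parallel efficiency relative to the first (smallest) run."""
    base_cores, base_time = cores[0], runtimes[0]
    speedup = [base_time / t for t in runtimes]
    # Linear scaling means the speedup equals the increase in core count.
    efficiency = [s / (c / base_cores) for s, c in zip(speedup, cores)]
    return speedup, efficiency


def vertices_per_core(n_surface_vertices, cores):
    """Average number of surface mesh vertices owned by each core."""
    return [n_surface_vertices / c for c in cores]


# Hypothetical timings (seconds) for a hypothetical 1,000,000-vertex mesh.
cores = [100, 200, 400, 800]
runtimes = [80.0, 40.0, 22.0, 16.0]

speedup, efficiency = scaling_metrics(cores, runtimes)
load = vertices_per_core(1_000_000, cores)

for c, v, s, e in zip(cores, load, speedup, efficiency):
    print(f"{c:5d} cores  {v:8.0f} vertices/core  "
          f"speedup {s:5.2f}  efficiency {e:4.2f}")
```

In this made-up example the efficiency falls as the partitions shrink, which is the pattern expected once communication starts to dominate.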

The main components of ocean circulation models that limit their scalability have been identified in the literature as the solver for the external (barotropic) mode and the sea-ice model. Both are stiff two-dimensional (2-D) parts of the solution algorithm and require either linear solvers (usually iterative) or explicit pseudo-time-stepping with very small time steps. Neither approach is particularly computationally expensive; however, both introduce numerous exchanges of 2-D halos per time step of the ocean model.
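Why cheap but communication-heavy 2-D parts limit scaling can be illustrated with a toy cost model. The sketch below assumes that the arithmetic scales perfectly with the core count while each halo exchange has a fixed latency cost. Both assumptions, the function name, and all parameter values are hypothetical simplifications and do not describe how FESOM2 actually behaves.

```python
def step_cost(cores, compute_work, n_halo_exchanges, latency):
    """Toy cost of one model time step, in seconds.

    compute_work     : arithmetic time on one core (assumed to scale perfectly)
    n_halo_exchanges : number of 2-D halo exchanges per time step
    latency          : fixed cost of one exchange (assumed independent of cores)
    """
    compute = compute_work / cores
    communication = n_halo_exchanges * latency
    return compute, communication


# Hypothetical parameters chosen only to show the trend.
for cores in (100, 1000, 10000):
    compute, comm = step_cost(cores, compute_work=100.0,
                              n_halo_exchanges=100, latency=1e-4)
    share = comm / (compute + comm)
    print(f"{cores:6d} cores: compute {compute:.4f} s, "
          f"communication {comm:.4f} s, communication share {share:.0%}")
```

Under these assumptions the compute term shrinks with the core count while the communication term stays constant, so the share of time spent in halo exchanges grows as more cores are added.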

In Koldunov et al. (2019), we explore the scalability of FESOM2 and suggest several optimizations that help to improve it. We show that the external mode (SSH solver) and the sea-ice model currently limit the parallel scalability, with further practical complications arising from the implementation of the I/O.

The main lesson of the analysis presented in this study is the extent of the problem. Given that the computational resources available for most current long-term simulations are in many cases still limited to 5000–10 000 or fewer cores, parallel scalability is only beginning to emerge as a major issue on large (1/10° or finer) meshes. Current CPU architectures appear to be well suited for nearly all 3-D computational parts of FESOM2. The potential for improvement therefore seems to lie in improved memory bandwidth, lower communication latency, and more efficient file systems, which can then be used as an indication when choosing “optimal” hardware.

We show that, in terms of throughput, FESOM2 is on a par with state-of-the-art structured ocean models and, in a realistic eddy-resolving configuration (1/10° resolution), can achieve about 16 simulated years per day on 14 000 cores. Suboptimal scaling of the sea ice, combined with a sequential arrangement of the sea-ice and ocean steps, results in an inefficient utilization of computational resources and indicates clear directions for improvement. This, together with better, scalable parallel I/O, is the direction for future model code development to enable high-resolution climate simulations with reasonable throughput.
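The quoted throughput can be converted into a resource cost with simple arithmetic. The sketch below uses the approximate figures from the text (about 16 simulated years per day on 14 000 cores). The function names and the 100-year run length are hypothetical, chosen only for illustration.

```python
def core_hours_per_simulated_year(cores, sypd):
    """Core-hours needed to simulate one model year.

    sypd: simulated years per wall-clock day.
    """
    return cores * 24.0 / sypd


def wallclock_days(simulated_years, sypd):
    """Wall-clock days needed for a run of the given length."""
    return simulated_years / sypd


cost = core_hours_per_simulated_year(cores=14_000, sypd=16)
days = wallclock_days(simulated_years=100, sypd=16)  # hypothetical run length

print(f"{cost:.0f} core-hours per simulated year")  # 21000
print(f"{days:.2f} wall-clock days per century")    # 6.25
```

Because the throughput is only approximate, these numbers are order-of-magnitude estimates rather than exact costs.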

References

Koldunov, N. V., Aizinger, V., Rakowsky, N., Scholz, P., Sidorenko, D., Danilov, S., and Jung, T.: Scalability and some optimization of the Finite-volumE Sea ice–Ocean Model, Version 2.0 (FESOM2), Geosci. Model Dev., 12, 3991–4012, https://doi.org/10.5194/gmd-12-3991-2019, 2019.