class: center, middle, inverse, title-slide

# Developing a Terrestrial Dynamical Core for E3SM

### Jed Brown, jed.brown@colorado.edu
### Multicore-9, 2019-09-26

---

.pull-left-33[
![E3SM Logo](https://esgf-node.llnl.gov/site_media/logos/E3SM_Logo_150px.png)

* Water cycle modeling is critical for E3SM
* Current subsurface flow is 1D
]

.pull-right-66[![TDycore schematic](figures/tdycore/tdycore-schematic.jpg)]

* TDycore SciDAC project: extend to global 3D
* Composable software

---

# TDycore requirements

* Solve 3D variably saturated Richards equation
* Implicit thermal-hydrology (includes permafrost)
* Portable, extensible, fun to experiment with
* Accurate flow on distorted grids
* High-contrast/anisotropic coefficients
  - anisotropic tensor coefficients
* Very high aspect ratio elements
* Time accurate, quasi-steady/spinup
* Data assimilation, parameter estimation
* Verification & Validation

---

# Simplified model: Darcy flow

## Strong form: find `\(\mathbf u,p\)` such that

`\begin{align} \mathbf u &= -\kappa \nabla p \\ \nabla \cdot \mathbf u &= f \end{align}`

subject to pressure `\(p = g_D\)` and/or flux `\(\mathbf u \cdot \mathbf n = g_N\)` boundary conditions.
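
A minimal 1D sketch of the strong form above, in plain Python. This is not TDycore code: the function name `solve_darcy_1d`, the unit interval, the uniform cells, and the two-point flux with harmonic averaging of `\(\kappa\)` are all assumptions chosen for illustration, with pressure (Dirichlet) data at both ends.

```python
def solve_darcy_1d(kappa, f, p_left, p_right):
    """Cell-centered two-point-flux solve of u = -kappa p', u' = f on (0, 1).

    kappa, f: one value per cell; p_left, p_right: Dirichlet pressures.
    Returns cell pressures p and face fluxes u (hypothetical helper).
    """
    n = len(kappa)
    h = 1.0 / n
    # Face transmissibilities: half-cell distance on the two boundary faces,
    # harmonic average of kappa on interior faces.
    t = [2.0 * kappa[0] / h]
    for i in range(n - 1):
        t.append(2.0 * kappa[i] * kappa[i + 1] / (h * (kappa[i] + kappa[i + 1])))
    t.append(2.0 * kappa[-1] / h)
    # Cell balance u_{i+1/2} - u_{i-1/2} = h f_i gives a tridiagonal system.
    lower = [-t[i] for i in range(n)]      # lower[0] is unused
    diag = [t[i] + t[i + 1] for i in range(n)]
    upper = [-t[i + 1] for i in range(n)]  # upper[n-1] is unused
    rhs = [h * f[i] for i in range(n)]
    rhs[0] += t[0] * p_left
    rhs[-1] += t[n] * p_right
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n):
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    p = [0.0] * n
    p[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        p[i] = (rhs[i] - upper[i] * p[i + 1]) / diag[i]
    # Recover face fluxes from u = -kappa p'.
    ext = [p_left] + p + [p_right]
    u = [-t[j] * (ext[j + 1] - ext[j]) for j in range(n + 1)]
    return p, u


# High-contrast example: kappa jumps by 100x at x = 0.5, no source term.
kappa = [1.0] * 4 + [0.01] * 4
p, u = solve_darcy_1d(kappa, [0.0] * 8, p_left=1.0, p_right=0.0)
print(u)
```

With `\(f = 0\)` the flux must be the same on every face. For this layered example the series-resistance value is `\(1/(0.5/1 + 0.5/0.01) = 1/50.5\)`, which the sketch should reproduce up to rounding because the coefficient jump sits on a cell face.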
* Mixed finite elements
  - `\(BDM_1\)`, Wheeler-Yotov, Arbogast-Correa
* Multi-point flux finite volume
  - MPFA-O: Cell-centered with reconstruction

---

# Mixed FEM: `\(BDM_1\)`

![Wheeler Yotov (2016)](figures/tdycore/wheeler-yotov-bdm1.png)

\begin{gather} \begin{bmatrix} M & B^T \\\\ B & 0 \end{bmatrix} \begin{bmatrix} \mathbf u\\\\ p \end{bmatrix} = \begin{bmatrix} 0 \\\\ f \end{bmatrix} \\\\ M \sim \int \mathbf v \cdot \kappa^{-1} \mathbf u, \quad B \sim \int q \nabla\cdot \mathbf u \end{gather}

---

# BDDC for `\(BDM_1\)`

.pull-left[
![](figures/tdycore/zampini-tu-bddc-bdm1-crop.png)

Zampini and Tu (2017): robustness to coarsening rate
]

.pull-right[
* A priori convergence rate
  - *not* grid complexity
* Rapid coarsening
* Solve subdomain problems
  - almost-Neumann
  - Dirichlet

![](figures/MandelSousedikBDDCCoarseBasis.png)
]

---

# [Flow123d](https://flow123d.github.io/): transport in fractured media

.center[<video src="https://flow123d.github.io/gallery/videos/test2d.ogv" type="video/ogg" autoplay="true" loop="true">Flow123d video</video>]

* Jan Brezina and colleagues at TU Liberec
* Based on PETSc solvers

---

# Multigrid for `\(BDM_1\)`

.pull-left[
## Monolithic methods

* Coarsen mixed formulation
* Complicated implementation
* Overlapping smoother updates
* Hard to achieve high performance
]

.pull-right[
## Split methods

$$ P^{-1} = \begin{bmatrix} M & B^T \\\\ & \hat S \end{bmatrix}^{-1} $$

With multigrid for

$$ \hat S \approx \underbrace{S = B M^{-1} B^T}_{\text{dense!}} $$

SPD, Laplacian-like
]

---

# `\(BDM_1\)` spaces

.pull-left[![](figures/tdycore/collier-bdm1.png)]

.pull-right[
### Collocated quadrature: block-diagonal `\(M_{WY}\)`

$$ \begin{bmatrix} M_{WY} & B^T \\\\ B & 0 \end{bmatrix} $$
]

---

# Wheeler and Yotov (2006)

![:scale 95%](figures/tdycore/wheeler-yotov-stencil.png)

* `\(M_{WY}\)` is block diagonal; Schur complement is sparse

$$ S = B M_{WY}^{-1} B^T $$

---

# Wheeler-Yotov stencils

![:scale 100%](figures/tdycore/collier-wy-diagonal.png)

---

# Wheeler-Yotov stencils

![:scale 100%](figures/tdycore/collier-wy-jump.png)

---

# Wheeler-Yotov solvers

![:scale 100%](figures/tdycore/collier-wy-spe10.png)

---

# TDycore assembly

![:scale 100%](figures/tdycore/collier-wy-spe10-assembly.png)

---

# Arbogast & Correa (2016)

![:scale 90%](figures/tdycore/arbogast-correa-table5.png)

* Arnold-Boffi-Falk (2005): more dofs
* Collocated variant unknown; requires saddle-point solvers

---

# Should we trade increased solve cost for accuracy?

Suppose `\(AC_1^{red}\)` matches accuracy of `\(BDM_1\)` with

$$ h = 1.25 h_1 $$

* About half the dofs in 3D (`\(1.25^3 \approx 1.95\times\)` fewer elements)
* Allow time step `\(1.25\times\)` larger
* Tracers and chemistry
* May pay off in the end
  - Currently maintaining both options

---

.pull-left[
![](figures/PETSc/logos/PETSc_RBG.svg)

* IMEX integrators
* Algebraic solvers and preconditioners
* DMPlex for discretization
* Powerful run-time composition
* GPU- and node-aware communication
]

.pull-right[
![](figures/PETSc/LocalSpaces.svg)

* Assemble optimal data structures from single source code
]

---

# Outlook

* `\(BDM_1\)` needs `\(\sim 10\times\)` more memory than WY
* Wheeler-Yotov most friendly for AMG
* Saddle-point with BDDC robust, but more expensive
* `-tdy_method {wy|bdm|ac}`
* Working on vectorized assembly (cf. libCEED)
* Python/Julia plugins to come
* https://github.com/TDycores-Project/TDycore
* TDycore Team: Nathan Collier (ORNL), Gautam Bisht (PNNL), Matt Knepley (UBuffalo), Jenn Frederick and Glenn Hammond (SNL), Satish Karra (LANL)
* Thanks to DOE Biological and Environmental Research, SciDAC

---

# Latency and throughput are different

![:scale 80%](figures/Kronbichler-fig4-crop.png)

[Adapted from Kronbichler et al (2019)](https://doi.org/10.1145/3322813)

---

# [Fuhrer et al (2018)](https://www.geosci-model-dev.net/11/1665/2018/gmd-11-1665-2018.pdf)

![:scale 75%](figures/fuhrer2018-fig4.png)

* Is high-resolution strong scaling?
---

# Fuhrer et al (2018) replot

![:scale 100%](figures/fuhrer2018-scaling-time-ann4.png)

---

# New hardware does not reduce latency

![](figures/hardware/epyc-skylake-latency.png)

.footnote[From [AnandTech](https://www.anandtech.com/show/14694/amd-rome-epyc-2nd-gen/7)]

---

# Latency bugs persist

![](figures/hardware/summit-latency-mvapich-spectrum.png)

.footnote[From [Khorassani et al (2019)](http://mvapich.cse.ohio-state.edu/static/media/publications/abstract/IWOPH_19_MPI_on_POWER.pdf)]

---

# Questions?
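---

# Backup: quadrature and Schur complement sparsity

The multigrid and Wheeler-Yotov slides contrast a dense `\(S = B M^{-1} B^T\)` with a sparse one when `\(M\)` is block diagonal. The sketch below illustrates that contrast on a 1D analogue only, not on `\(BDM_1\)`: it assumes piecewise-linear velocity, piecewise-constant pressure, `\(\kappa = 1\)`, and vertex (trapezoid) quadrature as a stand-in for the collocated quadrature. The names `solve_columns` and `schur_nnz` are hypothetical.

```python
def solve_columns(M, R):
    """Return X = M^{-1} R by Gauss-Jordan elimination (no pivoting; M is SPD here)."""
    n = len(M)
    A = [row[:] + rhs[:] for row, rhs in zip(M, R)]
    for c in range(n):
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0.0:
                fac = A[r][c]
                A[r] = [vr - fac * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]


def schur_nnz(n_cells, lumped):
    """Count nonzeros of S = B M^{-1} B^T for a 1D mixed discretization."""
    h = 1.0 / n_cells
    n_faces = n_cells + 1
    # B: discrete divergence, cell i couples to faces i and i+1.
    B = [[0.0] * n_faces for _ in range(n_cells)]
    for i in range(n_cells):
        B[i][i], B[i][i + 1] = -1.0, 1.0
    # M: velocity mass matrix, assembled cell by cell.
    M = [[0.0] * n_faces for _ in range(n_faces)]
    for e in range(n_cells):
        if lumped:  # vertex quadrature: diagonal local matrix
            local = [[h / 2, 0.0], [0.0, h / 2]]
        else:       # exact integration: coupled local matrix
            local = [[h / 3, h / 6], [h / 6, h / 3]]
        for a in range(2):
            for b in range(2):
                M[e + a][e + b] += local[a][b]
    Bt = [[B[i][k] for i in range(n_cells)] for k in range(n_faces)]
    X = solve_columns(M, Bt)  # M^{-1} B^T
    S = [[sum(B[i][k] * X[k][j] for k in range(n_faces))
          for j in range(n_cells)] for i in range(n_cells)]
    return sum(abs(v) > 1e-12 for row in S for v in row)


print(schur_nnz(8, lumped=True), schur_nnz(8, lumped=False))
```

With 8 cells, the quadrature-lumped `\(M\)` should give a tridiagonal `\(S\)`, while exact integration should fill all 64 entries, since the inverse of a tridiagonal mass matrix is dense.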