class: center, middle, inverse, title-slide

# Developing a Terrestrial Dynamical Core for E3SM

### Jed Brown, jed.brown@colorado.edu
### Multicore-9, 2019-09-26

---

.pull-left-33[

* Water cycle modeling is critical for E3SM
* Current subsurface flow is 1D
]

.pull-right-66[]

* TDycore SciDAC project: extend to global 3D
* Composable software

---

# TDycore requirements

* Solve 3D variably saturated Richards equation
* Implicit thermal-hydrology (includes permafrost)
* Portable, extensible, fun to experiment with
* Accurate flow on distorted grids
* High-contrast/anisotropic coefficients
  - anisotropic tensor coefficients
* Very high aspect ratio elements
* Time accurate, quasi-steady/spinup
* Data assimilation, parameter estimation
* Verification & Validation

---

# Simplified model: Darcy flow

## Strong form: find `\(\mathbf u,p\)` such that

`\begin{align} \mathbf u &= -\kappa \nabla p \\ \nabla \cdot \mathbf u &= f \end{align}`

subject to pressure `\(p = g_D\)` and/or flux `\(\mathbf u \cdot \mathbf n = g_N\)` boundary conditions.

* Mixed finite elements
  - `\(BDM_1\)`, Wheeler-Yotov, Arbogast-Correa
* Multi-point flux finite volume
  - MPFA-O: cell-centered with reconstruction

---

# Mixed FEM: `\(BDM_1\)`

\begin{gather} \begin{bmatrix} M & B^T \\\\ B & 0 \end{bmatrix} \begin{bmatrix} \mathbf u\\\\ p \end{bmatrix} = \begin{bmatrix} 0 \\\\ f \end{bmatrix} \\\\ M \sim \int \mathbf v \cdot \kappa^{-1} \mathbf u, \quad B \sim \int q \nabla\cdot \mathbf u \end{gather}

---

# BDDC for `\(BDM_1\)`

.pull-left[

Zampini and Tu (2017): robustness to coarsening rate
]

.pull-right[

* A priori convergence rate
  - *not* grid complexity
* Rapid coarsening
* Solve subdomain problems
  - almost-Neumann
  - Dirichlet
]

---

# [Flow123d](https://flow123d.github.io/): transport in fractured media

.center[<video src="https://flow123d.github.io/gallery/videos/test2d.ogv" type="video/ogg" autoplay="true" loop="true">Flow123d video</video>]

* Jan Brezina and colleagues at TU Liberec
* Based on PETSc solvers

---

# Multigrid for `\(BDM_1\)`

.pull-left[

## Monolithic methods

* Coarsen mixed formulation
* Complicated implementation
* Overlapping smoother updates
* Hard to achieve high performance
]

.pull-right[

## Split methods

$$ P^{-1} = \begin{bmatrix} M & B^T \\\\ & \hat S \end{bmatrix}^{-1} $$

With multigrid for

$$ \hat S \approx \underbrace{S = B M^{-1} B^T}_{\text{dense!}} $$

SPD, Laplacian-like
]

---

# `\(BDM_1\)` spaces

.pull-left[]

.pull-right[

### Collocated quadrature: block-diagonal `\(M_{WY}\)`

$$ \begin{bmatrix} M_{WY} & B^T \\\\ B & 0 \end{bmatrix} $$
]

---

# Wheeler and Yotov (2006)

* `\(M_{WY}\)` is block diagonal; Schur complement is sparse

$$ S = B M_{WY}^{-1} B^T $$

---

# Wheeler-Yotov stencils

---

# Wheeler-Yotov stencils

---

# Wheeler-Yotov solvers

---

# TDycore assembly

---

# Arbogast & Correa (2016)

* Arnold-Boffi-Falk (2005): more dofs
* Collocated variant unknown, saddle point solvers

---

# Should we trade increased solve cost for accuracy?

Suppose `\(AC_1^{red}\)` matches accuracy of `\(BDM_1\)` with

$$ h = 1.25 h_1 $$

* Half the dofs in 3D, since `\(1.25^3 \approx 1.95\)`
* Allow time step `\(1.25\times\)` larger
* Tracers and chemistry
* May pay off in the end
  - Currently maintaining both options

---

.pull-left[

* IMEX integrators
* Algebraic solvers and preconditioners
* DMPlex for discretization
* Powerful run-time composition
* GPU- and node-aware communication
]

.pull-right[

* Assemble optimal data structures from single source code
]

---

# Outlook

* `\(BDM_1\)` needs `\(\sim 10\times\)` more memory than WY
* Wheeler-Yotov most friendly for AMG
* Saddle-point with BDDC robust, but more expensive
* `-tdy_method {wy|bdm|ac}`
* Working on vectorized assembly (cf. libCEED)
* Python/Julia plugins to come
* https://github.com/TDycores-Project/TDycore
* TDycore Team: Nathan Collier (ORNL), Gautam Bisht (PNNL), Matt Knepley (UBuffalo), Jenn Frederick and Glenn Hammond (SNL), Satish Karra (LANL)
* Thanks to DOE Biological and Environmental Research, SciDAC

---

# Latency and throughput are different

[Adapted from Kronbichler et al. (2019)](https://doi.org/10.1145/3322813)

---

# [Fuhrer et al. (2018)](https://www.geosci-model-dev.net/11/1665/2018/gmd-11-1665-2018.pdf)

* Is high-resolution strong scaling?

---

# Fuhrer et al. (2018) replot

---

# New hardware does not reduce latency

.footnote[From [Anandtech](https://www.anandtech.com/show/14694/amd-rome-epyc-2nd-gen/7)]

---

# Latency bugs persist

.footnote[From [Khorassani et al. (2019)](http://mvapich.cse.ohio-state.edu/static/media/publications/abstract/IWOPH_19_MPI_on_POWER.pdf)]

---

# Questions?
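
---

# Backup: Schur complement sparsity in 1D

The deck contrasts a dense Schur complement `\(S = B M^{-1} B^T\)` for `\(BDM_1\)` with a sparse one when collocated quadrature makes the mass matrix block diagonal. The sketch below illustrates that mechanism on a 1D analogue only; it is not TDycore code and not the Wheeler-Yotov construction itself. Assumptions: a uniform mesh on the unit interval, `\(\kappa = 1\)`, continuous piecewise-linear flux with piecewise-constant pressure, and trapezoid (vertex) quadrature standing in for the collocated rule. The function name `schur_nonzeros` is hypothetical.

```python
import numpy as np


def schur_nonzeros(n=8, tol=1e-12):
    """Count nonzeros of S = B M^{-1} B^T for an exact and a lumped mass matrix."""
    h = 1.0 / n
    # Flux mass matrix on n+1 nodes with exact integration: tridiagonal.
    main = np.full(n + 1, 4.0)
    main[[0, -1]] = 2.0
    M_consistent = (h / 6.0) * (
        np.diag(main) + np.diag(np.ones(n), 1) + np.diag(np.ones(n), -1)
    )
    # Trapezoid (vertex) quadrature equals row-sum lumping here: diagonal.
    M_lumped = np.diag(M_consistent.sum(axis=1))
    # Divergence: row i integrates du/dx over cell i, giving u[i+1] - u[i].
    B = np.diag(np.ones(n), 1)[:n, :] - np.eye(n + 1)[:n, :]
    counts = {}
    for name, M in (("consistent", M_consistent), ("lumped", M_lumped)):
        S = B @ np.linalg.solve(M, B.T)
        # Count entries that are nonzero relative to the largest entry.
        counts[name] = int(np.count_nonzero(np.abs(S) > tol * np.abs(S).max()))
    return counts


counts = schur_nonzeros(8)
print(counts)
```

With the lumped mass matrix, `S` is tridiagonal and Laplacian-like, which is the property that makes the collocated variant convenient for AMG. With exact integration, `M^{-1}` is full and so is `S`.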