# FEM Assembly and Solver Benchmarks

It is both interesting and useful to compare simulation tools and the performance of different implementations. In the following we have performed three basic tests relevant to FEM simulation codes, sparse matrix vector multiplication, finite element matrix assembly, and solving the Poisson equation on a unit square, with the following five different Finite Element solver implementations:

- FEATool Multiphysics - the FEATool toolbox written in Matlab / Octave m-script code
- A Fortran 77 Finite Element library
- Julia - Julia is a very new programming language that aims to achieve close to the performance of C while at the same time being very easy to program and use (maybe as if Matlab was reimplemented from scratch today)
- FEniCS Project - a computing platform for solving partial differential equations (PDEs) with high-level Python and C++ interfaces
- SFEA (Stencil based Finite Element Analysis) - is an experimental high-performance stencil based FEM solver written in Fortran 90 which comes very close to the memory bandwidth limit so gives an indication of the upper performance limit

All tests were performed on a single core (serial mode) of a Desktop system with a Intel Core i7-3820 CPU running Linux, and the Fortran codes were compiled with the Intel Fortran compiler.

## Test 1 - Sparse Matrix Vector Product

1/h | FEAT2D | FEATool | SFEA | |
---|---|---|---|---|

Fortran | Matlab/Octave | Julia | Fortran | |

128 | 0 | 0.002 | 0.03 | 0 |

256 | 0.001 | 0.002 | 0.031 | 0 |

512 | 0.005 | 0.006 | 0.034 | 0.001 |

1024 | 0.024 | 0.021 | 0.05 | 0.002 |

2048 | 0.197 | 0.085 | 0.08 | 0.009 |

4096 | 0.411 | 0.25 | 0.034 |

For parse benchmark test we can see that FEAT2D (CSR sparse format), FEATool (Matlab sparse format), and Julia (CSC sparse format) perform similarly especially for larger grid sizes, while the stencil based SFEA approach is about a magnitude faster. FEniCS does not seem to support matrix and vector operations on a high level so no data is included for FEniCS here.

## Test 2 - Finite Element Matrix Assembly

1/h | FEAT2D | FEATool | FEniCS | |
---|---|---|---|---|

Fortran | Matlab/Octave | Python/C++ | Julia | |

128 | 0.05 | 0.05 | 0.05 | 0.24 |

256 | 0.13 | 0.14 | 0.12 | 0.57 |

512 | 0.42 | 0.43 | 0.31 | 1.7 |

1024 | 1.5 | 1.7 | 1.1 | 6.5 |

2048 | 6 | 7 | 5 | 26 |

4096 | 24 | 105 |

Here surprisingly, the vectorized and optimized FEATool Matlab code is actually just as fast as the FEAT2D Fortran code (which is a very good end efficient reference implementation). FEniCS is a little bit faster but this is to be expected since FEAT2D and FEATool both use quadrilateral shape functions which are more expensive to assemble than the linear triangular ones used by FEniCS and Julia. The performance of Julia is unfortunately not very good here but this could be due to a non-optimized implementation. The SFEA code is not included since being stencil based FEM assembly costs virtually nothing.

## Test 3 - Linear Solver for the Poisson Equation

1/h | FEAT2D | FEATool | FEniCS | SFEA | |
---|---|---|---|---|---|

Fortran GMG | Matlab/Octave UMFPACK | Python/C++ PETSc | Julia CHOLMOD? | Fortran GMG | |

128 | 0.025 | 0.368 | 0.19 | 0.18 | 0.049 |

256 | 0.05 | 1.718 | 0.79 | 0.322 | 0.064 |

512 | 0.21 | 7.406 | 4.65 | 0.9 | 0.11 |

1024 | 1.1 | 47.646 | 34.1 | 3.4 | 0.16 |

2048 | 6 | 338.14 | - | 14.6 | 0.49 |

4096 | 63 | 83 | 2.9 | ||

8192 | 20 |

In the final test the Poisson equation is solved on the unit square with unit source term and zero homogeneous Dirichlet boundary conditions everywhere. The default linear solver of each code is used throughout the tests. FEAT2D and SFEA employ a geometric multigrid solver, FEATool (Matlab/Octave) uses UMFPACK, and FEniCS uses PETSc. From the timings we can see that the UMFPACK and PETSc direct sparse solvers have about the same performance, with a slight advantage for PETSc (although failed for the 1/h=2048 grid). As the problem sizes increases we can see that the GMG solvers scale significantly better than the direct solvers, with the stencil based SFEA approach being about a magnitude faster (about 700 times faster than UMFPACK on the 1/h=2048 grid with 4.2 million unknowns). It is not quite clear which solver Julia uses but due to the performance figures which are on par with the FEAT2D GMG solver we suspect it detects that the problem can be solved faster with Cholesky factorization and uses a corresponding solver.

## Summary

We have looked at how five different finite element implementations perform on three important test cases. For sparse matrix vector operations the codes performed very similar with exception of the stencil based approach with was a magnitude faster.

For matrix assembly it was quite surprising just how good performance one can get from a properly vectorized and optimized Matlab implementation, showing the same performance as a fast Fortran implementation.

Regarding the solvers GMG unsurprisingly beat direct solvers when grid sizes increased. Here also the stencil based approach was faster by a magnitude or more.

To conclude one can say that it is entirely possible to write high performant FEM simulation codes in many programming languages. What seems more important for performance is the choice of data structures (sparse vs stencil) and algorithm (direct vs iterative solvers). The out-lier might be Julia for which it currently isn’t fully clear how it will perform, but due to being a very new language it certainly shows a lot of potential.