Jed Brown, a PETSc developer and one of the original PISM authors, provided this example comparing the efficiency of several solvers on a saved, invertible linear system coming from an SSA solve. The example, which uses 8 MPI processes, mostly shows that none of these methods is dramatically better than the others. See also the page on diagnosing and resolving "KSP diverged" errors.
    # Additive Schwarz with overlap 1
    $ time mpiexec -n 8 ./ex10 -f SSAFD_ksperror.petsc -ksp_type gmres -ksp_norm_type unpreconditioned -ksp_pc_side right -pc_type asm -sub_pc_type lu
    Number of iterations = 19
    Residual norm 92.3905

    real 0m3.142s
    user 0m22.881s
    sys  0m1.012s

    # Umfpack instead of PETSc subdomain solves, slightly faster
    $ time mpiexec -n 8 ./ex10 -f SSAFD_ksperror.petsc -ksp_type gmres -ksp_norm_type unpreconditioned -ksp_pc_side right -pc_type asm -sub_pc_type lu -sub_pc_factor_mat_solver_package umfpack
    Number of iterations = 19
    Residual norm 92.3905

    real 0m2.436s
    user 0m18.369s
    sys  0m0.728s

    # More overlap, a little better
    $ time mpiexec -n 8 ./ex10 -f SSAFD_ksperror.petsc -ksp_type gmres -ksp_norm_type unpreconditioned -ksp_pc_side right -pc_type asm -sub_pc_type lu -sub_pc_factor_mat_solver_package umfpack -pc_asm_overlap 2
    Number of iterations = 14
    Residual norm 139.529

    real 0m2.357s
    user 0m17.329s
    sys  0m0.792s

    # No overlap, slower and less robust
    $ time mpiexec -n 8 ./ex10 -f SSAFD_ksperror.petsc -ksp_type gmres -ksp_norm_type unpreconditioned -ksp_pc_side right -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_package umfpack
    Number of iterations = 32
    Residual norm 153.677

    real 0m2.997s
    user 0m22.681s
    sys  0m0.916s

    # Preconditioned norm, less accurate
    $ time mpiexec -n 8 ./ex10 -f SSAFD_ksperror.petsc -ksp_type gmres -ksp_norm_type preconditioned -ksp_pc_side left -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_package umfpack
    Number of iterations = 18
    Residual norm 30777.5

    real 0m2.441s
    user 0m17.865s
    sys  0m0.840s
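For orientation, here is a minimal sketch (not PISM or PETSc source) of what the ex10 driver above does, assuming a recent PETSc release with the PetscCall() error-checking macro: load a matrix and right-hand side from a PETSc binary file (such as the saved SSAFD_ksperror.petsc system) and solve it with whatever KSP/PC options are supplied at run time. The structure below is an illustration; the actual ex10 tutorial shipped with PETSc handles many more cases.

    /* Sketch of an ex10-style driver: load a saved linear system from a PETSc
       binary file named by -f <file> and solve it with run-time KSP/PC options. */
    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      Mat         A;
      Vec         b, x, r;
      KSP         ksp;
      PetscViewer viewer;
      char        file[PETSC_MAX_PATH_LEN];
      PetscBool   flg;
      PetscInt    its;
      PetscReal   norm;

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

      /* -f <file> names the PETSc binary file holding the matrix and RHS vector. */
      PetscCall(PetscOptionsGetString(NULL, NULL, "-f", file, sizeof(file), &flg));

      /* Load the saved linear system. */
      PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, file, FILE_MODE_READ, &viewer));
      PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
      PetscCall(MatLoad(A, viewer));
      PetscCall(VecCreate(PETSC_COMM_WORLD, &b));
      PetscCall(VecLoad(b, viewer));
      PetscCall(PetscViewerDestroy(&viewer));

      /* Solve A x = b; solver and preconditioner choices come from the command line. */
      PetscCall(VecDuplicate(b, &x));
      PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
      PetscCall(KSPSetOperators(ksp, A, A));
      PetscCall(KSPSetFromOptions(ksp));
      PetscCall(KSPSolve(ksp, b, x));

      /* Report the iteration count and the true residual norm ||b - A x||_2. */
      PetscCall(KSPGetIterationNumber(ksp, &its));
      PetscCall(VecDuplicate(b, &r));
      PetscCall(MatMult(A, x, r));
      PetscCall(VecAYPX(r, -1.0, b));
      PetscCall(VecNorm(r, NORM_2, &norm));
      PetscCall(PetscPrintf(PETSC_COMM_WORLD, "Number of iterations = %d\n", (int)its));
      PetscCall(PetscPrintf(PETSC_COMM_WORLD, "Residual norm %g\n", (double)norm));

      PetscCall(VecDestroy(&r));
      PetscCall(VecDestroy(&x));
      PetscCall(VecDestroy(&b));
      PetscCall(MatDestroy(&A));
      PetscCall(KSPDestroy(&ksp));
      PetscCall(PetscFinalize());
      return 0;
    }

Note that the option name -sub_pc_factor_mat_solver_package used in the timings above is from an older PETSc release; in recent versions the corresponding option is -sub_pc_factor_mat_solver_type.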