
A little benchmark

Dec 31, 2010 at 1:24 PM
Edited Mar 20, 2011 at 11:34 AM

Hello.

I wrote a little benchmark for your iterative solvers. You can get it here: binaries, source, example.

There are basically two modes: PDE and Matlab. The PDE mode solves a Poisson equation with FEM, and the Matlab mode solves a linear system defined by some .mat files (which I created) containing a sparse matrix A.
In Matlab mode, the computed solution is compared to the Matlab solution x, where Ax = b with b = (1, 1, 1, ...). The 2-norm of (solution - x) is displayed.
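
For reference, the reported error is computed more or less like this (a minimal sketch in plain C#, not the exact benchmark code; the method name is just for illustration):

using System;

static class ErrorNormSketch
{
    // 2-norm of (solution - x): square root of the sum of squared differences
    // between the computed solution and the Matlab reference solution.
    public static double ErrorNorm(double[] computed, double[] reference)
    {
        double sum = 0.0;
        for (int i = 0; i < computed.Length; i++)
        {
            double d = computed[i] - reference[i];
            sum += d * d;
        }
        return Math.Sqrt(sum);
    }
}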

Now, my first question is:
When I repeat a benchmark without changing any settings, the displayed errors are different, which means that the computed solutions are different. Why is that the case? Are the algorithms randomized in some way, or is it due to the parallelization, which has some unpredictable effects?

And another question:
As I am not familiar with preconditioning: is there any guide on when to use which preconditioner with a specific solver, or when to use preconditioning at all?
I guess it's for systems where the iteration would not converge without preconditioning. Looking at the benchmark results, one should probably use the Diagonal preconditioner, because it doesn't seem to affect the overall solving time.
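
If I understand it correctly, the Diagonal preconditioner just takes M = diag(A), so applying it is only an element-wise division, which would explain why it barely costs anything. Roughly like this (a plain C# sketch, not the library's implementation; the class name is made up):

// Sketch of a Jacobi (diagonal) preconditioner: M = diag(A).
// Applying M^-1 to a vector r is just an element-wise division by the diagonal.
class DiagonalPreconditionerSketch
{
    private readonly double[] diagonal;

    public DiagonalPreconditionerSketch(double[,] a)
    {
        int n = a.GetLength(0);
        diagonal = new double[n];
        for (int i = 0; i < n; i++)
        {
            diagonal[i] = a[i, i];
        }
    }

    // Solves M z = r, i.e. z[i] = r[i] / A[i,i].
    public double[] Apply(double[] r)
    {
        var z = new double[r.Length];
        for (int i = 0; i < r.Length; i++)
        {
            z[i] = r[i] / diagonal[i];
        }
        return z;
    }
}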

Anyway! I hope you find the benchmark useful. At the moment the solvers and preconditioners are hardcoded into the app, so if you add new ones to your library, the source code must be changed. Maybe using reflection would be a better option (it's on my todo list). I'm looking forward to seeing how the native wrappers will compare ...

Happy new year,
Chris

Jan 1, 2011 at 8:27 AM

Hi Chris,

That is a pretty nifty benchmark runner!

Now, my first question is:
When I repeat a benchmark without changing any settings, the displayed errors are different, which means that the computed solutions are different. Why is that the case? Are the algorithms randomized in some way, or is it due to the parallelization, which has some unpredictable effects?

Not sure. I believe only the MlkBiCgStab solver uses a random set of initial values. I'll work through the code to figure out why this is occurring (again, it was written by another contributor who is no longer with the project).

And another question:
As I am not familiar with preconditioning: is there any guide on when to use which preconditioner with a specific solver, or when to use preconditioning at all?
I guess it's for systems where the iteration would not converge without preconditioning. Looking at the benchmark results, one should probably use the Diagonal preconditioner, because it doesn't seem to affect the overall solving time.

We don't have a guide, but we should create one. I'll add it to the todo list.

I'm looking forward to seeing how the native wrappers will compare ...

The first iteration for the native wrappers will only be for dense matrices. We'll then move on to the sparse matrices. So the native sparse code is a way off.

Happy new year,
Marcus 

Jan 1, 2011 at 8:29 AM
This discussion has been copied to a work item.
Jan 2, 2011 at 11:06 AM

Looks like there is a problem with how we parallelized the iterative solvers. The serial code is 4-40x faster, depending on the algorithm.

Jan 2, 2011 at 4:33 PM
Edited Jan 2, 2011 at 4:37 PM

Yep, just tried

MathNet.Numerics.Control.DisableParallelization = true;

and it is a lot faster. Even the XXL problems are solved in less than a second (in Matlab mode). I guess I have to build bigger matrices ...

Hope you get it fixed soon!


Jan 3, 2011 at 11:57 AM

I think I found the problem. For sparse vectors/matrices, we lock the object by default when getting and setting values (since the underlying data store can be resized during an operation). For the parallel methods we didn't override in the sparse vector/matrix classes, we end up with multiple threads waiting to obtain that lock, which causes significant overhead. We need to override each parallel method in the sparse vector/matrix classes and optimize this away. Working on that now.
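
The pattern is basically the one below (a simplified illustration, not our actual code): a generic parallel operation goes through an indexer that takes a lock on every single access, so the worker threads spend most of their time waiting on that lock.

using System.Threading.Tasks;

// Simplified illustration of the contention pattern (not the actual library
// code): every element access goes through a lock, so a parallel loop over
// the elements serializes on that lock and mostly pays synchronization cost.
class LockedStorageSketch
{
    private readonly object syncLock = new object();
    private readonly double[] data = new double[1000000];

    public double this[int i]
    {
        get { lock (syncLock) { return data[i]; } }
        set { lock (syncLock) { data[i] = value; } }
    }

    // A generic parallel operation that was not overridden for this storage:
    // each iteration contends for the same lock via the indexer.
    public double SumParallel()
    {
        double sum = 0.0;
        var sumLock = new object();
        Parallel.For(0, data.Length, () => 0.0,
            (i, state, local) => local + this[i],
            local => { lock (sumLock) { sum += local; } });
        return sum;
    }
}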

Jan 3, 2011 at 12:00 PM

Chris, if possible, could you send me the code for your FEM library (marcus at cuda.net)? I've changed the solver interface a bit and need to rebuild the FEM library against it (to run your benchmark program).

Thanks,
Marcus 

Coordinator
Jan 5, 2011 at 10:59 PM
wo80 wrote:

Now, my first question is:
When I repeat a benchmark without changing any settings, the displayed errors will be different, which means, that the computed solutions are different. Why is that the case? Are the algorithms randomized in some way? Or is it due to the parallelization, which has some unpredictable effects?

If the iterative process has converged, then the differences in error should be relatively small, because there should only be one correct solution for your matrix equation. However, if the convergence is not complete, then it is quite possible that there are differences in the solution. Make sure the iterative process ended because the residuals got below your required bounds, and not because you reached the maximum number of iterations or stopped for some other reason.
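
One way to check that, independent of the solver's own status reporting, is to compute the residual norm directly after the solve. A small illustrative sketch (plain C#, dense storage just to keep it short):

using System;

static class ConvergenceCheckSketch
{
    // After the solve, compute the residual r = b - A*x and verify that its
    // 2-norm is below the tolerance you asked the iterative solver for.
    public static bool HasConverged(double[,] a, double[] x, double[] b, double tolerance)
    {
        int n = b.Length;
        double sum = 0.0;
        for (int i = 0; i < n; i++)
        {
            double ri = b[i];
            for (int j = 0; j < n; j++)
            {
                ri -= a[i, j] * x[j];
            }
            sum += ri * ri;
        }
        return Math.Sqrt(sum) <= tolerance;
    }
}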

If that still doesn't help, then I suspect there may be a bug in the code for the iterative solvers, because, as Marcus pointed out, only the MlkBiCgStab solver grabs some random values for the initial vectors (although for proper convergence even that shouldn't matter, again because there should only be one solution).

Jan 18, 2011 at 12:34 PM
Edited Mar 20, 2011 at 11:35 AM

I had some time to work on the benchmark program. Most of the hardcoded stuff has been replaced using reflection and an XML config file. So, as long as there are no changes to the interfaces, the benchmark should work with future versions, and new solvers should be available in the benchmark without changing the code. The download is now 13 MB due to the new .mat files. You can find the links in the first post.
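
The discovery part is just a reflection scan over the assembly, roughly like this (a simplified sketch, not the exact benchmark code; the real program also reads the XML config, and the interface type is passed in rather than hardcoded):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Simplified sketch of the solver discovery: scan an assembly for all concrete
// types implementing a given interface and create an instance of each one that
// has a parameterless constructor.
static class SolverDiscoverySketch
{
    public static IEnumerable<object> CreateAll(Assembly assembly, Type solverInterface)
    {
        return assembly.GetTypes()
            .Where(t => !t.IsAbstract
                        && solverInterface.IsAssignableFrom(t)
                        && t.GetConstructor(Type.EmptyTypes) != null)
            .Select(t => Activator.CreateInstance(t));
    }
}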


pvandervelde wrote:
... However, if the convergence is not complete, then it is quite possible that there are differences in the solution ...

Well, I think an iterative process is still deterministic, so if you don't change the input data, the output should stay the same. When you run the benchmark in non-parallel mode, you do get this expected behaviour. The parallelization seems to have some effect on the results, though.

Jan 18, 2011 at 2:20 PM

Still working on it, but it seems to be our CommonParallel.Aggregate function that is causing the problem (it is used to compute the dot product).
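
That would also explain why the displayed errors change between runs: floating-point addition is not associative, so a parallel reduction that combines its partial sums in a thread-dependent order can produce a slightly different dot product every time, and those small differences propagate through the iteration. A minimal illustration (not our CommonParallel code):

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Minimal illustration (not the library's CommonParallel code): a parallel
// dot product combines its partial sums in a thread-dependent order, and
// because floating-point addition is not associative the result can differ
// slightly from run to run (and from the serial result).
static class DotProductSketch
{
    public static double SerialDot(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int i = 0; i < a.Length; i++)
        {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static double ParallelDot(double[] a, double[] b)
    {
        var partials = new ConcurrentBag<double>();
        Parallel.For(0, a.Length, () => 0.0,
            (i, state, local) => local + a[i] * b[i],
            local => partials.Add(local));
        return partials.Sum();   // combination order varies between runs
    }
}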

Jan 19, 2011 at 5:09 PM
Edited Jan 19, 2011 at 5:11 PM

The most recent check-in gets us close, but there is still some work to do. We parallelized a lot of code without doing any benchmarking, and some of the parallel code is actually slower - the parallel extensions have a lot of overhead.
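
For very cheap per-element work the overhead is easy to reproduce with a micro-benchmark along these lines (illustrative only, not code from the library):

using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Illustrative micro-benchmark (not library code): for very cheap per-element
// work, the scheduling overhead of Parallel.For can outweigh the gain.
class ParallelOverheadSketch
{
    static void Main()
    {
        var data = new double[1000000];
        var result = new double[data.Length];

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < data.Length; i++)
        {
            result[i] = data[i] * 2.0;
        }
        sw.Stop();
        Console.WriteLine("serial:   {0} ms", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        Parallel.For(0, data.Length, i => { result[i] = data[i] * 2.0; });
        sw.Stop();
        Console.WriteLine("parallel: {0} ms", sw.ElapsedMilliseconds);
    }
}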