This project has moved and is read-only. For the latest updates, please go here.

Performance considerations

Mar 18, 2011 at 8:50 AM

Hi all,

I've discovered Math.NET Numerics recently. It's a very interesting library, congratulations for the fantastic job.

I have some remarks concerning performance (this topic) and API design (that I'll detail in a separate topic).

I tried to benchmark Math.NET Numerics compared to other free numerical libraries on typical operations (matrix multiplication and inversion, eigen values computation and read access to elements of a matrix) : ILNumerics and MetaNumerics. Math.NET Numerics as strong arguments compared to these 2 libs (ILNumerics has an horrible API, and MetaNumerics has a low number of features). However, on a performance point of view, Math.NET Numerics is behind them.

ILNumerics is much faster (up to 2 orders of magnitude !!) than Math.NET Numerics, but I assume this is because ILNumerics uses the native MKL library. I hope that when Math.NET Numerics will have a native version it will be as fast as ILNumerics.

MetaNumerics is a pure managed library, as the beta 1 of Math.NET Numerics. However, it is 2 to 4 times faster than Math.NET Numerics ! In particular, when reading the elements of a Matrix with MetaNumerics there has only a slight overhead compared to reading the elements of a C# rectangular array. Reading the elements of a Matrix using Math.NET Numerics is 5 times slower (and "only" 2 times slower without range checking). I also downloaded the latest version from the source code repository and compiled it : I can see using the Windows Program Manager that now some Math.NET Numerics operations are multithreaded on the 2 cores of my CPU. However, the results of the benchmark are strangely the same (??).

I hope futur version of Math.NET Numerics will show progress on these performance points.

Again, congratulations for the fantastic job.

Thomas

Mar 18, 2011 at 10:25 AM

Hi Thomas,

Thank you for your feedback.

>I hope that when Math.NET Numerics will have a native version

The MKL provider is complete, but due to the MKL license we cannot distribute it. User's will have to build it on their own. We should have a GotoBLAS2 provider in a couple weeks that we can distribute.

> However, it is 2 to 4 times faster than Math.NET Numerics 

That is very interesting. Would you mind posting your benchmark code? For accessing matrix elements in MetaNumerics, are you using the Matrix indexer or the GetEntry static method?

Thanks,

Marcus

Mar 18, 2011 at 1:27 PM
I don;t think MKL license is that strict. ILNumerics is actually distributing a dll made by MKL, (are they doing that illegally?)

 

cuda wrote:

> The MKL provider is complete, but due to the MKL license we cannot distribute it. User's will have to build it on their own. We should have a GotoBLAS2 provider in a couple weeks that we can distribute.


Mar 18, 2011 at 1:39 PM
Edited Mar 18, 2011 at 1:40 PM

The relevant part of the MKL EULA:

You may NOT ... distribute Redistributables except as part of a larger program that adds significant primary functionality different from that of the Redistributables;

Our problem is that we've made MathNet.Numerics.Algorithms.LinearAlgebra.Mkl public so it can be used in other projects. It is a very thin wrapper around MKL and we might violate the MKL EULA if we distributed the MKL redistributables with it (I think it would). If we made it private like we did for dnAnalytics there would be no issue (I believe).

Mar 18, 2011 at 2:50 PM

Hi Marcus,

Here is the benchmark code I use. It's pretty simple and maybe the results are biased because it's always difficult to write good benchmark codes (and I'm pretty sure this one is not a good one).

For Math.NET Numerics:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using MathNet.Numerics.LinearAlgebra.Double;
using MathNet.Numerics.LinearAlgebra.Double.Factorization;
using MathNet.Numerics.LinearAlgebra.Generic;

namespace MathNetNumerics
{
    class Program
    {
        static void Main(string[] args)
        {
            TestInv33();

            Stopwatch watch = new Stopwatch();

            DenseMatrix mat1 = CreateMatrix(10);
            DenseMatrix mat2 = CreateMatrix(10);

            watch.Start();
            Matrix<double> mat3 = TestMultiply(mat1, mat2);
            watch.Stop();
            Console.WriteLine("TestMultiply: " + watch.ElapsedMilliseconds + "ms (" + (1000000 * 1000.0 / watch.ElapsedMilliseconds) + " /s)");

            watch.Restart();
            mat3 = TestInv(mat1);
            watch.Stop();
            Console.WriteLine("TestInv: " + watch.ElapsedMilliseconds + "ms (" + (1000000 * 1000.0 / watch.ElapsedMilliseconds) + " /s)");

            watch.Restart();
            double res = TestEigen(mat1);
            watch.Stop();
            Console.WriteLine("TestEigen: " + watch.ElapsedMilliseconds + "ms (" + (10000 * 1000.0 / watch.ElapsedMilliseconds) + " /s)");

            watch.Restart();
            res = TestReadNoRangeCheck(Matrix (mat1);
            watch.Stop();
            Console.WriteLine("TestReadNoRangeCheck(Matrix : " + watch.ElapsedMilliseconds + "ms (" + (1000000 * 1000.0 / watch.ElapsedMilliseconds) + " /s)");

            watch.Restart();
            res = TestReadRangeCheck(Matrix (mat1);
            watch.Stop();
            Console.WriteLine("TestReadRangeCheck(Matrix : " + watch.ElapsedMilliseconds + "ms (" + (1000000 * 1000.0 / watch.ElapsedMilliseconds) + " /s)");

            Console.ReadLine();
        }

        static private DenseMatrix CreateMatrix(int size)
        {
            DenseMatrix mat = DenseMatrix.Identity(size);
            DenseMatrix mat = new DenseMatrix(size, size);
            Random rand = new Random();
            for (int i = 0; i < size; i++)
            {
                for (int j = 0; j < size; j++)
                {
                    mat[i, j] = (rand.NextDouble() >= 0.5) ? 1.0 : 0.0;
                }
            }
            return mat;
        }

        static private Matrix<double> TestMultiply(DenseMatrix mat1, DenseMatrix mat2)
        {
            Matrix<double> mat3 = mat1;

            for (int i = 0; i < 1000000; i++)
            {
                mat3 = mat1 * mat2;
            }
            return mat3;
        }

        static private Matrix<double> TestInv(Matrix mat1)
        {
            Matrix<double> mat3 = mat1;
            for (int i = 0; i < 1000000; i++)
            {
                mat3 = mat1.Inverse();
            }
            return mat3;
        }

        static private void TestInv33()
        {
            Matrix<double> mat = new DenseMatrix(3);
            mat[0, 0] = 0; mat[0, 1] = 1; mat[0, 2] = 0;
            mat[1, 0] = 0; mat[1, 1] = 0; mat[1, 2] = 1;
            mat[2, 0] = 1; mat[2, 1] = 0; mat[2, 2] = 0;
            Matrix<double> mat2 = null;
            Stopwatch watch = new Stopwatch();
            watch.Start();
            for (int i = 0; i < 10000; i++)
            {
                mat2 = mat.Inverse();
            }
            watch.Stop();
            Console.WriteLine("TestInv33: " + watch.ElapsedMilliseconds + "ms (" + (1000000 * 1000.0 / watch.ElapsedMilliseconds) + " /s)");
        }

        static private double TestEigen(DenseMatrix mat1)
        {
            double res = 0.0;
            for (int i = 0; i < 10000; i++)
            {
                Evd eigen = new DenseEvd(mat1);
                Vector<System.Numerics.Complex> val = eigen.EigenValues();
                res += val[0].Real;
            }
            return res;
        }

        static private double TestReadNoRangeCheck(Matrix mat1)
        {
            double res = 0.0;
            int dim = mat1.ColumnCount;
            for (int k = 0; k < 1000000; k++)
            {
                for (int i = 0; i < dim; i++)
                {
                    for (int j = 0; j < dim; j++)
                    {
                        res += mat1.At(i, j); // At == no range checking
                    }
                }
            }
            return res;
        }

        static private double TestReadRangeCheck(Matrix mat1)
        {
            double res = 0.0;
            int dim = mat1.ColumnCount;
            for (int k = 0; k < 1000000; k++)
            {
                for (int i = 0; i < dim; i++)
                {
                    for (int j = 0; j < dim; j++)
                    {
                        res += mat1[i, j];
                    }
                }
            }
            return res;
        }
    }
}

 

For MetaNumerics:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Meta.Numerics.Matrices;
using System.Diagnostics;
using Meta.Numerics;

namespace MetaNumerics
{
    class Program
    {
        static void Main(string[] args)
        {
            TestInv33();

            Stopwatch watch = new Stopwatch();

            SquareMatrix mat1 = CreateMatrix(10);
            SquareMatrix mat2 = CreateMatrix(10);

            watch.Start();
            SquareMatrix mat3 = TestMultiply(mat1, mat2);
            watch.Stop();
            Console.WriteLine("TestMultiply: " + watch.ElapsedMilliseconds + "ms (" + (1000000 * 1000.0 / watch.ElapsedMilliseconds) + " /s)");

            watch.Restart();
            mat3 = TestInv(mat1);
            watch.Stop();
            Console.WriteLine("TestInv: " + watch.ElapsedMilliseconds + "ms (" + (1000000 * 1000.0 / watch.ElapsedMilliseconds) + " /s)");

            watch.Restart();
            double res = TestEigen(mat1);
            watch.Stop();
            Console.WriteLine("TestEigen: " + watch.ElapsedMilliseconds + "ms (" + (10000 * 1000.0 / watch.ElapsedMilliseconds) + " /s)");

            watch.Restart();
            res = TestRead(mat1);
            watch.Stop();
            Console.WriteLine("TestRead: " + watch.ElapsedMilliseconds + "ms (" + (1000000 * 1000.0 / watch.ElapsedMilliseconds) + " /s)");

            Console.ReadLine();
        }

        static private SquareMatrix CreateMatrix(int size)
        {
            SquareMatrix mat = new SquareMatrix(size);
            Random rand = new Random();
            for (int i = 0; i < size; i++)
            {
                for (int j = 0; j < size; j++)
                {
                    mat[i, j] = (rand.NextDouble() >= 0.5) ? 1.0 : 0.0;
                }
            }
            return mat;
        }

        static private SquareMatrix TestMultiply(SquareMatrix mat1, SquareMatrix mat2)
        {
            SquareMatrix mat3 = mat1;

            for (int i = 0; i < 1000000; i++)
            {
                mat3 = mat1 * mat2;
            }
            return mat3;
        }

        static private SquareMatrix TestInv(SquareMatrix mat1)
        {
            SquareMatrix mat3 = mat1;
            for (int i = 0; i < 1000000; i++)
            {
                mat3 = mat1.Inverse();
            }
            return mat3;
        }

        static private void TestInv33()
        {
            SquareMatrix mat = new SquareMatrix(3);
            mat[0, 0] = 0; mat[0, 1] = 1; mat[0, 2] = 0;
            mat[1, 0] = 0; mat[1, 1] = 0; mat[1, 2] = 1;
            mat[2, 0] = 1; mat[2, 1] = 0; mat[2, 2] = 0;
            SquareMatrix mat2 = null;
            Stopwatch watch = new Stopwatch();
            watch.Start();
            for (int i = 0; i < 10000; i++)
            {
                mat2 = mat.Inverse();
            }
            watch.Stop();
            Console.WriteLine("TestInv33: " + watch.ElapsedMilliseconds + "ms (" + (1000000 * 1000.0 / watch.ElapsedMilliseconds) + " /s)");
        }

        static private double TestEigen(SquareMatrix mat1)
        {
            double res = 0.0;
            for (int i = 0; i < 10000; i++)
            {
                Complex[] val = mat1.Eigenvalues();
                res += val[0].Re;
            }
            return res;
        }

        static private double TestRead(SquareMatrix mat1)
        {
            double res = 0.0;
            int dim = mat1.Dimension;
            for (int k = 0; k < 1000000; k++)
            {
                for (int i = 0; i < dim; i++)
                {
                    for (int j = 0; j < dim; j++)
                    {
                        res += mat1[j,i];
                    }
                }
            }
            return res;
        }
    }
}

Mar 18, 2011 at 2:55 PM
cuda wrote:

The MKL provider is complete, but due to the MKL license we cannot distribute it. User's will have to build it on their own. We should have a GotoBLAS2 provider in a couple weeks that we can distribute.

Would it be possible to use the mkl_custom.dll provided with ILNumerics with Math.NET Numerics ?

Mar 18, 2011 at 2:59 PM

>Would it be possible to use the mkl_custom.dll provided with ILNumerics with Math.NET Numerics ?

No. We aren't using the MKL DLL builder but creating our own DLL -  we want all the native providers to have the same C interface.

Thanks for the benchmark code!

Mar 18, 2011 at 4:34 PM

One thing I noticed is that you are testing with small matrices. The library is threaded by default. The parallelization has some overhead and that becomes significant when using small matrices in tight loops. In those cases you should disable parallelization: MathNet.Numerics.Control.DisableParallelization = true;

Running with with parallelization on, I get:

TestInv33: 23ms (43478260.8695652 /s)
TestMultiply: 10459ms (95611.4351276413 /s)
TestInv: 86310ms (11586.1429730043 /s)
TestEigen: 2315ms (4319.65442764579 /s)
TestReadNoRangeCheck(Matrix : 547ms (1828153.56489945 /s)
TestReadRangeCheck(Matrix : 1410ms (709219.858156028 /s)

With parallelization off:

TestInv33: 21ms (47619047.6190476 /s)
TestMultiply: 3446ms (290191.526407429 /s)
TestInv: 8698ms (114968.958381237 /s)
TestEigen: 684ms (14619.8830409357 /s)
TestReadNoRangeCheck(Matrix : 553ms (1808318.26401447 /s)
TestReadRangeCheck(Matrix : 1443ms (693000.693000693 /s)

Much better. I'll look into way matrix element access is slower than MetaNumercs - it should be roughly the same.

Mar 18, 2011 at 7:22 PM

Thank you for the tip, the differences are huge indeed.

I've done tests on small matrices because most of my computations are done on small matrices (most of the time 3x3 matrices). I don't know what matrix sizes typically use the other users. If people also use most of the time small matrices, it would be "safer" to disable the parallelization by default.

For the read test, I've done it using different kinds of matrices/arrays:

* C# jagged array [][] => ~700ms

* C# rectangular array [,] => ~900ms  (it's normal, jagged array are known to be better optimized by the JIT compiler than rectangular array)

* MetaNumerics Matrix [,] => ~1200ms

* Math.NET Numerics Matrix without range checking (Matrix.At()) => ~2100ms

* Math.NET Numerics Matrix with range checking [,] => ~5000ms

Mar 19, 2011 at 12:26 AM
Zlika wrote:

I've done tests on small matrices because most of my computations are done on small matrices (most of the time 3x3 matrices). I don't know what matrix sizes typically use the other users. If people also use most of the time small matrices, it would be "safer" to disable the parallelization by default.

Hi Tomas,

Just a quick note (and I'm not sure if this is applicable for you); if your matrices are 3x3 or 4x4 (say because you are doing graphics programming) it is probably better to use a library that has been optimized for these matrices, many of which are available. They essentially inline the "loops" you need for matrix multiplication etc and generally offer better performance.

Cheers, Jurgen

Mar 22, 2011 at 9:39 AM
This discussion has been copied to a work item. Click here to go to the work item and continue the discussion.
Mar 22, 2011 at 6:59 PM

Hi Thomas,

I created a small benchmark and my results weren't as bad as yours, but we are significantly slower. I think the problem is that we have a virtual method calling another virtual method using a virtual property. If I remove the virtual property and the second virtual method call, we get about the same performance as MetaNumerics' Matrix and slightly slower than their SquareMatrix. I'll check in the fix in tomorrow.

Regards,
Marcus 

Mar 23, 2011 at 1:59 PM
Edited Mar 23, 2011 at 2:00 PM

Checked in the change.

Using 3x3 matrices (the difference was less pronounced in larger matrices), before:
MetaNumerics: 444ms
Math.NET Indexer: 1376ms
Math.NET At: 536ms

after:
MetaNumerics: 427ms
Math.NET Indexer: 437ms
Math.NET At: 375ms 

Mar 23, 2011 at 7:07 PM

Great job Marcus !

Thank you.

Mar 24, 2011 at 8:25 PM

Indeed, thanks for looking into it, and Thomas for reporting!

Chris

Dec 12, 2011 at 9:37 AM

If I may I would like to add something to this thread about the performance consideration.

In my project I multiply two rather big matrices. Their dimensions are for example 10000 by 3 and 3 by 10000. If I use MathNet.Numerics.LinearAlgebra.Double.DenseMatrix this multiplication takes about 17 seconds on my (quite old) machine. It's deathly slow comparing to the library that I've used so far which is DotNumerics, where similar multiplication takes 0,04 seconds! Its almost 500x faster! And as I look inside the DotNumerics code, the matrix multiplication is there done in the managed code.

What's wrong with the Math.NET?

Dec 12, 2011 at 10:59 AM
piast9 wrote:

Its almost 500x faster!

Try again with Control.DisablePrallelization = true

Dec 12, 2011 at 8:20 PM
Edited Dec 13, 2011 at 9:12 AM

Not a much difference, about 15s instead of 17s but still hundreds times slower than DotNumerics.

I've looked a bit inside the code of the library and it seems that there is something wrong with the Cache-oblivious algorithm of matrix multiplication. DotNumerics which I used so far multiplies the matrices (which it stores in the vector - just as DenseMatrix) in the simplest possible way, just as the matrix multiplication is defined. Here is the essential part of this multiplication from that library:

 

            for (int j = 0; j < BColumns; j++)
            {
                indexBJ = j * BRows;
                indexAJ = j * ARows;
                for (int i = 0; i < ARows; i++)
                {
                    Sum = 0.0;
                    for (int k = 0; k < AColumns; k++)
                    {
                        Sum += AData[i + k * ARows] * BData[k + indexBJ];
                    }
                    CData[i + indexAJ] = Sum;
                }
            } 

 

I don't exactly follow the algorithm used in the MathNET but it works way, way slower. It's not the matter of my quite ancient laptop at which I work (with the Pentium M processor). I've tested it on the contemporary Phenom64, with disabled parallelization in the library options. The 10000x3 and 3x10000 matrices multiplication takes 2,3 seconds... Still way longer than simple multiplication presented above.

EDIT:

Got it! The ParallelizeOrder default value is set way too small. I set it to 1000 and multiplication was done in the blink of the eye. Seems that the cache-oblivious algorithm was spending most of its time splitting the task into smaller parts than multiplying the matrices.

EDIT2:

Maybe it has something to do with the fact, that the matrices that I multiply are very "long and thin". During the cache-oblivious multiplication the ParallelizeOrder is compared to the sum of the matrix dimensions so it fools the algorithm...