This project has moved. For the latest updates, please go here.

Matrix multiplication slow?

Feb 20, 2010 at 12:32 AM
Edited Feb 20, 2010 at 12:51 AM

Hi everyone,

Using Math.NET for the first time, experimenting using it for Project Euler. Used Math.NET's matrix library to solve problem (which I recommend to problem solving enthusiasts, it's fun):

http://projecteuler.net/index.php?section=problems&id=213

Anyways, I built a 900x900 matrix and exponentiated it by 50, with the following code

                using MathNet.Numerics.LinearAlgebra.Double;               

DenseMatrix transition = new DenseMatrix(900,900);
// build up values of matrix ... DenseMatrix finalstate = transition; for (int i = steps; i > 1; i--)
{
Console.WriteLine("multiplying step " + (steps - i + 2));
finalstate = (DenseMatrix)finalstate.Multiply(transition);
}


What I found was that each multiplication operation takes about 10 seconds on my machine, so in totality it takes about 500 seconds to raise a 900x900 matrix to the power of 50. This seems *really* slow, as other problem solvers using the same method but in other languages have the whole thing running in under 10 seconds. Stepping through in the debugger shows that the code slows down at this line

                    finalstate = (DenseMatrix)finalstate.Multiply(transition);

Any idea what's going on here? A 900x900 matrix doesn't seem that big that it should take 10 seconds to multiply by itself.

Coordinator
Feb 20, 2010 at 4:32 PM

I've been looking into this a bit. Unfortunately, I don't think with a purely managed codebase we can get much more speedup than this. This links points to an interesting blog post about managed performance for matrix multiplication. Let me play with it a bit more and see whether we can squeeze out a bit of time.

Also, Math.Net is architected so it can incorporate optimized native libraries. Marcus has been working on those. The Intel MKL library bindings are working but for the free ATLAS libraries I believe the status is that we are having some compilation problems. Perhaps Marcus can comment?

Cheers, Jurgen

 

Feb 20, 2010 at 6:13 PM

>but for the free ATLAS libraries I believe the status is that we are having some compilation problems. Perhaps Marcus can comment?

I cannot get ATLAS to build on Windows 7 x64.  I'll setup an XP x86 virtual machine this week and see if it builds there.

Regards,

Marcus

 

Feb 20, 2010 at 11:41 PM

Thanks for the replies. Very interesting articles. I experimented a bit and Math.NET's implementation is much faster than the naive implementation in the article. It also seems Math.NET uses all available cores. I also found this interesting video from microsoft that shows four approaches to matrix multiplication:

http://channel9.msdn.com/posts/DanielMoth/VS2010-Parallel-Computing-Features-Tour/

I tried implementing these and running them against Math.NET's implementation, but I got stuck on not being able to load System.Threading.Tasks. If I can unblock myself and get those to work I'll compare them against Math.NET for speed.

 

Feb 21, 2010 at 5:42 AM

> I experimented a bit and Math.NET's implementation is much faster than the naive implementation in the article. It also seems Math.NET uses all available cores.

I wouldn't put too much weight into their native implementation - it doesn't even use SIMD. It is more of if we used roughly the same algorithm, which is faster. 

You mentioned that Math.NET was using all cores.  Managed dnA was taking 11 seconds for 1000x1000 matrix several years ago on a single core CPU. So there might be a problem with our current implementation. What CPU are you using?

One more thought. When multiplying a matrix by itself, we need to use temporary matrix. We then copy the orginal matrix to it. You might want to try using two matrices and then use the Multiply(DenseMatrix other, DenseMatrix result) method.  In each loop you'd alternate between the two as the result matrix. This would eliminate the extra memory allocation and copying.

Thanks,

Marcus

Feb 22, 2010 at 5:52 AM

My processor:

Processor :    AMD Phenom II X4 955 @ 3200 MHz

So it's a quad core fairly modern cpu. Took about 9 seconds to multiply a 900x900 matrix. VS2010's performance wizard shows all 4 cores are being used.

The naive implementation takes 30 seconds, and shows only 1 core being used.

Feb 22, 2010 at 9:50 AM

Ok here's the runtimes I got. Each of these runtimes is calculated from the same 900x900 matrix

00:00:11.73 RunTime for Math.NET's Densematrix mult
00:00:33.36 RunTime for matrix mult naive
00:00:26.76 RunTime for matrix mult temporary variable
00:00:09.81 RunTime for matrix mult using Prallel.For()
00:00:07.35 RunTime for matrix mult temporary variable using Prallel.For()

The last implementation, a slight modification of microsoft's example, is 37% faster than Math.NET's implementation for a 900x900 array. Both use all 4 cores. Here is the code

 

        static void MultiplyMatricesTempVariableParallelFor(int size, double[,] m1, double[,] m2, double[,] result)
        {
            Parallel.For(0, size, i =>
            {
                for (int j = 0; j < size; j++)
                {
                    double tmp = 0;
                    for (int k = 0; k < size; k++)
                    {
                        tmp += m1[i, k] * m2[k, j];
                    }
                    result[i, j] = tmp;
                }
            });
        }          
Perhaps you could try testing this and confirm that is it faster?

Feb 22, 2010 at 10:24 AM

I just ran a quick test against the old dnA code. I got 5.24 seconds for the dnA single threaded implementation and 6.4 seconds for the Numerics parallel implementation on a dual core cpu.

For your implementation (changing it to use 1-D arrays), I got 2.5 seconds on a dual core cpu.

So there is definitely a problem with Numerics. Thanks for pointing this out!!! My guess is the wrong branch of our pseudo GEMM function is being called - likely using the full version. I'll take a look at it tomorrow.

Also, our timings are way different. Are you making sure the methods are jitted before running your benchmark (calling them once before you start your timing)? 

I'll have some spare time tomorrow to add the MKL implementation and post the results here. 

 

Feb 23, 2010 at 4:05 AM

I tested each implementation 5 times to make sure they are jitted. Here's the results, when Math.NET is called before my implementation

Multiply size 900 Matrices with Math.NET: 10.7266136 seconds
Multiply size 900 Matrices with Parallel.For(): 7.464427 seconds
Multiply size 900 Matrices with Math.NET: 8.9555122 seconds
Multiply size 900 Matrices with Parallel.For(): 7.3024177 seconds
Multiply size 900 Matrices with Math.NET: 11.0796337 seconds
Multiply size 900 Matrices with Parallel.For(): 7.273416 seconds
Multiply size 900 Matrices with Math.NET: 10.8146186 seconds
Multiply size 900 Matrices with Parallel.For(): 7.256415 seconds
Multiply size 900 Matrices with Math.NET: 10.7576153 seconds
Multiply size 900 Matrices with Parallel.For(): 7.4554264 seconds

Interestingly, if I switch the order the two are tested, Math.NET's performance gets a lot better, and mine stays the same. Perhaps due to caching of multiplication results? Here's what happens if Math.NET goes second

Multiply size 900 Matrices with Parallel.For(): 7.3854224 seconds
Multiply size 900 Matrices with Math.NET: 7.6314365 seconds
Multiply size 900 Matrices with Parallel.For(): 7.5794336 seconds
Multiply size 900 Matrices with Math.NET: 7.6014348 seconds
Multiply size 900 Matrices with Parallel.For(): 7.1784106 seconds
Multiply size 900 Matrices with Math.NET: 7.2824165 seconds
Multiply size 900 Matrices with Parallel.For(): 7.1604096 seconds
Multiply size 900 Matrices with Math.NET: 7.379422 seconds
Multiply size 900 Matrices with Parallel.For(): 7.3484203 seconds
Multiply size 900 Matrices with Math.NET: 7.3714217 seconds

In either case, nowhere near the 2.5 seconds you get, Marcus. I've included my complete source code. I'd be curious to see if you still get 2.5 seconds on your computer when you run my framework

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using MathNet.Numerics;
using MathNet.Numerics.LinearAlgebra.Double;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

namespace ProjectEuler
{
    class Program
    {
        public delegate void MatrixMultDelegate();
        public static int size;
        public static double[,] m1;
        public static double[,] m2;
        public static double[,] m3;
        public static DenseMatrix DM1;
        public static DenseMatrix DM2;
        public static DenseMatrix DM3;
  
        static void Main(string[] args)
        {
             testMatrix();
        }

        static void testMatrix()
        {
            size = 900;
            m1 = new double[size, size];
            m2 = new double[size, size];
            m3 = new double[size, size];
            
            CreateMatrix(size,m1,m2);

            DM1 = new DenseMatrix(m1);
            DM2 = new DenseMatrix(m2);
            DM3 = new DenseMatrix(m3);

            for (int i = 0; i < 5; i++){
                RunAndMeasureTime(new MatrixMultDelegate(Mult2), "Multiply size " + size + " Matrices with Math.NET: ");
                RunAndMeasureTime(new MatrixMultDelegate(Mult1), "Multiply size " + size + " Matrices with Parallel.For(): ");
            }
 
            Console.ReadLine();

        }
    
        private static void CreateMatrix(int size, double[,] m1, double[,] m2)
        {
            Random autoRand = new Random();

            for (int i = 0; i < size; i++)
            {
                for (int j = 0; j < size; j++)
                {
                    double RandomNumber = autoRand.NextDouble();
                    m1[i, j] = RandomNumber;
                    m2[i, j] = RandomNumber;
                }
            }
        }

        private static void RunAndMeasureTime(MatrixMultDelegate methodToRun, string outputText)
        {
            DateTime startTime = DateTime.Now;
            methodToRun();
            TimeSpan duration = DateTime.Now - startTime;
            Console.WriteLine(outputText + duration.TotalSeconds + " seconds");
        }

        private static void Mult1()
        {
            MultiplyMatricesTempVariableParallelFor(size, m1, m2, m3);
        }

        private static void Mult2()
        {
            DM1.Multiply(DM2, DM3);
        }


        private static void MultiplyMatricesTempVariableParallelFor(int size, double[,] m1, double[,] m2, double[,] result)
        {
            Parallel.For(0, size, i =>
            {
                for (int j = 0; j < size; j++)
                {
                    double tmp = 0;
                    for (int k = 0; k < size; k++)
                    {
                        tmp += m1[i, k] * m2[k, j];
                    }
                    result[i, j] = tmp;
                }
            });
        }          
    }
}

 

 

 

Feb 23, 2010 at 8:26 AM

OK. I ran your code and got:

Multiply size 900 Matrices with Math.NET: 4.757 seconds
Multiply size 900 Matrices with Parallel.For(): 4.359 seconds

Changing MultiplyMatricesTempVariableParallelFor to use 1-D arrays I get:

Multiply size 900 Matrices with Parallel.For(): 1.119 seconds


Multi-dimensional arrays are terribly slow. 

Very interesting is that using .NET 4.0 is twice as fast for MultiplyMatricesTempVariableParallelFor over .NET 2.0 (2.5 second I got yesterday).

More testing needs to be done to find out if it is the the compiler, the runtime, or the .NET 4.0 implemneation of Parallel.For.
I'd guess its the Parallel.For implementation. If it is, we need to improve ours or provide a separate
.NET 4.0 build of numerics. 

 

 

Feb 23, 2010 at 12:09 PM

Ran a couple more tests.

My 1-D tests were using row-major instead of column-major (how our dense matrices are laid out). Re-running the tests using column-major shows that .NET 2.0 w/ our Parallel.For and .NET 4.0 perform about the same - 1.8 second. 

Not sure why the discrepancy exists between .NET 2.0 and .NET 4.0 for row-major. Perhaps there are new optimizations for multi-dimensional arrays (which are essential row-major 1-D arrays in memory) for .NET 4.0. Anyone have any idea why? It's probably not our implementation of Parallel.For.

 

Feb 23, 2010 at 5:00 PM

Interesting that your run times are so much faster than mine. I'm suprised as I thought I had a pretty powerful system (I built it just recently). Could you share with me your code for 1-D arrays, I'd like to try that.

Feb 23, 2010 at 6:24 PM

 

>I'm suprised as I thought I had a pretty powerful system (I built it just recently)
Your CPU should be significantly faster than mine. There must be something else going on.

Here is the updated code (column-major):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using MathNet.Numerics;
using MathNet.Numerics.LinearAlgebra.Double;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

namespace ProjectEuler
{
    class Program
    {
        public delegate void MatrixMultDelegate();
        public static int size;
        public static double[] m1;
        public static double[] m2;
        public static double[] m3;
        public static DenseMatrix DM1;
        public static DenseMatrix DM2;
        public static DenseMatrix DM3;

        static void Main(string[] args)
        {
            testMatrix();
        }

        static void testMatrix()
        {
            size = 900;
            m1 = new double[size* size];
            m2 = new double[size* size];
            m3 = new double[size* size];

            DM1 = new DenseMatrix(size);
            DM2 = new DenseMatrix(size);
            DM3 = new DenseMatrix(size);

            CreateMatrix(size, m1, m2, DM1, DM2) ;

            for (int i = 0; i <2; i++)
            {
                RunAndMeasureTime(new MatrixMultDelegate(Mult2), "Multiply size " + size + " Matrices with Math.NET: ");
                RunAndMeasureTime(new MatrixMultDelegate(Mult1), "Multiply size " + size + " Matrices with Parallel.For(): ");
            }

            Console.ReadLine();
        }

        private static void CreateMatrix(int size, double[] m1, double[] m2, DenseMatrix dm1, DenseMatrix dm2)
        {
            Random autoRand = new Random(0);

            for (int i = 0; i < size; i++)
            {
                for (int j = 0; j < size; j++)
                {
                    double RandomNumber = autoRand.NextDouble();
                    m1[j * size + i] = RandomNumber;
                    m2[j * size + i] = RandomNumber;
                    dm1[i, j] = RandomNumber;
                    dm2[i, j] = RandomNumber;
                }
            }
        }

        private static void RunAndMeasureTime(MatrixMultDelegate methodToRun, string outputText)
        {
            DateTime startTime = DateTime.Now;
            methodToRun();
            TimeSpan duration = DateTime.Now - startTime;
            Console.WriteLine(outputText + duration.TotalSeconds + " seconds");
        }

        private static void Mult1()
        {
            MultiplyMatricesTempVariableParallelFor(size, m1, m2, m3);
        }

        private static void Mult2()
        {
            DM1.Multiply(DM2, DM3);
        }


        private static void MultiplyMatricesTempVariableParallelFor(int size, double[] m1, double[] m2, double[] result)
        {
            Parallel.For(0, size, i =>
            {
                for (int j = 0; j < size; j++)
                {
                    double tmp = 0;
                    for (int k = 0; k < size; k++)
                    {
                        tmp += m1[k*size + i] * m2[j*size + k];
                    }
                    result[j*size + i] = tmp;
                }
            });
        }
    }
}

 

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using MathNet.Numerics;
using MathNet.Numerics.LinearAlgebra.Double;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
namespace ProjectEuler
{
    class Program
    {
        public delegate void MatrixMultDelegate();
        public static int size;
        public static double[] m1;
        public static double[] m2;
        public static double[] m3;
        public static DenseMatrix DM1;
        public static DenseMatrix DM2;
        public static DenseMatrix DM3;
        static void Main(string[] args)
        {
            testMatrix();
        }
        static void testMatrix()
        {
            size = 900;
            m1 = new double[size* size];
            m2 = new double[size* size];
            m3 = new double[size* size];
            DM1 = new DenseMatrix(size);
            DM2 = new DenseMatrix(size);
            DM3 = new DenseMatrix(size);
            CreateMatrix(size, m1, m2, DM1, DM2) ;
            for (int i = 0; i <2; i++)
            {
                RunAndMeasureTime(new MatrixMultDelegate(Mult2), "Multiply size " + size + " Matrices with Math.NET: ");
                RunAndMeasureTime(new MatrixMultDelegate(Mult1), "Multiply size " + size + " Matrices with Parallel.For(): ");
            }
            Console.ReadLine();
        }
        private static void CreateMatrix(int size, double[] m1, double[] m2, DenseMatrix dm1, DenseMatrix dm2)
        {
            Random autoRand = new Random(0);
            for (int i = 0; i < size; i++)
            {
                for (int j = 0; j < size; j++)
                {
                    double RandomNumber = autoRand.NextDouble();
                    m1[j * size + i] = RandomNumber;
                    m2[j * size + i] = RandomNumber;
                    dm1[i, j] = RandomNumber;
                    dm2[i, j] = RandomNumber;
                }
            }
        }
        private static void RunAndMeasureTime(MatrixMultDelegate methodToRun, string outputText)
        {
            DateTime startTime = DateTime.Now;
            methodToRun();
            TimeSpan duration = DateTime.Now - startTime;
            Console.WriteLine(outputText + duration.TotalSeconds + " seconds");
        }
        private static void Mult1()
        {
            MultiplyMatricesTempVariableParallelFor(size, m1, m2, m3);
        }
        private static void Mult2()
        {
            DM1.Multiply(DM2, DM3);
        }
        private static void MultiplyMatricesTempVariableParallelFor(int size, double[] m1, double[] m2, double[] result)
        {
            Parallel.For(0, size, i =>
            {
                for (int j = 0; j < size; j++)
                {
                    double tmp = 0;
                    for (int k = 0; k < size; k++)
                    {
                        tmp += m1[k*size + i] * m2[j*size + k];
                    }
                    result[j*size + i] = tmp;
                }
            });
        }
    }
}

 

 

Feb 23, 2010 at 6:40 PM
Edited Feb 23, 2010 at 7:02 PM

Thank you for sharing the code. My runtimes with your version of the code are:

Multiply size 900 Matrices with Math.NET: 11.1256363 seconds
Multiply size 900 Matrices with Parallel.For(): 4.2307927 seconds

I'd love to know why your results are so much faster. In any case I can definitely confirm that the 1d array implementation is a significant boost in performance.

edit: I think I know what's going on. I was running in visual studio's IDE just pressing F5 (start debugging). This time, I built the solution, closed all other applications, and ran on the command line. My results with my 2D matrices:


Multiply size 900 Matrices with Math.NET: 3.396839 seconds
Multiply size 900 Matrices with Parallel.For(): 2.2797648 seconds
Multiply size 900 Matrices with Math.NET: 3.4491973 seconds
Multiply size 900 Matrices with Parallel.For(): 2.27213 seconds

And my results with your 1D matrices:

Multiply size 900 Matrices with Math.NET: 2.3541346 seconds
Multiply size 900 Matrices with Parallel.For(): 1.7801197 seconds
Multiply size 900 Matrices with Math.NET: 2.3187945 seconds
Multiply size 900 Matrices with Parallel.For(): 1.8727064 seconds

Looks I clock in to about the same 1.8 seconds you reported for 1d. I can't explain why the Math.NET implementation goes from 3.3 seconds to 2.3 seconds though, when only the code surrounding the matrices changed! I've re run this test several times and the results are consistent. Math.NET speeds up by almost a full second when the helper matrices are 1d instead of 2d.

Do you find this difference in speed for Math.NET as well, Marcus? If you do, perhaps this is indicative of different behavior for matrix multiplication when the constructor is a 1d array vs a 2d array?

Feb 23, 2010 at 7:15 PM

>Do you find this difference in speed for Math.NET as well, Marcus?
Actually, I'm seeing the opposite - Math.NET is running slower. I don't know why.


>If you do, perhaps this is indicative of different behavior for matrix multiplication when the constructor is a 1d array vs a 2d array?

I don't think so.  Math.NET Numerics' DenseMatrix always uses 1D arrays to store the matrix data. I found one bug in our GEMM code
that might be causing the problem - I'll explore it some more tomorrow.

Regards,
Marcus 

Coordinator
Feb 24, 2010 at 1:00 AM
Edited Feb 24, 2010 at 1:10 AM

Hi Marcus and zass,

I just ran some perf tests on my own new box as well (.Net 4 RC CLR on an Intel Core i7 (QC plus HT) on Win7 x64):

Perf Code from Marcus' post above (1D):
Multiply size 900 Matrices with Math.NET: 1.7761016 seconds
Multiply size 900 Matrices with Parallel.For(): 1.5350878 seconds
Multiply size 900 Matrices with Math.NET: 1.8281046 seconds
Multiply size 900 Matrices with Parallel.For(): 1.5570891 seconds

It seems on my setup Math.NET Numerics and Parallel.For are closer together.

I also ran some similar perf tests but based on the tool I used to test Iridium some time ago. Average of 5 consecutive method calls, measured after 2 warm up rounds, using Stopwatch class. "100" means 100x100 matrices, random values:

Sequential 1D (result as parameter)
single-threaded, only one core is used:

  100: 2 ms
  256: 62 ms
  512: 530 ms
  900: 4999 ms

Parallel.For 1D (result as parameter)
multi-threaded, all cores are used:

  100: 1 ms
  256: 17 ms
  512: 263 ms
  900: 1518 ms

Math.NET Numerics without native providers (result as parameter)
multi-threaded, all cores are used:

  100: 2 ms
  256: 28 ms
  512: 321 ms
  900: 1764 ms

Math.NET Iridium, one of the Math.NET Numerics predecessors (result chained, input of next iteration)
single-threaded, only one core is used:

 100: 1 ms
 256: 20 ms
 512: 168 ms
 900: 917 ms

Wondering whether my perf test code is broken or if Iridium really is faster even if only a single core is used. Note that Iridium is using jagged arrays internally (the compiler can optimize for them nicely, i.e. dropping most of the range checks in the loops).

Thanks,
Chris

Coordinator
Feb 24, 2010 at 8:11 AM

Just to throw some extra numbers in the mix: using Intel MKL (in Matlab & another numerics package) the 900 x 900 matrix multiplication clocks in at 0.1 second or so.

J

Feb 24, 2010 at 8:53 AM

OK, the bug turned out not to be in the GEMM code but the simple multiplication code. It was doing a multiply and update.  If you ran the multiplication twice with the same result matrix, the result matrix would contain the sum of the two multiplications. I fixed it and the code runs a bit faster, but more importantly correctly.

There are still a few timing inconsistencies I'm trying to track down.

 

 

Apr 22, 2010 at 10:12 AM
Edited Apr 22, 2010 at 10:13 AM

I found this library website (Math.NET) because I was thinking of implementing an algorithm in .NET instead of Matlab because of performance issues.

However doing a 900x900 matrix multiplication in Matlab clocks on 0.088962 seconds on my machine. Doing the calculation A^50 clocks on 0.483745 seconds. This is quite a lot better that Math.NET in it's current state, which makes me doubt I will get much of a speed improvement running the algorithm on .NET instead of in Matlab. I might however add I'm not bottlenecked by matrix multiplications but by other operations in Matlab, so I might still try just to see the performance differences.

I might be be missing something, but all the multiplication algorithms seems like brute force O(N^3) methods but parallelized. I remember from taking algorithm courses at my university some time ago that matrix multiplications was quite a good problem to solve using divide and conquer techniques. Is there any parallelization problems using these techniques or why isn't it utilized?

Apr 23, 2010 at 7:29 AM
Edited Apr 23, 2010 at 7:31 AM

If you use the native linear algebra providers, you'll get speeds close to Matlab. We do plan on improving the managed version, but it will never be as fast as the native code (the runtime doesn't support SIMD operations). Right now you'll need to build the native providers yourself. We will release one based on ATLAS in the future. We plan on supporting ATLAS, MKL, and ACML - eventually supporting OpenCL or CUDA.

Apr 25, 2010 at 5:03 PM

Hi,

I'm also interested of this library.

>  If you use the native linear algebra providers, you'll get speeds close to Matlab. We do plan on improving the managed version, but it will never be as fast as the native code (the runtime doesn't support SIMD operations). Right now you'll need to build the native providers yourself. We will release one based on ATLAS in the future. We plan on supporting ATLAS, MKL, and ACML - eventually supporting OpenCL or CUDA.

Can you give any estimates of the penalty occuring calling "native linear algebra providers" from managed code?

Just for reference, on my old laptop (XP sp3, with Intel 1,8 Ghz processor, SSE2) above matrix exponent A^50 takes roughly 115ms with NumPy (1.3.0)!

So how close of this I could expect to reach with "some future" Math.NET?

 

Thanks,

Jens

Apr 25, 2010 at 5:33 PM

Hi Jens,

>Can you give any estimates of the penalty occuring calling "native linear algebra providers" from managed code?

It is negligible. The runtime just "pins" a pointer and passes it to the native library. Add there is some minor one-time initialization.

> roughly 115ms with NumPy (1.3.0)!

Was NumPy built with ATLAS support?

>So how close of this I could expect to reach with "some future" Math.NET?
Probablly not even close. We recommend using the native providers for large matrices.  I would guess that a pure python implementation wouldn't be close either. 

Regards,

Marcus

 

Apr 25, 2010 at 9:43 PM
Hi Marcus,

 
2010/4/25 cuda <notifications@codeplex.com>

From: cuda

Hi Jens,

>Can you give any estimates of the penalty occuring calling "native linear algebra providers" from managed code?

It is negligible. The runtime just "pins" a pointer and passes it to the native library. Add there is some minor one-time initialization.

That's good! Somehow I allways tought this kind of marshalling would be a big barrier.

> roughly 115ms with NumPy (1.3.0)!

Was NumPy built with ATLAS support?

Not really, kind of my own setup. But that's not really the main concern.

>So how close of this I could expect to reach with "some future" Math.NET?
Probablly not even close. We recommend using the native providers for large matrices.  I would guess that a pure python implementation wouldn't be close either.

I'm not a big fan of "pure python". Overall I'm more interested of the F# approach. Also, even I'm able to digg in the details, I'll like to operate much more on "higher" level. So my overall concern lies on the questions like: can I express my self (i.e. my code) based on linear algebra, can I efficiently do singular value decompositions (like only based on k largest ones of A(m, n), i.e. U, S, V= svd(A, k) => U(m, k), S(k, k), V(k, n), am I able to efficiently call some mature external (native) libraries, without huge penalty barriers, ....

Regards,

Marcus

I think this kind of functionality you guys are trying to implement on .NET is most wellcomed. I hope Microsoft could leverage more of their effort to this kind of questions as well.
 
I hope I'll have soon time to investigate more of your library, inorder to provide more detailed comments and questions.
 
Thanks,
- Jens
Apr 26, 2010 at 3:42 AM

Hi Jens,

I'm not sure what the F# interface will look like - Jurgen is working on that. As for the external libraries, P/Invoke is very efficient at passing data between managed and native code when the data being passed is an array of blittable types* - which is our case.  We welcome any comments on how to improve the library.

Thanks,
Marcus 

 

 

 

**http://en.wikipedia.org/wiki/Blittable_types

Coordinator
Apr 27, 2010 at 8:03 AM

Definitely let us know what you'd like the F# interface to look like. Currently, the F# interface only has methods to construct and manipulate matrices (Matrix.fromList, Matrix.map, ...) but no functionality for matrix decompositions etc. Please write any suggestions to the issue tracker.

Cheers Jurgen

Jun 1, 2010 at 5:59 PM

Hi all!

It is so great to have a open source math/linear algebra project in .Net.

I think a good way to improve performance is to provide bindings to ATLAS and LAPACK. I'd like to know about some of your decisions why new managed C# code is priority one in Math.Net, but not the binding to the stable ATLAS+LAPACK. One concern is that managed code is more cross platform. However now, people just want speed and stability.

I am using F#. It is such a pity that F# team deprecated their math-provider, which is a native binding to Netlib-Lapack and MKL.

Let's exam Python/Numpy, R, Matlab. All of them provide efficient binding to a efficient BLAS and Lapack. IMHO, binding is a great way to go. And BLAS  and LAPACK have very friendly license, no GPL virus. 

Best,

Yin

 

Jun 1, 2010 at 6:03 PM

Hi Yin,

Already working on it. We'll support ATLAS, MKL, and ACML. Take a look at the native provider project.

Regard,

Marcus