MKL is even slower

Apr 8, 2014 at 2:35 PM
Edited Apr 8, 2014 at 2:37 PM
I tested MKL with the following simple code:
Control.LinearAlgebraProvider = new MklLinearAlgebraProvider();
ContinuousUniform uniform = new ContinuousUniform(-0.5, 0.5);
DenseVector a = DenseVector.CreateRandom(10000, uniform);
DenseVector b = DenseVector.CreateRandom(10000, uniform);

Stopwatch watch = new Stopwatch();
watch.Start();
var c = a * b;
watch.Stop();
Console.WriteLine(watch.Elapsed);
I ran the code above first, and the console printed 00:00:01.0080696. Then I commented out the provider-setting line (the first line above), and the console printed 00:00:00.0021271.

Both runs were in Release mode.

Why is my code slower with MKL set?
Apr 8, 2014 at 3:20 PM
Edited Apr 8, 2014 at 3:32 PM
MKL may be slower on the first call (especially since this is not an expensive operation itself) but is faster afterwards, at least if the dataset is large enough.

A quick benchmark on my machine (which is not useful for anything beyond demonstrating this):

Average per call over 100 repetitions, 4 rounds, len=10'000:
Managed: 0.0368ms, 0.0169ms, 0.0168ms, 0.0168ms
MKL: 0.0515ms, 0.0071ms, 0.0080ms, 0.0078ms

Average per call over 100 repetitions, 4 rounds, len=1'000'000:
Managed: 2.3922ms, 2.0536ms, 2.2833ms, 2.4194ms
MKL: 0.1852ms, 0.1513ms, 0.1555ms, 0.1680ms
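
For readers who want to reproduce this kind of measurement, here is a minimal sketch of a harness that reports the average time per call for several rounds. It is not the code that produced the numbers above. The names `BenchSketch`, `Dot` and `MeasureRounds` are made up for illustration, and the hand-written `Dot` is only a stand-in workload so that the sketch runs without any provider installed.

```csharp
using System;
using System.Diagnostics;

static class BenchSketch
{
    // Plain managed dot product, used here only as a stand-in workload.
    public static double Dot(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int i = 0; i < a.Length; i++)
        {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Runs `rounds` rounds of `repeats` calls each and returns the
    // average time per call in milliseconds, one entry per round.
    public static double[] MeasureRounds(Func<double> work, int rounds, int repeats)
    {
        var averagesMs = new double[rounds];
        double sink = 0.0; // accumulate results so the calls are less likely to be optimized away
        for (int r = 0; r < rounds; r++)
        {
            var w = Stopwatch.StartNew();
            for (int i = 0; i < repeats; i++)
            {
                sink += work();
            }
            w.Stop();
            averagesMs[r] = w.Elapsed.TotalMilliseconds / repeats;
        }
        GC.KeepAlive(sink);
        return averagesMs;
    }
}
```

Calling `BenchSketch.MeasureRounds(() => BenchSketch.Dot(a, b), 4, 100)` with two `double[]` arrays of length 10'000 mirrors the first setup above. Because the first round includes JIT and other one-time costs, it usually comes out slower than the later rounds.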

Thanks,
Christoph
Apr 9, 2014 at 4:19 AM
Edited Apr 9, 2014 at 4:22 AM
I've thought about this too, so I changed my code to:
Control.LinearAlgebraProvider = new MklLinearAlgebraProvider();
ContinuousUniform uniform = new ContinuousUniform(-0.5, 0.5);
DenseVector a = DenseVector.CreateRandom(100, uniform);
DenseVector b = DenseVector.CreateRandom(100, uniform);

Stopwatch watch = new Stopwatch();
watch.Start();

for (int i = 0; i < 100000; i++) {
    var c = a * b;
}

watch.Stop();
Console.WriteLine(watch.Elapsed);
Still, MKL is much slower according to the console output.
Is there anything else that needs to be done besides Control.LinearAlgebraProvider = new MklLinearAlgebraProvider()? E.g., configuring my CPU with some Intel tools?
Apr 9, 2014 at 9:33 AM
Edited Apr 9, 2014 at 9:36 AM
Note that you're benchmarking a routine that takes roughly 100ns (1s = 1'000'000'000ns). For operations this short, the P/Invoke and marshaling overhead of calling from managed code into native code will at some point dominate the timing entirely. If you only need short operations like this (as opposed to, say, large matrix multiplications or decompositions), you may indeed get better performance with the managed provider.
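
To make the overhead argument concrete, here is a deliberately simplified cost model: a native call costs a fixed overhead plus some time per element, while a managed call costs only time per element. This is an illustration, not a measurement. The class `OverheadModel`, its methods, and any values passed to it are hypothetical, and real timings depend on the machine, the provider and the operation.

```csharp
using System;

static class OverheadModel
{
    // Estimated time of one native call: fixed call overhead plus per-element work.
    public static double NativeNs(double overheadNs, double nativePerElementNs, int n)
        => overheadNs + nativePerElementNs * n;

    // Estimated time of one managed call: per-element work only.
    public static double ManagedNs(double managedPerElementNs, int n)
        => managedPerElementNs * n;

    // Vector length at which both estimates are equal.
    // Above it the native call is faster in this model, below it the managed one.
    public static double BreakEvenLength(
        double overheadNs, double nativePerElementNs, double managedPerElementNs)
    {
        if (managedPerElementNs <= nativePerElementNs)
        {
            // The native path never catches up in this model.
            return double.PositiveInfinity;
        }
        return overheadNs / (managedPerElementNs - nativePerElementNs);
    }
}
```

With made-up values of 100ns overhead, 1ns per element native and 2ns per element managed, the break-even length in this model is 100 elements. The point is only that the fixed overhead matters less as the vectors grow.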

Also, your code still includes the first call (which includes JIT, MKL init, etc.), which will dominate the average for both providers. A small modification that tests the same thing but measures the first call separately (as Init) typically shows results like this on my machine for MKL:
Intel MKL (x64; revision 4)
Create Data: 00:00:00.0127491s
Init: 2363400.000ns
A: 101.825ns
A: 102.201ns
A: 102.026ns
A: 101.821ns
A: 102.509ns
And for managed:
Managed
Create Data: 00:00:00.0128457s
Init: 1948500.000ns
A: 168.284ns
A: 167.912ns
A: 168.293ns
A: 168.173ns
A: 168.143ns
Code:
const int N = 100000;

Control.UseManaged();
//Control.UseNativeMKL();
Console.WriteLine(Control.LinearAlgebraProvider.ToString());

var w = Stopwatch.StartNew();
var uniform = new ContinuousUniform(-0.5, 0.5);
var a = Vector<double>.Build.Random(100, uniform);
var b = Vector<double>.Build.Random(100, uniform);
Console.WriteLine("Create Data: {0}s", w.Elapsed);

// we accumulate the results to make sure the compiler does not optimize it away.
w.Restart();
double x = a*b;
Console.WriteLine("Init: {0:0.000}ns", (w.Elapsed.TotalMilliseconds*1000*1000));

for (int k = 0; k < 5; k++)
{
    w.Restart();
    for (int i = 0; i < N; i++)
    {
        x += a*b;
    }
    Console.WriteLine("A: {0:0.000}ns", (w.Elapsed.TotalMilliseconds*1000*1000)/N);
}
Console.WriteLine(x);
Does this clarify things? Or do you see very different numbers in your setup?

Thanks,
Christoph
Marked as answer by cdrnet on 4/16/2014 at 1:46 AM