Can this be done faster?

May 29, 2014 at 7:25 PM
The majority of my computation time is spent on this:
            DeltaEncoder.Clear()
            For col = 0 To NumberOfObs - 1
                DeltaEncoder += NeuronError.Column(col).OuterProduct(InputWithBias.Column(col))
            Next col
            DeltaEncoder /= NumberOfObs
The NumberOfObs is 2000.
NeuronError is a 100x2000 DenseMatrix.
InputWithBias is a 101x2000 DenseMatrix.

I basically need to find the sum (or average), over all observations, of each observation vector in NeuronError outer-multiplied with its corresponding observation vector in InputWithBias.
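In matrix notation, this quantity can be sketched as follows. The symbols are labels introduced here, not names from the code: E stands for NeuronError, X for InputWithBias, e_k and x_k for their k-th columns, and N for NumberOfObs. The second equality is the standard identity that a sum of column-by-column outer products equals a single matrix product, so one matrix multiplication may be worth benchmarking against the loop.

```latex
\Delta = \frac{1}{N}\sum_{k=1}^{N} e_k x_k^{\top} = \frac{1}{N}\, E X^{\top},
\qquad E \in \mathbb{R}^{100 \times N},\quad X \in \mathbb{R}^{101 \times N},\quad \Delta \in \mathbb{R}^{100 \times 101}
```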

This is part of a backpropagation algorithm.

Perhaps it would be faster to first transpose each matrix and then do row-vector outer products?

It seems like this operation would have been named/optimized since it is involved in backprop.
Any suggestions?
May 30, 2014 at 8:14 PM
Edited May 30, 2014 at 8:18 PM
Yes, this could be a bit faster. I measured the quoted part (averaged over 100 repetitions) on my system with random data of the same size, compiled in release mode and run without a debugger attached (that is, outside Visual Studio).
  • As quoted: 0.367s
  • When I compile explicitly for the x64 platform target: 0.217s
  • With in-place addition: 0.202s
DeltaEncoder.Add(NeuronError.Column(col).OuterProduct(InputWithBias.Column(col)), DeltaEncoder)
  • With Intel MKL/x64 native provider: 0.135s
' Install MathNet.Numerics.MKL.Win-x64 NuGet package (beware: may require a license from Intel)
Control.UseNativeMKL
It also seems the OuterProduct operation was missed when all the vector-vector operations were refactored in the early v3 work, so it does not come with a proper in-place version. I've fixed this in master; the fix will be part of the next release after v3.0.0-beta02. With this change you can rewrite the loop as follows:
DeltaEncoder.Clear();
var work = Matrix<double>.Build.Dense(DeltaEncoder.RowCount, DeltaEncoder.ColumnCount);
for (int col = 0; col < NumberOfObs; col++)
{
    NeuronError.Column(col).OuterProduct(InputWithBias.Column(col), work);
    DeltaEncoder.Add(work, DeltaEncoder);
}
DeltaEncoder /= NumberOfObs;
  • Managed: 0.124s
  • With MKL: 0.059s
Still not very fast, but quite a bit better. Do you get similar numbers?

Thanks,
Christoph
May 30, 2014 at 10:11 PM

Thanks, Christoph. I'll run some comparisons.
Is it faster to select row vectors or column vectors from a dense matrix?
As of now I have the observation vectors arranged as column vectors, but they could be arranged as row vectors instead.
From others' examples of vectorized backprop, column vectors seem to be the convention, but I could easily reverse this.

May 30, 2014 at 10:34 PM
Edited May 30, 2014 at 10:34 PM
Our dense matrices are stored in column-major order, so working by column is faster (i.e. extracting a column vector is a single block copy).
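To illustrate why column access is the cheap direction, here is a minimal sketch of column-major storage modeled with a flat Python list. It is not Math.NET code: the helper names (flat_index, get_column, get_row) and the 2x3 example matrix are hypothetical and only show the indexing idea.

```python
# Illustrative model of column-major storage using a flat Python list.
# Not Math.NET code: all names here are hypothetical.

def flat_index(row, col, row_count):
    """Position of element (row, col) in a column-major flat array."""
    return col * row_count + row

def get_column(data, col, row_count):
    """A column occupies one contiguous run, so a single slice copies it."""
    start = col * row_count
    return data[start:start + row_count]

def get_row(data, row, row_count, col_count):
    """A row's elements are row_count apart, so each one is fetched separately."""
    return [data[flat_index(row, c, row_count)] for c in range(col_count)]

# The 2x3 matrix [[1, 2, 3], [4, 5, 6]] stored column by column.
rows, cols = 2, 3
data = [1, 4, 2, 5, 3, 6]

second_column = get_column(data, 1, rows)   # contiguous slice
first_row = get_row(data, 0, rows, cols)    # strided gather
```

In this model a column is one contiguous copy, while a row has to be gathered element by element with a stride of the row count.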
May 31, 2014 at 7:35 PM
Still working on setting up your suggestions.
I have to download Visual Studio 2013 so that I can create 64-bit apps.
May 31, 2014 at 8:38 PM
I don't think you need VS 2013 to compile for the x64 target platform; it is just a compiler flag or a setting in the project file (AnyCPU, x86 or x64).

What tool/platform are you using currently?
May 31, 2014 at 11:12 PM

I was using Visual Studio 2012.
At this point I almost have 2013 downloaded, oh well... might as well get the latest version while it is free (while I'm still in college).

Jun 1, 2014 at 11:28 PM
Edited Jun 1, 2014 at 11:34 PM
Ok here are my results:
    timer.Start()
    For x = 0 To 99
        DeltaEncoder.Clear()
        For col = 0 To NumberOfObs - 1
            DeltaEncoder += NeuronError.Column(col).OuterProduct(InputWithBias.Column(col))
        Next col
        DeltaEncoder /= NumberOfObs
    Next x
    timer.Stop()
101809 milliseconds (Debug mode)
63878 milliseconds (Release mode)
    timer.Start()
    For x = 0 To 99
        DeltaEncoder.Clear()
        For col = 0 To NumberOfObs - 1
            DeltaEncoder.Add(NeuronError.Column(col).OuterProduct(InputWithBias.Column(col)), DeltaEncoder)
        Next col
        DeltaEncoder /= NumberOfObs
    Next x
    timer.Stop()
78009 milliseconds (Debug mode)
37766 milliseconds (Release mode)

Thanks, it looks like I can about double my speed in release mode!

I didn't realize that the debug and release times would be so significantly different.
This is the first time I've used release mode.