Profiling code that calls matrix multiplication

Nov 12, 2013 at 1:04 AM
I've been trying to profile my app with JetBrains dotTrace Performance. The problem is that when I profile it using thread cycle time, CacheObliviousMatrixMultiply takes forever. Simple log4net tracing shows that matrix multiplication is not the bottleneck for me.

I am using version 2.6.2. My largest multiplication is a 55x50 matrix against a 50x10000 matrix. My matrices are Double.DenseMatrix and Matrix<double>.

How would you recommend profiling an app that uses matrix multiply?
Nov 12, 2013 at 6:46 PM
What could work without many changes is to use the MKL native provider. The multiplication is then performed in native code, which the profiler probably treats as a black-box call. Using a native provider significantly changes the performance of the multiplication, but that may be OK if you've already shown it is not the bottleneck.
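A minimal sketch of the switch, assuming Math.NET Numerics 2.x with the MKL native provider package installed and its native binaries placed next to the executable. The provider class name below is how I recall the 2.x API, so check it against your installed version (in 3.x the equivalent should be Control.UseNativeMKL()). The zero-filled matrices and the Stopwatch timing are only placeholders to show where the call goes:

```csharp
using System;
using System.Diagnostics;
using MathNet.Numerics;
using MathNet.Numerics.LinearAlgebra.Double;

class Program
{
    static void Main()
    {
        // Assumption: 2.x-style provider selection. Set this once at startup,
        // before any matrix work, so later multiplications go to native code.
        Control.LinearAlgebraProvider =
            new MathNet.Numerics.Algorithms.LinearAlgebra.Mkl.MklLinearAlgebraProvider();

        // Same shapes as in the question; the zero-filled contents are placeholders.
        var a = new DenseMatrix(55, 50);
        var b = new DenseMatrix(50, 10000);

        // Time the multiplication directly, independent of the profiler.
        var watch = Stopwatch.StartNew();
        var c = a.Multiply(b);
        watch.Stop();

        Console.WriteLine("{0}x{1} result in {2} ms",
            c.RowCount, c.ColumnCount, watch.ElapsedMilliseconds);
    }
}
```

Because the provider is set globally, the rest of the application code should not need to change, which is what makes this a low-effort option for a profiling session.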
Marked as answer by zippy1981 on 11/12/2013 at 3:03 PM
Nov 12, 2013 at 11:03 PM
That sped up my code a lot and allowed me to find other bottlenecks with my profiler. I've gone from ~25 seconds per call to 3-6 seconds per call.