Performance - arrays vs library

Aug 7 at 10:36 PM
To learn F# and Math.Net I've written 2 versions of the Jensen Shannon distance:
let m1 = DenseMatrix.init 2000 20 (fun i j -> float (i+j)) //values not normalized
let js_distance1 (mat : Matrix<float>) = 
    let kld a b = Vector.map2 (fun p q -> if p = 0.0 || q = 0.0 then 0.0 else log(p / q) * p) a b |> Vector.sum 
    let rowsA = mat.RowCount
    let res = DenseMatrix.create rowsA rowsA 0.0
    for i = 0 to rowsA - 2 do
        for j = i + 1 to rowsA - 1 do
            let av = (mat.[i,*] + mat.[j,*]) / 2.0
            res.[j, i] <- sqrt(0.5 * (kld mat.[i,*] av + kld mat.[j,*] av))
    res
let m2 = Array2D.init 2000 20 (fun i j -> float (i+j)) //values not normalized
let js_distance2 mat = 
    let kld a b = Array.fold2 (fun acc p q -> if p = 0.0 || q = 0.0 then acc else acc + log(p / q) * p) a b 
    let rowsA = Array2D.length1 mat
    let res = Array2D.create rowsA rowsA 0.0
    for i = 0 to rowsA - 2 do
        for j = i + 1 to rowsA - 1 do
            let av = Array.map2 (fun x y -> (x + y) / 2.0) mat.[i,*] mat.[j,*]
            res.[j, i] <- sqrt(0.5 * (kld 0.0 mat.[i,*] av + kld 0.0 mat.[j,*] av))
    res
The array version is approx. 1.8 times faster on my laptop.

Is this kind of performance penalty to be expected when working with a numerical library like Math.NET?

Am I missing any obvious speed-up options? I read in some discussions that the access to matrix or vector values can be a bit slow, but the alternate methods I tried like mat.Rows(i) didn't make any difference.

Thanks!
Coordinator
Aug 8 at 3:58 PM
Edited Aug 8 at 4:02 PM
In general Math.NET cannot really be much faster than 1D-arrays in such algorithms, but it may be faster than 2D-arrays as used here (which are less optimized in .Net). I ran a few benchmarks (without profiling), some of my findings:

A) Both variants can be made significantly faster (the same way) by reusing the extracted row vectors mat.[i,*] in the other loop and mat.[j,*] in the inner loop.

B) I've added locally (not in mainline yet) a Fold2 method to the vectors, bringing us very close to the array version (~4% slower on my machine). However, the way our F# extensions convert the F# functions to .Net Func delegates seems not to optimize away properly, leaving us with some overhead (~8% slower to use Vector.fold2 instead of Vector's new Fold2 method directly). Hopefully this can be improved somehow, by converting or rewriting instead of wrapping the lambda.

C) Very small improvement by writing let av = mati.Map2((fun x y -> (x + y) / 2.0), matj) instead of let av = (mati + matj) / 2.0. Using the method instead of the F# extension for the same reason as in B).

D) Very small improvement by transposing the matrix in the beginning and extracting column vectors instead of row vectors (dense matrices are column-major).

E) The definition of kld is a bit confusing in js_distance2 as a is actually the state, b is the first vector and it returns a function from the second vector to the scalar result. I assume it should actually be let kld a b = Array.fold2 (fun acc p q -> if p = 0.0 || q = 0.0 then acc else acc + log(p / q) * p) 0.0 a b, and the first argument 0.0 dropped when it is called in the second last name?
Aug 9 at 9:57 PM
Thank you very much for your analysis! Was really helpful for my understanding of F# and Math.Net.

I can confirm that A) speeds up the calculation (by about 15% on my laptop), and if a Fold2 method for vectors in the next release will almost close the gap between Math.Net and a pure array variant, then I can't wait to try it out :-)

You are right on E).