
Cut cols and rows (or factor the matrix)

Apr 23, 2013 at 5:17 PM
Hi all,

I am relatively new to C# and not a pro (though I hope to become one someday), so please don't punish me for stupid questions.

I computed a singular value decomposition of a sparse matrix A (mxn) as A = UΣVT:
"Suppose M is an m-by-n matrix whose entries are real numbers.
 Then there exists a factorization of the form M = UΣVT where:
 - U is an m-by-m unitary matrix;
 - Σ is an m-by-n diagonal matrix with nonnegative real numbers on the diagonal;
 - VT denotes the transpose of V, an n-by-n unitary matrix; "
Question 1)
K is the number of latent features. Let's say 30.
How can I make the SVD calculate and return only the mxK part of U, the upper-left KxK block of Σ, and the Kxn part of VT?

In other words, I need to cut U and VT so that we do not use memory keeping the resulting mxm and nxn matrices (they can be very large, 500Kx500K for example), and get only the upper-left KxK block of Σ.

Question 2)
If the first is not possible (it seems so from the definitions, at least):
How can I cut columns and rows of the matrices U, Σ and V using Math.NET?

For example, if I have a 100x100 matrix, I need to get only a 100x30 part of it.

Thank you all for the help!

Best regards, Alex.
Apr 23, 2013 at 5:18 PM
Basically I do this:
https://inst.eecs.berkeley.edu/~ee127a/book/login/exa_low_rank_4by5.html

A rank-K approximation is given by zeroing out the smallest singular values.
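In Math.NET terms, I think that zeroing approach looks roughly like this (just a sketch on my side, assuming the U()/W()/VT() methods used later in this thread and singular values sorted in descending order; K = 30 is only my example value):

    var svd = matrix.Svd(true);           // full SVD: matrix = U * W * VT
    var w = svd.W().Clone();              // defensive copy of the singular value matrix
    const int K = 30;                     // number of singular values to keep
    for (var i = K; i < Math.Min(w.RowCount, w.ColumnCount); i++)
    {
        w[i, i] = 0.0;                    // zero out the smallest singular values
    }
    var rankK = svd.U() * w * svd.VT();   // rank-K approximation of the original matrix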
Apr 23, 2013 at 6:07 PM
OK, I found that the answer to question 2 is very simple:

    var svd = matrix.Svd(true);                                           // full SVD: matrix = U * W * VT
    var matrix_UK  = svd.U().SubMatrix(0, svd.W().RowCount, 0, K);        // m x K: first K columns of U
    var matrix_SKK = svd.W().SubMatrix(0, K, 0, K);                       // K x K: upper-left block of W
    var matrix_VTK = svd.VT().SubMatrix(0, K, 0, svd.VT().ColumnCount);   // K x n: first K rows of VT
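And if I need the rank-K approximation itself, multiplying the truncated pieces back together seems to give it (same assumptions as above):

    var approxK = matrix_UK * matrix_SKK * matrix_VTK;   // (m x K) * (K x K) * (K x n) = m x n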
However, what about the first one?
Can I get the resulting matrices already in the format I need?

Thanks:)
Apr 24, 2013 at 6:58 AM
Hi Alex,

I think your solution (#2) is the only one possible given the current SVD implementation. It would be possible to extend the SVD code to handle low rank approximation (it might be better to make it a separate class and reuse most of the current code). If you think it would be useful, please submit a feature request on GitHub.

Regards,
Marcus
Apr 24, 2013 at 8:40 AM
Edited Apr 24, 2013 at 8:41 AM
Hi cuda,

Thank you for the quick answer!
It would be useful for recommender system algorithms.
Such algorithms initialize their model matrices (which are the products of an SVD) from a huge array of sparse data (such as 500 000 x 500 000).
The resulting SVD matrices are three dense matrices of roughly 500Kx500K each. As a result, we cannot fit that volume of data in memory, and therefore we cannot use the SVD from Math.NET.

Can you please tell me if I am missing something in my understanding of the memory usage of the resulting SVD matrices?

If I am correct, I'll write a feature request with an explanation of why it could be useful.

Have a nice day!

Best regards, Alex.
Apr 24, 2013 at 10:52 AM
The way the SVD is currently set up, during decomposition you will have your original matrix, a copy of the original matrix, an mxm matrix, an nxn matrix, and a min(n, m) vector in memory at one time. So in your case you'll have four 500Kx500K matrices and one 500K vector in memory during decomposition. After decomposition is finished, the copy will be GC'd, leaving just three matrices (one of which is the original) and a vector. However, each time you access U(), VT(), or W(), a new 500Kx500K matrix is created (we return clones of the matrices stored in the SVD object).

That doesn't really matter much, since a 500Kx500K dense matrix would take up about 2 TB of memory, way above the 2 GB maximum object size. We'd have to introduce a different storage scheme to handle dense matrices of that size (assuming the machine had 8 TB of memory).
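Just to make the arithmetic explicit (a rough sketch, ignoring object headers and any storage overhead):

    long rows = 500000, cols = 500000;
    long bytes = rows * cols * sizeof(double);          // 2,000,000,000,000 bytes
    Console.WriteLine("{0:N1} TB", bytes / 1e12);       // ~2.0 TB for one dense matrix
    Console.WriteLine(bytes <= int.MaxValue);           // False: far above the ~2 GB object limit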
Apr 24, 2013 at 11:49 AM
cuda wrote:
That doesn't really matter much, since a 500Kx500K dense matrix would take up about 2 TB of memory, way above the 2 GB maximum object size. We'd have to introduce a different storage scheme to handle dense matrices of that size (assuming the machine had 8 TB of memory).
Well, if the original data is 500Kx500K and it is 99% sparse, the memory consumption with a proper data structure will be only about 20 GB (1% of 2 TB).
And that is somewhat manageable, even if the resulting matrices are dense, provided they are already truncated to rank K (with small K, of course).

However, I do not know whether it is possible at all to make an SVD with the approximation "built in", without a need to keep four 500Kx500K matrices in memory during the actual decomposition. I mean that maybe the decomposition math requires the whole matrix to be in memory to calculate the results.

It is also worth mentioning that 500Kx500K is an extreme case. In my case I need to handle 500 000 x 20 000 records. However, in "dense form" that consumes as much as ~55 GB of RAM, which is unacceptable. And again, that's why I am looking for a way to reduce memory consumption during the SVD by cutting the "un-needed" columns and rows of the resulting matrices.

Best regards, Alex.
Apr 24, 2013 at 12:04 PM
AlexKMN wrote:
However, I do not know whether it is possible at all to make an SVD with the approximation "built in", without a need to keep four 500Kx500K matrices in memory during the actual decomposition. I mean that maybe the decomposition math requires the whole matrix to be in memory to calculate the results.
The first thing we would need to do is implement a sparse SVD routine. That would drastically reduce the amount of memory needed.
Apr 24, 2013 at 12:06 PM
OK, I'm off to the proposals on GitHub now :)
Thank you for your time!