
Hi all,
I am relevantly new in c# and not a profi (however, I hope to become a such once..), so please do not punish for stupid questions now.
I made a singular value decomposition of sparse matrix A(mxn) as A=UsV.
"Suppose M is an mbyn matrix whose entries are real numbers.
Then there exists a factorization of the form M = UΣVT where:
 U is an mbym unitary matrix;
 Σ is mbyn diagonal matrix with nonnegative real numbers on the diagonal;
 VT denotes transpose of V, an nbyn unitary matrix; "
Question 1)
K  is a number of latent features. Let's say 30.
How can I make SVD to calculate give results as only mxK in U, KxK(upper left) in Σ and nxK in VT?
I other words I need to cut U and VT so that we do not use memory for keeping resulting mxm and nxn matrices (they can be very huge, 500Kx500K for example) and get upper left KxK in Σ.
Question 2)
If first is not possible (seems so from the definitions at least).
How can I cut cols and rows in the matrix U, s and V using Math.NET?
For example, If i have 100x100 matrix, i need to get only 100x30 of it.
Thank you all for the help!
Best regards, Alex.






Ok, I found that answer for question 2 is very simple:
var svd = matrix.Svd(true);
var matrix_UK = svd.U().SubMatrix(0, svd.W().RowCount, 0, K);
var matrix_SKK = svd.W().SubMatrix(0, K, 0, K);
var matrix_VTK = svd.VT().SubMatrix(0, K, 0, svd.VT().ColumnCount);
However, what about first one?
Can I get resulting matrices already in the format I need?
Thanks:)



Hi Alex,
I think your solution (#2) is the only one possible given the current SVD implementation. It would be possible to extend the SVD code to handle low rank approximation (it might be better to make it a separate class and reuse most of the current code). If you
think it would be useful, please submit a feature request on GitHub.
Regards,
Marcus


Apr 24, 2013 at 8:40 AM
Edited Apr 24, 2013 at 8:41 AM

Hi cuda,
Thank you for a quick answer!
It can be useful for recommender system algorithms.
Such algorithms deal with initialization of input matices (which are products of SVD) having huge data array (such as 500 000 x 500 000) of sparse data.
The resulting SVD matrices are 3*500Kx500K (Dense). As a result we cannot fit all these data volume in the memory and therefore we cannot use SVD from Math.Net.
Can you please say if I miss something in my undestanding of memory usage for resulting SVD matrices?
If I am correct  I'll write a feature request with explanations why it can be useful.
Have a nice day!
Best regards, Alex.



The way SVD is currently set up, during decomposition you will have your original matrix, a copy of the original matrix, a mxm matrix, a nxn matrix, and a min(n, m) vector in memory at one time. So in your case you'll have 4 500Kx500K matrices and on 500K
vector in memory during decomposition. After decomposition is finished, the copy will be GC'd leaving just 3 matrices (one of which is the original) and a vector. However, each time you access U(), VT(), or W() and new 500Kx500K matrix is created (we return
clones stored in the SVD object).
That really doesn't matter much since a 500Kx500K dense matrix would take up about 2TB of memory and way above the 2GB maximum object size. We'd have introduce a different storage scheme to handle dense matrices of that size (assuming the machine had 8TB of
memory).



cuda wrote:
That really doesn't matter much since a 500Kx500K dense matrix would take up about 2TB of memory and way above the 2GB maximum object size. We'd have introduce a different storage scheme to handle dense matrices of that size (assuming the machine
had 8TB of memory).
Well, if original data 500Kx500K and it is 99% sparse the memory consumption with proper data structure will be only 20Gb (1% of 2Tb).
And this is somewhat handable even if resulting matrices are dense and approximated by K already. (with small Ks of course).
However, I do not know if this is possible at all to make SVD with approximation "built in" without a need to keep 4 500Kx500K in the memory during actual decomposition. I mean that maybe decomposition math requires to have all matrix to be in memory
to calculate results.
Also good to mention that 500Kx500K is extreme case. In my case I need to handle 500 000 x 20 000 records. However, in "dense form" it consumes as much as ~55Gb of RAM that is unacceptable. And again, that's why I am looking for a way to get less
memory consumption during SVD by cutting "unneeded" columns and rows of the resulting matrices.
Best regards, Alex.



AlexKMN wrote:
However, I do not know if this is possible at all to make SVD with approximation "built in" without a need to keep 4 500Kx500K in the memory during actual decomposition. I mean that maybe decomposition math requires to have all matrix
to be in memory to calculate results.
The first thing we need to do is implement a sparse SVD routine. That would drastically reduce the amount memory needed.



Ok. I go to proposals on GitHub now:)
Thank you for you time!

