This project has moved. For the latest updates, please go here.

How to load data into matrix

Nov 27, 2012 at 6:57 AM
Edited Nov 27, 2012 at 7:00 AM

New to matrices ... coming from relational database background ...

I've got a database table that has: rowkey, columnkey, cellvalue which is read in from SQL to a reader

and the keys are in a hashtable which allows lookup based on key to get row and col IDs. (same set of keys can be in either row or column)

Problem is that it's taking about 4 hours to load ~2M datapoints into a 10500x10500 SparseMatrix A in memory for calculation by looping over the following C# stmt:

                 A [ (int)idHash [reader[0].ToString()], (int)idHash [reader[1].ToString()] ] = reader.GetDouble(2);

I was looking for a "bulk load" function and see SetColumn and SetRow but just not sure, since it's sparse data, whether it's worth it to prepare by columns or ? ... 

Any ideas on how to speed this process up?

Thanks!

Coordinator
Nov 27, 2012 at 12:40 PM
Edited Nov 27, 2012 at 12:44 PM

Hi,

Uh. Sparse matrices are much slower than dense ones by design, but 4h is unacceptable. Some ideas:

A) If possible, always initialize a sparse matrices row by row, in ascending order.

Quick Example (in F#, but would be equivalent in C#), initializing a matrix (110M fields) with 2M values:

let good = new SparseMatrix(10500,10500)
for row in 0..999 do
    for col in 0..999 do
        good.At(10*row, 10*col, float32 (Math.Atan2(float row, float col)))
        good.At(2+10*row, 3+10*col, float32 (row+col))

This needs roughly 1 minute on my machine. Still slow, but much better than your 4h.

let bad = new DenseMatrix(10500,10500)
for col in 0..999 do
    for row in 0..999 do
        bad.At(10*row, 10*col, float32 (Math.Atan2(float row, float col)))
        bad.At(2+10*row, 3+10*col, float32 (row+col))

This second version is not row by row, and needs almost 20 times the time of the first example (lot of copying involved).

Hence, if you can, sort your fields by rows and then cols, ascending, before insertion. Thinking about it, we could provide some sort of sparse matrix initialization helper to support such scenarios. They could avoid almost if not all copying if the list of non-zero cells is available in memory at the beginning, speeding it up significantly.

B) Minor: Use At(r,c,v) instead of the [r,c] indexer if you don't need range checking (slightly faster)

Although in this case it probably doesn't make much of a difference

C) Directly populate the underlying storage

Create an instance of the SparseCompressedRowMatrixStorage class and populate the RowPointer/ColumnIndex/Values arrays directly. Then create the matrix using new SparseMatrix(storage). This is clearly the most performant approach (as long as we do not provide an initializer as mentioned above), but requires knowledge of the chosen storage layout.

D) Consider dense matrices

If I change the above example to use a dense matrix instead (after all such a 100M cell matrix still fits into memory easily):

let dense = new DenseMatrix(10500,10500)
for row in 0..999 do
    for col in 0..999 do
        dense.At(10*row, 10*col, float32 (Math.Atan2(float row, float col)))
        dense.At(2+10*row, 3+10*col, float32 (row+col))

then the example runs in less than a second. However, any operation on that matrix would obviously then be very inefficient, but it depends what you actually want to do with it.

 

Thinking about it, we should really consider to provide some efficient sparse matrix initializer (as mentioned in A), that avoids any expensive copying at the cost of a two-pass process. In your case that may work nicely, since it seems you already know all possible non-zero value indices at the beginning (in your hash table). Let me know what you think.

Thanks,
Christoph 

Coordinator
Nov 27, 2012 at 1:02 PM
Edited Nov 27, 2012 at 1:03 PM

Quick note: we currently have a pull request open for an alternative sparse storage scheme (based on Knuth) that might be much more efficient in this scenario, since it seems to avoid almost all copying in this case. Might be worth to give it a trial.

Thanks,
Christoph 

Nov 27, 2012 at 3:20 PM
Thanks a bunch, Christoph.

We'll look at this and let you know what we come up with.

Yvonne


On Tue, Nov 27, 2012 at 6:02 AM, cdrnet <notifications@codeplex.com> wrote:

From: cdrnet

Quick note: we currently have a [url:pull request|https://github.com/mathnet/mathnet-numerics/pull/58] open for an alternative sparse storage scheme (based on Knuth) that might be much more efficient in this scenario, since it seems to avoid almost all copying in this case. Might be worth to give it a trial.

Thanks,
Christoph

Read the full discussion online.

To add a post to this discussion, reply to this email (mathnetnumerics@discussions.codeplex.com)

To start a new discussion for this project, email mathnetnumerics@discussions.codeplex.com

You are receiving this email because you subscribed to this discussion on CodePlex. You can unsubscribe on CodePlex.com.

Please note: Images and attachments will be removed from emails. Any posts to this discussion will also be available online at CodePlex.com




--
--
Yvonne Burgess
Chief Systems Architect
Climate Earth, Inc.
415.391.2725 (office)
415.816.2674 (mobile)
www.climateearth.com

Nov 27, 2012 at 3:38 PM
Christoph,

re: your comment:

"D) Consider dense matrices

If I change the above example to use a dense matrix instead (after all such a 100M cell matrix still fits into memory easily):

let dense = new DenseMatrix(10500,10500)
for row in 0..999 do
    for col in 0..999 do
        dense.At(10*row, 10*col, float32 (Math.Atan2(float row, float col)))
        dense.At(2+10*row, 3+10*col, float32 (row+col))

then the example runs in less than a second. However, any operation on that matrix would obviously then be very inefficient, but it depends what you actually want to do with it."


Our next operation on this bad boy is:

D = (SparseMatrix)(I - A).Inverse();

where I is the identity matrix also (10500, 10500). If you have suggestions on improving this code, we welcome your comment.


When we went from working with 800x800 matrices to the dim 10500, we hit out of memory problems just initializing the I matrix. So we pulled in the latest Math.Net libraries and switched most of the matrices to Sparse except for those over 50% non-zero ... but maybe we should try setting them back to Matrix or as DenseMatrix.

When you say "(after all such a 100M cell matrix still fits into memory easily)" - do you have guideline for selecting use of sparse vs. dense? I (simpleton) just computed number of non-zero datapoints as a percentage to determine whether the thing was sparse or not. Do you have a smarter guideline we should be using?

Thanks for all your help!

Yvonne




On Tue, Nov 27, 2012 at 8:20 AM, Yvonne Burgess <yvonne@climateearth.com> wrote:
Thanks a bunch, Christoph.

We'll look at this and let you know what we come up with.

Yvonne


On Tue, Nov 27, 2012 at 6:02 AM, cdrnet <notifications@codeplex.com> wrote:

From: cdrnet

Quick note: we currently have a [url:pull request|https://github.com/mathnet/mathnet-numerics/pull/58] open for an alternative sparse storage scheme (based on Knuth) that might be much more efficient in this scenario, since it seems to avoid almost all copying in this case. Might be worth to give it a trial.

Thanks,
Christoph

Read the full discussion online.

To add a post to this discussion, reply to this email (mathnetnumerics@discussions.codeplex.com)

To start a new discussion for this project, email mathnetnumerics@discussions.codeplex.com

You are receiving this email because you subscribed to this discussion on CodePlex. You can unsubscribe on CodePlex.com.

Please note: Images and attachments will be removed from emails. Any posts to this discussion will also be available online at CodePlex.com




--
--
Yvonne Burgess
Chief Systems Architect
Climate Earth, Inc.
415.391.2725 (office)
415.816.2674 (mobile)
www.climateearth.com




--
--
Yvonne Burgess
Chief Systems Architect
Climate Earth, Inc.
415.391.2725 (office)
415.816.2674 (mobile)
www.climateearth.com

Dec 1, 2012 at 2:08 PM
Edited Dec 1, 2012 at 3:53 PM

Regarding sparse matrix loading performance: I created an adapter for that a while ago and have reworked it now so it should be usable for others. Get it from here.

I used matrix data from http://math.nist.gov/MatrixMarket/data/SPARSKIT/fidap/fidap.html for testing. A few results:

Matrix 'fidap008.mtx' loaded in 172ms
Size: 3096x3096, non-zeros: 106302
Benchmark started ...
Using SparseMatrix: 1173ms
Using StorageAdapter: 19ms
Matrices equal: True

Matrix 'fidap019.mtx' loaded in 427ms
Size: 12005x12005, non-zeros: 259863
Benchmark started ...
Using SparseMatrix: 11242ms
Using StorageAdapter: 42ms
Matrices equal: True

@Christoph: Looking at your sparse storage implementation, I found that

  1. if you got a m-by-n matrix, then usually storage.RowPointers is size m+1, where storage.RowPointers[m] = number of actual entries (non-zeros). You are using a size m array and storing the number of non-zeros explicitly, which makes some of the algorithms I use more complicated;
  2. it would be cool to have a public generic constructor like SparseCompressedRowMatrixStorage<T>(int m, int n).

Regards,

Chris

EDIT: Though I haven't found any problems with the code so far, keep in mind that it's not tested thoroughly!!!

Dec 1, 2012 at 3:50 PM
Edited Dec 1, 2012 at 3:55 PM

@Yvonne

You really don't want to do Inverse() on a sparse matrix, since the result will most likely not be sparse. Use iterative solvers instead.

If you are looking for sparse direct solvers: a few days ago I published a .NET version of CSparse here on CodePlex: CSparse.NET

Dec 1, 2012 at 9:41 PM
Thanks much for your comments Chris. Yeah, we guessed the problem is likely, as you suggest, in trying to get the Inverse - that sucker was running over 24 hrs and never finished so we pretty much gave up ...

When you advise to "use iterative solvers instead" can you help me understand what those are / do / how to "use" -- that is "invoke" -- for my particular application/calc I'm trying to accomplish?

Unless I missed it completely, the Math.net documentation is really sketchy; i guess there's an assumption that anyone who would wander in knows these basics.

And I have to admit I'm a bit out of my element here tho was a pgmr in past life (way past) and a tiny math major at one time (even more way past). So have sent your comments on to my science guy (who knows the matrix math) and my sw eng (who knows C#) ... both better than I.

Y



On Sat, Dec 1, 2012 at 8:50 AM, wo80 <notifications@codeplex.com> wrote:

From: wo80

@Yvonne

You really don't want to do Inverse() on a sparse matrix, since the result will most likely not be sparse. Use iterative solvers instead.

If you are looking for sparse direct solvers: a few days ago I published a .NET version of CSparse here on CodeProject: CSparse.NET

Read the full discussion online.

To add a post to this discussion, reply to this email (mathnetnumerics@discussions.codeplex.com)

To start a new discussion for this project, email mathnetnumerics@discussions.codeplex.com

You are receiving this email because you subscribed to this discussion on CodePlex. You can unsubscribe on CodePlex.com.

Please note: Images and attachments will be removed from emails. Any posts to this discussion will also be available online at CodePlex.com




--
--
Yvonne Burgess
Chief Systems Architect
Climate Earth, Inc.
415.391.2725 (office)
415.816.2674 (mobile)
www.climateearth.com

Dec 2, 2012 at 9:15 AM
Edited Dec 2, 2012 at 9:20 AM

When solving a linear system Ax = b you can either choose a direct solver (like LU factorization) which will compute the exact solution, or use an iterative solver, which will compute an approximate solution. Read more about the possible choices at http://scicomp.stackexchange.com/a/378

Here's some code which should explain how to use iterative solvers with Math.NET:

using System.Collections.Generic;
using MathNet.Numerics.LinearAlgebra.Double;
using MathNet.Numerics.LinearAlgebra.Double.Solvers;
using MathNet.Numerics.LinearAlgebra.Double.Solvers.Iterative;
using MathNet.Numerics.LinearAlgebra.Double.Solvers.StopCriterium;
using MathNet.Numerics.LinearAlgebra.Generic.Solvers.StopCriterium;

/// <summary>
/// Example code for using iterative solvers.
/// </summary>
public class Example
{
    public static Vector Solve()
    {
        // Parallel is actually slower, so disable!
        MathNet.Numerics.Control.DisableParallelization = true;

        // Choose your solver: BiCgStab, TFQMR, GpBiCg or MlkBiCgStab.

        // BiCgStab is usually a good choice
        BiCgStab solver = new BiCgStab();

        // Choose stop criterias
        var stopCriterias = new List<IIterationStopCriterium>()
        {
            new ResidualStopCriterium(1e-5),
            new IterationCountStopCriterium(1000),
            new DivergenceStopCriterium(),
            new FailureStopCriterium()
        };

        // Set iterator
        solver.SetIterator(new Iterator(stopCriterias));

        // If necessary, choose a preconditioner (IncompleteLU
        // and Ilutp are slow, so try Diagonal first).

        // solver.SetPreconditioner(new Diagonal());

        var A = LoadMatrix();
        var b = LoadRhs();

        var solution = solver.Solve(A, b);

        bool success = false;

        foreach (var item in stopCriterias)
        {
            if (item.StopLevel == StopLevel.Convergence)
            {
                success = true;
                break;
            }
        }

        if (!success)
        {
            // Try another solver, a different preconditioner or
            // less restrictive stop criterias.
        }

        var residual = A * solution - b;

        // Check residual
        // double error = residual.Norm(2);

        return (Vector)solution;
    }

    private static DenseVector LoadRhs()
    {
        var b = new DenseVector(4);

        b[0] = 1;
        b[1] = 1;
        b[2] = 1;
        b[3] = 1;

        return b;
    }

    private static SparseMatrix LoadMatrix()
    {
        var matrix = new SparseMatrix(4, 4);

        matrix.At(0, 0, 4.5);
        matrix.At(1, 1, 2.9);
        matrix.At(2, 2, 3.0);
        matrix.At(3, 3, 1.0);
        matrix.At(0, 2, 3.2);
        matrix.At(1, 0, 3.1);
        matrix.At(1, 3, 0.9);
        matrix.At(2, 1, 1.7);
        matrix.At(3, 0, 3.5);
        matrix.At(3, 1, 0.4);

        return matrix;
    }
}
Dec 2, 2012 at 9:32 AM
Thank you again, Chris.

These pointers are all helpful and I'll share them with the team. Clearly a lot to learn and a few things to investigate and try.

I was especially heartened by Matt's comment "sparse direct methods usually exhaust memory before they exhaust patience.Matt Knepley" -- I'm not crazy after all!

Best,
Yvonne


On Sun, Dec 2, 2012 at 2:15 AM, wo80 <notifications@codeplex.com> wrote:

From: wo80

When solving a linear system Ax = b you can either chooser a direct solver (like LU factorization) which will compute the exact solution, or use an iterative solver, which will compute an approximate solution. Read more about the possible choices at http://scicomp.stackexchange.com/a/378

Here's some code which should explain how to use iterative solvers with Math.NET:

using System.Collections.Generic;
using MathNet.Numerics.LinearAlgebra.Double;
using MathNet.Numerics.LinearAlgebra.Double.Solvers;
using MathNet.Numerics.LinearAlgebra.Double.Solvers.Iterative;
using MathNet.Numerics.LinearAlgebra.Double.Solvers.StopCriterium;
using MathNet.Numerics.LinearAlgebra.Generic.Solvers.StopCriterium;

/// <summary>
/// Example code for using iterative solvers.
/// </summary>
public class Example
{
    public static Vector Solve()
    {
        // Parallel is actually slower, so disable!
        MathNet.Numerics.Control.DisableParallelization = true;

        // Choose your solver: BiCgStab, TFQMR, GpBiCg or MlkBiCgStab.

        // BiCgStab is usually a good choice
        BiCgStab solver = new BiCgStab();

        // Choose stop criterias
        var stopCriterias = new List<IIterationStopCriterium>()
        {
            new ResidualStopCriterium(1e-5),
            new IterationCountStopCriterium(1000),
            new DivergenceStopCriterium(),
            new FailureStopCriterium()
        };

        // Set iterator
        solver.SetIterator(new Iterator(stopCriterias));

        // If necessary, choose a preconditioner (IncompleteLU
        // and Ilutp are slow, so try Diagonal first).

        // solver.SetPreconditioner(new Diagonal());

        var A = LoadMatrix();
        var b = LoadRhs();

        var solution = solver.Solve(A, b);

        bool success = false;

        foreach (var item in stopCriterias)
        {
            if (item.StopLevel == StopLevel.Convergence)
            {
                success = true;
                break;
            }
        }

        if (!success)
        {
            // Try another solver, a different preconditioner or
            // less restrictive stop criterias.
        }

        var residual = A * solution - b;

        // Check residual
        // double error = residual.Norm(2);

        return (Vector)solution;
    }

    private static DenseVector LoadRhs()
    {
        var b = new DenseVector(4);

        b[0] = 1;
        b[1] = 1;
        b[2] = 1;
        b[3] = 1;

        return b;
    }

    private static SparseMatrix LoadMatrix()
    {
        var matrix = new SparseMatrix(4, 4);

        matrix.At(0, 0, 4.5);
        matrix.At(1, 1, 2.9);
        matrix.At(2, 2, 3.0);
        matrix.At(3, 3, 1.0);
        matrix.At(0, 2, 3.2);
        matrix.At(1, 0, 3.1);
        matrix.At(1, 3, 0.9);
        matrix.At(2, 1, 1.7);
        matrix.At(3, 0, 3.5);
        matrix.At(3, 1, 0.4);

        return matrix;
    }
}

Read the full discussion online.

To add a post to this discussion, reply to this email (mathnetnumerics@discussions.codeplex.com)

To start a new discussion for this project, email mathnetnumerics@discussions.codeplex.com

You are receiving this email because you subscribed to this discussion on CodePlex. You can unsubscribe on CodePlex.com.

Please note: Images and attachments will be removed from emails. Any posts to this discussion will also be available online at CodePlex.com




--
--
Yvonne Burgess
Chief Systems Architect
Climate Earth, Inc.
415.391.2725 (office)
415.816.2674 (mobile)
www.climateearth.com

Coordinator
Dec 3, 2012 at 6:25 PM

Hi Chris,

Thanks a lot for the help! Would it be ok for you if we include code derived from your storage adapter into the core library, and your sample code posted above into our documentation and/or code examples?

Thanks,
Christoph

Dec 4, 2012 at 8:17 AM

Sure, just go ahead...