Multiple Regression

Sep 8, 2011 at 6:23 AM

I am looking for a library to perform multiple linear regression with 4 independent variables.

Can anyone suggest an appropriate library?

Thank you.

Sep 8, 2011 at 4:41 PM

Hi,

Use the Linear Algebra library. Here's an adaptation of some VB code that I wrote:

Imports MathNet.Numerics.LinearAlgebra.Double
Imports MathNet.Numerics.Distributions

Public Class MyRegression

    Public Shared Function QuickRegression(ByVal Yarray() As Double, ByVal XArray(,) As Double) As Double(,)

        'Bring in the data and put it into matrices
        Dim XMatrix As Matrix = New DenseMatrix(XArray)
        Dim YMatrix As Matrix = New DenseVector(Yarray).ToColumnMatrix
        Dim YFitted As Matrix

        'Index of the last column of X, i.e. the number of coefficients minus one
        Dim NumVars As Integer = XArray.GetUpperBound(1)

        Dim i As Integer
        Dim KeptXR As Matrix
        Dim OutputMatrix As Matrix

        'Needed for calculating T-Stats and P-Values
        Dim SumSqErrors As Double
        Dim DOF As Integer
        Dim StdD As Double
        Dim XXMatrix(,) As Double
        Dim TStats(NumVars) As Double
        Dim PVals(NumVars) As Double

        'What the function will return
        Dim OutputArray(2, NumVars) As Double

        'Using QR Factorization to solve the problem
        Dim MyQR As Factorization.QR = XMatrix.QR()
        OutputMatrix = MyQR.Solve(YMatrix)

        'Need to get (X'X)^-1 to calculate T-Stats - equivalent to (R'R)^-1 from the QR factorization of X
        KeptXR = MyQR.R
        XXMatrix = KeptXR.TransposeThisAndMultiply(KeptXR).Inverse().ToArray

        'Calculate the fitted values
        YFitted = XMatrix * OutputMatrix

        'Calculate the Sum of the Squares of the Errors
        SumSqErrors = VectSumSq(YFitted.Column(0), YMatrix.Column(0))

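        'Degrees of freedom: observations minus the number of estimated coefficients (NumVars + 1)
        DOF = YFitted.RowCount - (NumVars + 1)

        'Student's t-distribution, used to turn the T-Stats into P-Values
        Dim MyStudentsT As StudentT = New StudentT(0, 1, DOF)

        'Standard error of the regression (residual standard deviation)
        StdD = Math.Sqrt(SumSqErrors / DOF)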

        'Calculate the T-Stats and then the P-Values of the regression
        For i = 0 To NumVars

            TStats(i) = OutputMatrix(i, 0) / (StdD * (XXMatrix(i, i)) ^ 0.5)

            PVals(i) = 1 - MyStudentsT.CumulativeDistribution(Math.Abs(TStats(i))) + _
                MyStudentsT.CumulativeDistribution(-Math.Abs(TStats(i)))

        Next

        'Put the whole lot into a single array for the function to return
        For i = 0 To NumVars

            OutputArray(0, i) = OutputMatrix(i, 0)
            OutputArray(1, i) = TStats(i)
            OutputArray(2, i) = PVals(i)

        Next

        Return OutputArray

    End Function

    Friend Shared Function VectSumSq(ByVal Vector1 As Vector, ByVal Vector2 As Vector) As Double
        'Calculates the Sum of the Squares of the difference between two vectors
        Dim TempVector As Vector

        TempVector = Vector1 - Vector2

        Return TempVector.PointwiseMultiply(TempVector).Sum

    End Function

End Class
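
For reference, the quantities the function computes correspond to the standard ordinary least squares formulas, written here for n observations and k columns in X. The symbols are notation introduced for this summary, not names from the code:

```latex
\hat{\beta} = (X^{\top}X)^{-1}X^{\top}y, \qquad
s^{2} = \frac{\lVert y - X\hat{\beta} \rVert^{2}}{n - k}, \qquad
t_i = \frac{\hat{\beta}_i}{s\,\sqrt{\left[(X^{\top}X)^{-1}\right]_{ii}}}, \qquad
p_i = 2\left(1 - F_{n-k}(\lvert t_i \rvert)\right)
```

Here F_{n-k} is the cumulative distribution function of Student's t with n - k degrees of freedom, and with X = QR the matrix X'X equals R'R, which is why the code can work from the R factor alone.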

Hope this is useful,

Andrew 

 

Sep 9, 2011 at 7:55 AM

Thank you for your response.

I was trying to use the code directly, but it seems some of the functions are obsolete in the new version of the library.

Hence I could only use it up to calculating the output matrix. But these values do not match the multiple regression output of standard packages (Excel etc.).

Can you share your experience of using this code?

Thank you once again,

Sampath

Sep 9, 2011 at 10:23 AM

Hi Sampath,

I'm not sure about the obsolescence of the functions - I thought I was using a reasonably up-to-date version. Which functions does it not accept?

I did a quick test of my code against Excel and found that the coefficients did agree. You have to remember that in Excel the LINEST function

=LINEST(KnownYs, KnownXs)

will give the coefficients in the reverse order, so maybe this is where the problem arises.
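
To line the two outputs up, the reversal could be sketched as below. This assumes the X array passed to QuickRegression has the constant column first, so that the coefficients come back as intercept, b1, ..., bn, whereas LINEST lists the last X variable first and the intercept last. The module and function names (LinestOrder, ToLinestOrder) and the sample numbers are made up for illustration.

```vb
Imports System

Module LinestOrder

    'Hypothetical helper: returns a copy of the coefficients in reverse order,
    'leaving the input array untouched.
    Public Function ToLinestOrder(ByVal Coeffs() As Double) As Double()
        Dim Reversed() As Double = CType(Coeffs.Clone(), Double())
        Array.Reverse(Reversed)
        Return Reversed
    End Function

    Sub Main()
        'Made-up coefficients in the order intercept, b1, b2
        Dim Coeffs() As Double = {1.5, 2.0, -0.5}

        'Print them in the order b2, b1, intercept
        For Each Value As Double In ToLinestOrder(Coeffs)
            Console.WriteLine(Value)
        Next
    End Sub

End Module
```

The first row of OutputArray from QuickRegression would be copied into a one-dimensional array before being passed to this helper.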

Andrew

Sep 9, 2011 at 1:42 PM

Hi Andrew,

There are many such functions. I checked your code and highlighted them in red; the code is otherwise exactly the listing in your post above.

Then coming to Excel, I am looking at Regression, which is part of Data Analysis (Data menu > Data Analysis > Regression in MS Office 2007).

But it's nice to interact with you.

Thanks again.

Sampath.



Sep 9, 2011 at 3:13 PM

I think the regression in Excel just runs the LINEST function to get its results, but again, the coefficients are backwards. I'll have a look at it over the next few days.

In terms of the obsolete functions, I think we'll just have to appeal to one of the moderators to clarify that one...
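
One thing that may be worth checking when comparing against Excel: both LINEST and the Regression tool fit a constant term by default, while QuickRegression only fits the columns it is given. If the X array holds just the independent variables, a column of ones has to be added first. A minimal sketch of that step, with a hypothetical helper name (AddConstantColumn) and made-up data:

```vb
Imports System

Module InterceptColumn

    'Hypothetical helper: returns a copy of X with a column of ones in front,
    'so that the first fitted coefficient is the intercept.
    Public Function AddConstantColumn(ByVal XArray(,) As Double) As Double(,)
        Dim LastRow As Integer = XArray.GetUpperBound(0)
        Dim LastCol As Integer = XArray.GetUpperBound(1)
        Dim WithConst(LastRow, LastCol + 1) As Double

        For r As Integer = 0 To LastRow
            WithConst(r, 0) = 1.0
            For c As Integer = 0 To LastCol
                WithConst(r, c + 1) = XArray(r, c)
            Next
        Next

        Return WithConst
    End Function

    Sub Main()
        'Made-up data: 3 observations of 2 independent variables
        Dim X(,) As Double = {{2.0, 3.0}, {4.0, 5.0}, {6.0, 7.0}}
        Dim XWithConst(,) As Double = AddConstantColumn(X)

        'Number of columns after adding the constant
        Console.WriteLine(XWithConst.GetUpperBound(1) + 1)
    End Sub

End Module
```

Whether this explains the mismatch depends on how the X array was built, so it is only a guess at this point.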