
Distribution.Sample Unit Tests

Coordinator
Nov 14, 2009 at 10:50 AM

Hi,

I want to add unit tests for the methods that sample from the distributions. Essentially, I will add test methods that check that the mean and standard deviation of the samples are within reasonable bounds (according to the law of large numbers). However, these tests will require generating a large number of samples (a million?) from each distribution, which might slow the unit tests down by a few minutes. Shall I add them to our existing unit tests nevertheless, or do we want a separate unit test project for them? (My vote goes to the former.)
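Here is a minimal Python sketch of the mean/standard-deviation check described above. Python's `random` module stands in for the library's samplers. `moments_ok`, the fixed seed and the tolerance of 5 standard errors are hypothetical choices, not an existing API. Both standard errors shrink as 1/sqrt(n), which is what turns the law of large numbers into a usable test bound.

```python
import math
import random

def moments_ok(sampler, true_mean, true_std, n=100_000, z=5.0, seed=42):
    """Draw n samples via sampler(rng) and check that the sample mean and
    sample standard deviation lie within z standard errors of the true values.

    Standard error of the mean: true_std / sqrt(n).
    Standard error of the std: true_std / sqrt(2 * (n - 1)), a normal-theory
    approximation; heavy-tailed distributions need a looser bound."""
    rng = random.Random(seed)  # fixed seed keeps the test reproducible
    samples = [sampler(rng) for _ in range(n)]
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    se_mean = true_std / math.sqrt(n)
    se_std = true_std / math.sqrt(2 * (n - 1))
    return (abs(mean - true_mean) <= z * se_mean
            and abs(std - true_std) <= z * se_std)

# Usage: check sampling from Normal(mean=2, std=3).
print(moments_ok(lambda r: r.gauss(2.0, 3.0), 2.0, 3.0))
```

Going from 100,000 to 1,000,000 samples only tightens these bounds by a factor of sqrt(10), about 3.2. That gain is worth weighing against the extra run time.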

Thanks, Jurgen

Coordinator
Nov 14, 2009 at 12:15 PM

Hi Jurgen,

We did something like this in Iridium as well, see

http://github.com/cdrnet/mathnet-iridium/blob/master/src/test/MathNet.Iridium.Test/DistributionTests/DistributionTest.cs

There, though, we actually verified that the shape of the samples roughly matches the distribution's explicit distribution function.

For Iridium it seemed good enough to compute 100,000 samples each. The test took a few seconds, but it never felt like an issue. If I remember right, the test framework also supports attributing tests with a tag or category that can then be filtered on (e.g. to run only certain tests in the normal build but all of them in nightly integration builds). Hence, I vote for the former as well.
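Below is a minimal Python sketch of such a shape check, written in the spirit of the Iridium test linked above. The details of that test are not reproduced here. Python's `random` module stands in for the library's samplers. `shape_ok`, the bucket edges and the 5-standard-error tolerance are hypothetical choices. Each bucket has an exclusive lower bound and an inclusive upper bound, so its expected probability is a plain CDF difference.

```python
import bisect
import math
import random

def shape_ok(sampler, cdf, edges, n=100_000, z=5.0, seed=1):
    """Bucket n samples using the given edges and compare each bucket's
    observed frequency with its expected probability cdf(upper) - cdf(lower).

    Buckets are (edges[i], edges[i+1]]: lower bound exclusive, upper inclusive.
    A bucket count is Binomial(n, p), so its frequency has standard error
    sqrt(p * (1 - p) / n); each bucket may deviate by at most z of those."""
    rng = random.Random(seed)  # fixed seed keeps the test reproducible
    counts = [0] * (len(edges) - 1)
    for _ in range(n):
        i = bisect.bisect_left(edges, sampler(rng))  # edges[i-1] < x <= edges[i]
        if 1 <= i < len(edges):                      # skip samples outside (edges[0], edges[-1]]
            counts[i - 1] += 1
    for k, count in enumerate(counts):
        p = cdf(edges[k + 1]) - cdf(edges[k])
        tol = z * math.sqrt(p * (1 - p) / n)
        if abs(count / n - p) > tol:
            return False
    return True

# Usage: Exponential(rate=1.5) checked on 20 buckets over (0, 4].
rate = 1.5
edges = [0.2 * k for k in range(21)]
print(shape_ok(lambda r: r.expovariate(rate),
               lambda x: 1.0 - math.exp(-rate * x), edges))
```

A chi-square goodness-of-fit test would combine the buckets into a single statistic. The per-bucket bound is used here only because a failure points directly at the offending bucket.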

Thanks,
Chris

Nov 14, 2009 at 6:01 PM
Hey Jurgen,

We could also use pre-generated random numbers from a certified stream (perhaps from the Diehard tests) and use them to test the distributions (following what Iridium does). We'd just have to create a fake RNG that returns the pre-generated numbers. This has two advantages: it will probably be quicker than using a real RNG, and we'll only be testing the distribution class, not the RNG.
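The Python sketch below illustrates the fake-RNG idea. `FakeRandom` and `sample_exponential` are hypothetical names, and the inverse-transform sampler is only an illustration, not necessarily how the library samples. Loading a certified stream (e.g. Diehard data) from a file is left out; a short hard-coded list plays its role.

```python
import math

class FakeRandom:
    """Hypothetical test double: replays a fixed stream of pre-generated
    uniform numbers in [0, 1) instead of generating new ones."""

    def __init__(self, values):
        self._values = list(values)
        self._pos = 0

    def random(self):
        # Fail loudly rather than wrap around, so a test cannot silently
        # consume more numbers than the stream provides.
        if self._pos >= len(self._values):
            raise IndexError("pre-generated stream exhausted")
        u = self._values[self._pos]
        self._pos += 1
        return u

def sample_exponential(rng, rate):
    """Inverse-transform sampling: if U ~ Uniform[0, 1), then
    -ln(1 - U) / rate ~ Exponential(rate). Only rng.random() is used,
    so any object with that method can be plugged in."""
    return -math.log(1.0 - rng.random()) / rate

# Usage: with a known input stream, every output is known in advance.
rng = FakeRandom([0.5, 0.75])
print([sample_exponential(rng, 2.0) for _ in range(2)])
```

The design choice that makes this work is passing the RNG into the sampler rather than creating it inside. The test can then swap in the double and check exact values instead of statistical bounds.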

Regards,
Marcus

Coordinator
Nov 14, 2009 at 9:08 PM

Thanks for all the suggestions.

I've just ported the approach used in Iridium, as Chris suggested, but now using our own Histogram class. In the process, I've changed the implementation of Histogram to make the bucket lower bound exclusive and the upper bound inclusive (it used to be the other way around). I think this design is much more logical, as it corresponds to the way CDFs are computed: CDF(a) - CDF(b) = P(b < x <= a) for b < a [computing the quantity P(b <= x < a) is less natural, I think].
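To see why the new convention fits, write F for the CDF and assume b < a. Splitting the event x <= a at b gives the identity directly. The last line shows that the old convention needs left limits of F instead:

```latex
\begin{aligned}
F(a) &= P(x \le a) = P(x \le b) + P(b < x \le a) = F(b) + P(b < x \le a) \\
\Rightarrow\quad P(b < x \le a) &= F(a) - F(b) \\
P(b \le x < a) &= P(x < a) - P(x < b) = F(a^-) - F(b^-), \qquad F(t^-) = \lim_{s \uparrow t} F(s)
\end{aligned}
```

For a continuous distribution F(t^-) = F(t), so both conventions give the same number. They differ only when x has point masses at the bucket edges, as discrete distributions do.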

It turns out that our Mersenne Twister is really fast, so drawing 100,000 samples is not much of an overhead. I still agree with Marcus that a pre-generated set of random numbers would probably be better. I'll add it to my to-do list (but first I will return to a bit of linear algebra work).

Cheers, Jurgen