Negative Samples from Geometric Distribution

Feb 7 at 5:50 PM
Edited Feb 7 at 9:36 PM
I downloaded the source (3.0 alpha?), and I'm getting negative samples in my unit tests for the Geometric Distribution. Is that intentional for some reason that I don't understand?

The code for the SampleUnchecked() method is shown below (notice the -Math.Log(...)).
        static int SampleUnchecked(System.Random rnd, double p)
        {
            return p == 1.0 ? 1 : (int) Math.Ceiling(-Math.Log(1.0 - rnd.NextDouble(), 1.0 - p));
        }
If I'm not mistaken, the NegativeBinomial distribution should return the same results as the Geometric distribution when r = 1 (for the same value of p).

If you test these two distributions side-by-side with random generators seeded exactly the same, you will see that the results are not correct.

BTW: If I use Floor instead of Ceiling, and remove the negative sign of the Log value, the results look reasonable.

Ben
Coordinator
Feb 9 at 9:51 AM
Yes, that looks wrong. Neither Geometric nor NegativeBinomial distributions are currently covered by the tests to verify the distribution of the generated samples, and when added quickly they indeed both fail (VapnikChervonenkisTest). We should extend these tests to verify all sample distributions and make sure they hold.

Note that the side-by-side test you mentioned is invalid with Math.NET Numerics. The Sample routines are only supposed to guarantee the correct distribution but not any strict mapping from the uniform samples provided by the RNG (in fact, some of them take multiple samples from the RNG to generate a single sample).

Thanks,
Christoph
Coordinator
Feb 9 at 3:35 PM
I've fixed the Geometric distribution in mainline master; the negative sign was indeed wrong, but Ceiling instead of Floor is correct. Note that there are two definitions of the Geometric distribution, one with mode 0 and one with mode 1. The one implemented in Math.NET Numerics has mode 1 (and thus will never generate a 0-sample).

The distribution of samples generated by all continuous and discrete distributions except NegativeBinomial, Poisson and ConwayMaxwellPoisson are now verified in the unit tests. Thanks again for reporting.

Thanks,
Christoph
Marked as answer by cdrnet on 2/9/2014 at 8:35 AM
Feb 9 at 8:45 PM
Edited Feb 9 at 9:53 PM
Christoph,

Thanks for getting to this so quickly!

I really like the design of your library. And it is extremely useful for my purposes.

I am designing custom compression algorithms for a specific class of time series data (Brownian Motion / Random Walk).

I would like to generalize the algorithms (as much as possible) for all .NET primitive numeric types:
  • Signed and Unsigned Integral types (perhaps even including BigInteger),
  • Single, Double,, and Decimal.
  • DateTime and TimeSpan (Ticks).
Primarily I am focused on financial market data (I work for a hedge fund). But the algorithms seem to be very beneficial for use in a variety of other domains.

One of the features I would like to see added to your "Generate" static method implementations, is the ability to overload with a "Granularity" argument. I am building this capability into my own wrappers, but many others would probably find it quite useful.

The other thing that I am building as a part of my project is a Data Modeler application (using WPF MVVM with OxyPlot). This will allow me to easily simulate various types of time series data with specific characteristics (think Monte Carlo). If this turns out well, and if it seems like something that contributors on your project might like to extend or improve, I'd be happy to donate the code and help maintain it. I'll probably design it as composite application based on Prism, so that it can benefit from a "plug-in" architecture.

I actually have more "real" sample data than I know what to do with. But when it comes to generalizing compression algorithms, the more the merrier!

Again, thanks for your great work on this project!

BTW: There is an interesting blog post by Bart De Meyer concerning OxyPlot that other MathNet.Numerics users might find helpful:
Creating Graphs in WPF using OxyPlot
Feb 9 at 9:20 PM
Christoph,

I meant to also mention an alternative way of displaying histograms in test methods (I find it useful to include output in tests to easily observe the various distribution configurations).

I basically just extended ConsoleHelper with a variety of additional methods to make the output compatible with the MSTest output restrictions (no random console cursor access). The main method for a histogram that can be displayed in test output is shown below.

(I have added a variety of other methods that embellish the output with basic statistics and whatnot, but those have nothing to do with the console output restrictions solved here).

The output for a histogram simply needs to be rotated 90 degrees to make it work reasonably well in tests:
        public static void DisplayHistogramRotate90(IEnumerable<double> data
            , int numBuckets = 20, int maxBarLength = 80)
        {
            var fmt = "{0, 6} : {1, -" + maxBarLength + "} : {2}";
            var blockSymbol = Convert.ToChar(9608);
            var hist = new Histogram(data, numBuckets);

            var maxBucketCount = 0.0;
            for (var i = 0; i < hist.BucketCount; i++)
            {
                if (hist[i].Count > maxBucketCount)
                    maxBucketCount = hist[i].Count;
            }

            Console.WriteLine("Histogram:");
            for (var i = 0; i < hist.BucketCount; i++)
            {
                var fraction = hist[i].Count / maxBucketCount;
                var dots = (int)(maxBarLength * fraction);
                Console.WriteLine(fmt, i, new string(blockSymbol, dots), hist[i]);
            }
        }
Also, I added a method to generalize the display for any particular distribution (when there is a chance of throwing a "NotImplementedException" when accessing certain properties):
        public static void DisplayDiscreteDistribution(IDiscreteDistribution dist, int numHistSamples = 100000, int numHistBuckets = 20)
        {
            const string fmt = "{0, 20} = {1, 20}";
            var divider = new string('*', 120);
            Console.WriteLine(divider);
            Console.WriteLine("{0}:\n", dist);

            try
            {
                Console.WriteLine(fmt, "CDF (location 3)", dist.CumulativeDistribution(3).ToString("F5"));
            }
            catch { }
            try
            {
                Console.WriteLine(fmt, "Prob (location 3)", dist.Probability(3).ToString("F5"));
            }
            catch { }
            try
            {
                Console.WriteLine(fmt, "ProbLn (location 3)", dist.ProbabilityLn(3).ToString("f5"));
            }
            catch { }
            try
            {
                Console.WriteLine(fmt, "Entropy", dist.Entropy.ToString("F5"));
            }
            catch { }
            try
            {
                Console.WriteLine(fmt, "MaxValue", dist.Maximum.ToString("F5"));
            }
            catch { }
            try
            {
                Console.WriteLine(fmt, "MinValue", dist.Minimum.ToString("F5"));
            }
            catch { }
            try
            {
                Console.WriteLine(fmt, "Mean", dist.Mean.ToString("F5"));
            }
            catch { }
            try
            {
                Console.WriteLine(fmt, "Mode", dist.Mode.ToString("F5"));
            }
            catch { }
            try
            {
                Console.WriteLine(fmt, "Variance", dist.Variance.ToString("F5"));
            }
            catch { }
            try
            {
                Console.WriteLine(fmt, "StdDev", dist.StdDev.ToString("F5"));
            }
            catch { }
            try
            {
                Console.WriteLine(fmt, "Skewness", dist.Skewness.ToString("F5"));
            }
            catch { }
            Console.WriteLine();

            Console.Write("Samples(10): ");
            for (var i = 0; i < 10; i++)
            {
                Console.Write(dist.Sample().ToString("N5") + " ");
            }
            Console.WriteLine();
            Console.WriteLine();

            //// 4. Generate 100000 samples of the distribution and display histogram
            var data = new double[numHistSamples];
            for (var i = 0; i < data.Length; i++)
            {
                data[i] = dist.Sample();
            }

            DisplayHistogramRotate90(data, numHistBuckets);
            Console.WriteLine(divider);
        }
Obviously, there is another similar method for IContinuousDistribution types.

So my tests then can then look something like the following (with no assertions in this case):
       [TestMethod]
        public void GeometricVariationsTest()
        {
             var dist = new Geometric(p: 0.01, randomSource: new MersenneTwister());
            ConsoleHelper.DisplayDiscreteDistribution(dist, 100000, 40);

            dist.P = 0.2;
            ConsoleHelper.DisplayDiscreteDistribution(dist, 100000, 40);

            dist.P = 0.5;
            ConsoleHelper.DisplayDiscreteDistribution(dist, 100000, 40);

            dist.P = 0.8;
            ConsoleHelper.DisplayDiscreteDistribution(dist, 100000, 40);
        }
Hopefully, someone else might find this a useful way of experimenting with the distributions, without needing to create repetitive console application code.
;-)