RunningStatistics: Is there a way to track only a window of the last 5 samples?

Apr 12, 2015 at 8:18 AM
RunningStatistics in MathNet.Numerics.Statistics has only a Push() method; there is no Pop() method.

Let's say my window size is 5 and I've already pushed 5 values with Push(). When I push the next (sixth) value, I want the first value to be removed.

Because there is no method for removing a sample from a running/accumulating standard deviation, I can't track only the last window of the sampled values.

With such a method I could use my own fixed-size queue and keep tracking the mean, variance and std. deviation without recalculating the whole window every time I sample a new value.

If there is no "Pop" method (to remove the first value), then is there already a feature like WindowedStandardDeviation when using RunningStatistics that I'm missing?
Apr 16, 2015 at 1:56 AM
I (as an outside observer) can't find anything like this in the code. In principle you could write a Pop() function that undoes what Push does, as long as you have the x value of the item you're removing. This doesn't work for the Minimum and Maximum properties, of course (I just noticed that the RunningStatistics.Combine method doesn't handle those either, even though it could).
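
To illustrate "undoing what Push does", here is a minimal sketch in Python (chosen only for illustration; the thread is about a .NET library). It assumes a Welford-style update for mean and variance, which is not copied from the Math.NET source, and the class and its pop() method are hypothetical names. Minimum and maximum are deliberately left out, since they cannot be undone this way.

```python
import math


class RunningStats:
    """Welford-style running mean/variance with a hypothetical pop()."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def push(self, x):
        """Add one sample."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def pop(self, x):
        """Undo an earlier push(x); the caller must supply the removed value."""
        if self.n == 0:
            raise ValueError("no samples to remove")
        if self.n == 1:
            self.n, self.mean, self.m2 = 0, 0.0, 0.0
            return
        # Mean before x was pushed, then reverse the m2 update.
        old_mean = (self.n * self.mean - x) / (self.n - 1)
        # Clamp at zero: rounding can leave a tiny negative value.
        self.m2 = max(self.m2 - (x - self.mean) * (x - old_mean), 0.0)
        self.mean = old_mean
        self.n -= 1

    @property
    def variance(self):
        # Sample variance (n - 1 denominator); NaN with fewer than 2 samples.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

    @property
    def std_dev(self):
        return math.sqrt(self.variance)
```

The caller still has to remember which value to remove, so a queue of the last N samples is needed alongside it.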

Either way, someone would have to keep a rolling queue. If you're just tracking a window of 5 samples, it's actually not expensive at all to just recompute the statistics again with each new value (it's like 20 floating point ops I'd guess compared to maybe 5 to keep a running tally).
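
A minimal sketch of the recompute-per-window approach, again in Python for illustration only. The function name rolling_std is hypothetical; it keeps the rolling queue in a bounded deque and recomputes the sample standard deviation from scratch for each new value.

```python
from collections import deque
import statistics


def rolling_std(samples, window=5):
    """Yield the sample std. deviation of the last `window` values."""
    recent = deque(maxlen=window)  # the oldest value is dropped automatically
    for x in samples:
        recent.append(x)
        if len(recent) >= 2:
            # Full recomputation over the current window contents.
            yield statistics.stdev(recent)
        else:
            # A sample std. deviation needs at least two values.
            yield float("nan")
```

The cost per new value grows linearly with the window size, which is the trade-off against a running tally.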
Apr 19, 2015 at 12:04 PM
@bdodson, thanks for your reply. It is actually expensive when this runs for each user, and a window of 5 was just an example; it may be a window of 100.

Moving standard deviation / rolling standard deviation / sliding window is a well-known term. I wish Math.NET would include this in the library, with tests, so I wouldn't need to implement it on my own and could leave it to the pros.

ref:
http://matlabtricks.com/post-20/calculate-standard-deviation-case-of-sliding-window

http://jonisalonen.com/2014/efficient-and-accurate-rolling-standard-deviation/
Apr 20, 2015 at 6:46 AM
There is a similar request up on GitHub - https://github.com/mathnet/mathnet-numerics/issues/264 .
I tinkered around with this a bit and found that the std dev got inaccurate pretty quickly doing something similar to the links above (of course I could have screwed things up). My opinion is that we should re-compute the statistics over each window. It will be slower, but more accurate. There is overlap with MathNet.Filtering, and the question is how we handle that. Perhaps we should move the discussion over to GitHub, where we can decide on the best approach.
Apr 20, 2015 at 9:13 AM
Edited Apr 20, 2015 at 9:20 AM
@cuda, thanks for your reply. That's the exact request on GitHub, but it has had no response for 6 months.

According to this page - http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Na.C3.AFve_algorithm - re-calculating is called the "Naïve algorithm",

and this method - http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Computing_shifted_data - should be pretty accurate.

The difference between the results of this approach (Computing shifted data) and the regular calculation of std. dev (Naïve algorithm) should be around 0.000000000001.

The only thing that needs to be added to the library is a Pop() method (or Remove()), so we could use it like a queue for calculating the desired window.
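
For illustration, a minimal Python sketch of the "Computing shifted data" idea applied to a fixed-size window. The class name WindowedVariance and its members are hypothetical and not part of Math.NET; the shift constant is assumed to be the first sample seen, and the removal step plays the role of the requested Pop().

```python
from collections import deque


class WindowedVariance:
    """Sliding-window mean/variance using shifted sums (hypothetical helper)."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.window = window
        self.values = deque()  # the samples currently inside the window
        self.k = 0.0    # shift constant, taken from the first sample
        self.ex = 0.0   # sum of (x - k) over the window
        self.ex2 = 0.0  # sum of (x - k)^2 over the window

    def push(self, x):
        """Add x and, once the window is full, remove the oldest sample."""
        if not self.values:
            self.k = x
            self.ex = self.ex2 = 0.0
        self.values.append(x)
        self.ex += x - self.k
        self.ex2 += (x - self.k) ** 2
        if len(self.values) > self.window:
            old = self.values.popleft()  # the "Pop" step
            self.ex -= old - self.k
            self.ex2 -= (old - self.k) ** 2

    @property
    def mean(self):
        n = len(self.values)
        return self.k + self.ex / n if n else float("nan")

    @property
    def variance(self):
        # Sample variance (n - 1 denominator); NaN with fewer than 2 samples.
        n = len(self.values)
        if n < 2:
            return float("nan")
        return (self.ex2 - self.ex * self.ex / n) / (n - 1)
```

Each update costs a constant number of operations regardless of window size. Rounding error can still accumulate over many add/remove steps, especially if the data drifts far from the shift constant, so comparing against a full recomputation in tests is a sensible check.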
Apr 20, 2015 at 9:34 AM
@amiranon, what I was suggesting was not the "Naïve algorithm," but "Computing shifted data" for each window. I'll push my popping/pushing branch to GitHub tomorrow. It started to lose accuracy after the third update. But again, I don't think this is the way to go. Skewness and kurtosis are going to be an even bigger problem.
Apr 20, 2015 at 11:13 AM
Edited Apr 20, 2015 at 11:14 AM
Doh, there was a problem with my test. The accuracy isn't that bad.
A first take is at: https://github.com/cuda/mathnet-numerics/tree/moving_stats . See the MovingStatistics class.
It supports mean, variance, min, and max. Skewness and kurtosis still need to be added.

Let's move the discussion to GitHub.
Apr 22, 2015 at 7:11 AM
@cuda, great, I am checking that out. Thanks.