Friday 28 August 2015

Randomness....! Is random the same as random?

Often I need a random number (as a seed for a split, or to order records randomly). Now you might think that, if we have a random number generator, we could make things MORE random by multiplying random numbers together (random * random is randomer). Let's find out whether this holds true by simply trying it!
Say we have 20,000 random numbers between 0 and 1. We can plot the values against their frequency using a histogram (100 bins):

Now this looks pretty random. Using 100 bins means that the range is divided into 100 sections: one bin (section) with all values between 0.00 and 0.01, one bin with all values between 0.01 and 0.02, and so on.

We observe that most bins have a relative frequency between 0.009 and 0.011, and all of them lie between 0.008 and 0.012. This means that about 0.010 * 20,000 ≈ 200 points were generated with a value between 0 and 0.01.
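To make the binning arithmetic concrete, here is a minimal sketch that draws 20,000 uniform numbers and counts them into 100 equal-width bins. It assumes Python's built-in `random` module (a Mersenne Twister) with an arbitrary seed of 2015; the post does not say which generator or tool produced its plots, and `histogram` is a hypothetical helper name.

```python
import random

N = 20_000   # number of draws, as in the post
BINS = 100   # 100 equal-width bins on [0, 1)

def histogram(values, bins=BINS):
    """Count how many values fall into each of `bins` equal-width bins on [0, 1)."""
    counts = [0] * bins
    for v in values:
        # min() guards against an index of `bins` should v * bins ever round up to it
        counts[min(int(v * bins), bins - 1)] += 1
    return counts

# Assumption: Python's built-in Mersenne Twister generator with an arbitrary seed;
# the post does not say which generator or tool produced its plots.
rng = random.Random(2015)
uniform = [rng.random() for _ in range(N)]
counts = histogram(uniform)

# Relative frequency per bin; for uniform draws each should be close to 1/100 = 0.01
freqs = [c / N for c in counts]
print(f"bin [0.00, 0.01): {counts[0]} points, relative frequency {freqs[0]:.4f}")
print(f"smallest/largest relative frequency: {min(freqs):.4f} / {max(freqs):.4f}")
```

Each bin count follows a binomial distribution with mean 200 and standard deviation of about 14, so bins anywhere from roughly 160 to 240 are normal sampling noise, consistent with the 0.008 to 0.012 spread seen in the histogram.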

Likewise, about 200 points were generated with a value between 0.01 and 0.02, and so on. This indeed looks rather random. We can also plot the index (from the 1st number generated to the 20,000th number generated) against the value:

Again, this all looks pretty random, as it should: the whole space is filled evenly.
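The index-versus-value plot can also be checked numerically: if the space is filled evenly, every run of consecutive draws should have a mean near 0.5 and reach close to both 0 and 1. A minimal sketch, again assuming Python's `random` module and an arbitrary seed; the block size of 2,000 draws is an arbitrary choice.

```python
import random

N = 20_000
CHUNK = 2_000   # assumption: inspect the sequence in 10 consecutive blocks, in generation order

rng = random.Random(2015)  # assumption: arbitrary fixed seed for reproducibility
values = [rng.random() for _ in range(N)]

# For each block of consecutive draws, record its start index, mean, minimum and maximum.
chunk_stats = []
for start in range(0, N, CHUNK):
    block = values[start:start + CHUNK]
    chunk_stats.append((start, sum(block) / len(block), min(block), max(block)))

for start, mean, lo, hi in chunk_stats:
    print(f"draws {start:>5}-{start + CHUNK - 1:>5}: mean={mean:.3f} min={lo:.4f} max={hi:.4f}")
```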

Now here comes the kicker: if we multiply these random numbers (the same numbers, in the same order) by another random number, things start to look less random:

Again, we can plot the index against the value:

What we see is that the numbers are actually getting less random: small values occur much more frequently than large ones. Now let's look at the histogram again:

Remember that earlier there were approximately 200 points with a value between 0.00 and 0.01. Now we see 0.06 * 20,000 = 1,200; conversely, we see only 0.00005 * 20,000 = 1 data point with a value between 0.99 and 1.00! This looks much less random.
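These two numbers can also be derived exactly. For the product Z of two independent uniform numbers the density is -ln z on (0, 1], so P(Z ≤ z) = z(1 - ln z). For the first bin this gives 0.01 * (1 + ln 100) ≈ 0.056, about 1,120 of 20,000 points (which the histogram shows as roughly 0.06), and for the last bin [0.99, 1.00) it gives about 0.00005, i.e. about 1 point. A sketch comparing the exact values with a simulation, assuming Python's `random` module and an arbitrary seed; `cdf_product_of_two` is a hypothetical helper name.

```python
import math
import random

N = 20_000

def cdf_product_of_two(z):
    """P(U1 * U2 <= z) for independent U1, U2 ~ Uniform(0, 1), with 0 < z <= 1."""
    return z * (1.0 - math.log(z))

# Exact probabilities for the first bin [0, 0.01) and the last bin [0.99, 1)
p_first = cdf_product_of_two(0.01)
p_last = 1.0 - cdf_product_of_two(0.99)

rng = random.Random(2015)  # assumption: arbitrary fixed seed
products = [rng.random() * rng.random() for _ in range(N)]
n_first = sum(1 for p in products if p < 0.01)
n_last = sum(1 for p in products if p >= 0.99)

print(f"[0.00, 0.01): expected {p_first * N:.0f}, simulated {n_first}")
print(f"[0.99, 1.00): expected {p_last * N:.1f}, simulated {n_last}")
```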

And this effect gets worse when we multiply three random numbers:

Now there are 0.165 * 20,000 ≈ 3,310 values between 0.00 and 0.01, and 0 values between 0.99 and 1.00. The highest non-empty bin is the one between 0.89 and 0.90, with 0.0001 * 20,000 = 2 data points.

And with four random numbers multiplied:

Now 0.33030 * 20,000 = 6,606 values lie between 0.00 and 0.01, and again the highest bin is empty. There is 0.00005 * 20,000 = 1 data point with a value between 0.86 and 0.87 (perhaps not very visible in the second plot).
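The same reasoning extends to any number n of factors. Because -ln of the product is a sum of n independent exponential variables, P(U1 * ... * Un ≤ z) = z * Σ_{k=0}^{n-1} (-ln z)^k / k!. At z = 0.01 this gives about 0.010, 0.056, 0.162 and 0.325 for n = 1, 2, 3 and 4, close to the 0.010, 0.06, 0.165 and 0.330 read off the histograms above. For the top bin with three factors, the same formula gives a probability below one in a million, so an empty [0.99, 1.00) bin is expected. A sketch under the same assumptions (Python's `random` module, arbitrary seed, hypothetical helper names):

```python
import math
import random

N = 20_000

def cdf_product(z, n):
    """P(U1 * ... * Un <= z) for n independent Uniform(0, 1) variables, 0 < z <= 1.

    Uses the fact that -ln(U1 * ... * Un) follows a Gamma(n, 1) distribution.
    """
    t = -math.log(z)
    return z * sum(t ** k / math.factorial(k) for k in range(n))

def simulate_first_bin(n, rng, width=0.01):
    """Fraction of N simulated products of n uniforms that land in [0, width)."""
    hits = 0
    for _ in range(N):
        p = 1.0
        for _ in range(n):
            p *= rng.random()
        if p < width:
            hits += 1
    return hits / N

rng = random.Random(2015)  # assumption: arbitrary fixed seed
results = {}
for n in range(1, 5):
    results[n] = (cdf_product(0.01, n), simulate_first_bin(n, rng))
    exact, sim = results[n]
    print(f"n={n}: exact {exact:.4f} (~{exact * N:.0f} points), simulated {sim:.4f}")
```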

So there you have it: while you might think that random * random is more random than random, it is NOT! The explanation is related to the central limit theorem. In probability theory, the central limit theorem (CLT) states that, given certain conditions, the arithmetic mean of a sufficiently large number of independent random variables, each with a well-defined expected value and a well-defined variance (which holds here, since all values lie between 0 and 1), will be approximately normally distributed, regardless of the underlying distribution (see [1] http://www.math.uah.edu/stat/sample/CLT.html; [2] Rice, John (1995), Mathematical Statistics and Data Analysis (Second ed.), Duxbury Press, ISBN 0-534-20934-3; [3] Wikipedia, https://en.wikipedia.org/wiki/Central_limit_theorem). Strictly speaking, the CLT describes sums and means rather than products; it applies here because the logarithm of a product is the sum of the logarithms of its factors.
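The link between products and the central limit theorem (which is about sums) can be written out by taking logarithms. With P_n the product of n independent Uniform(0, 1) numbers:

```latex
\begin{aligned}
P_n &= U_1 U_2 \cdots U_n, \qquad U_i \sim \mathrm{Uniform}(0,1) \text{ independent} \\
-\ln U_i &\sim \mathrm{Exp}(1), \qquad \mathbb{E}[-\ln U_i] = 1, \quad \mathrm{Var}(-\ln U_i) = 1 \\
-\ln P_n &= \sum_{i=1}^{n} (-\ln U_i) \sim \mathrm{Gamma}(n, 1) \\
\frac{-\ln P_n - n}{\sqrt{n}} &\xrightarrow{\;d\;} \mathcal{N}(0, 1) \quad \text{as } n \to \infty
\end{aligned}
```

So the logarithm of the product becomes approximately normal, with mean -n and standard deviation √n, and the product itself is approximately log-normal. Its geometric mean is e^{-n} (about 0.018 for n = 4), which is why the histograms pile up ever closer to 0 as more factors are multiplied.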
