Dr. Mainul Ahsan

This message is my attempt to correct an often-misunderstood notion of statistical correlation and its relation to calculating simple probabilities. More specifically, I would like to elaborate on the following two points:
  • Aparthib's calculation of probability is somewhat rushed, and his use of the term "certainly" is rather loose, in his otherwise erudite article on the design argument. In my opinion, it is a cardinal sin in probability to use such a term even when the calculated probability is something like 0.9999999999 (followed by 10 billion 9's). Alternative phrases, e.g. "almost certainly", should be preferred instead.
  • I am completely befuddled when Mr. Ahmed argues "Rolling N dice by M people at once are not correlate events; the above statement by Mr Aparthib (rightly so) is utterly false". What has "correlation" got to do with Aparthib's calculations? In fact, if the trials by M people were positively correlated, the probability of at least one success would be LESS than the probability if the trials were independent (which they are). This is a common misconception among students who are force-fed a whole lot of vague ideas about correlation, regression, etc., before they begin to grasp the basic rules of probability calculation.
With the above two points in mind, let us start with the familiar example of coin flipping. Suppose a person is going to toss a pair of coins n times. What is the probability, p(n), of getting two heads at least once? The answer is given by the following formula:
p(n) = 1 - (3/4)^n   [Please read x^y as 'x to the power of y']
The formula above is derived by using the following simple logic: there are 4 different outcomes of a single flip of a pair of coins - two heads (HH), two tails (TT), or a head and a tail (TH, HT). (Yes, the last one constitutes two separate outcomes in the sample space because the two coins are distinct entities.) From the sample space, it is easy to verify that the probability of obtaining HH in any one flip is 1/4 and the probability of not obtaining HH is 3/4. The probability of not obtaining an HH in the second flip is also 3/4. Therefore, the probability of not obtaining a double head (HH) in two consecutive trials is (3/4)(3/4) = (3/4)^2. (We are allowed to multiply the probabilities because the two trials are independent.) It follows that the probability of obtaining at least one HH in two throws is:
p(2) = 1 - (3/4)^2.
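The counting argument above can be checked by brute force. The following is only a small sketch: it enumerates every equally likely sequence of outcomes and counts those containing at least one HH. The function name prob_at_least_one_hh is made up for this illustration, and the enumeration is practical only for a handful of flips.

```python
from fractions import Fraction
from itertools import product

# Sample space for one flip of a pair of distinct coins: HH, HT, TH, TT.
PAIR_OUTCOMES = list(product("HT", repeat=2))


def prob_at_least_one_hh(n_flips):
    """Exact probability of at least one HH in n_flips flips of a coin pair.

    Enumerates all 4**n_flips equally likely sequences, so keep n_flips small.
    (Hypothetical helper written for this illustration.)
    """
    sequences = list(product(PAIR_OUTCOMES, repeat=n_flips))
    hits = sum(1 for seq in sequences if ("H", "H") in seq)
    return Fraction(hits, len(sequences))


print(prob_at_least_one_hh(1))  # 1/4
print(prob_at_least_one_hh(2))  # 7/16, which equals 1 - (3/4)^2
```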
The same argument extends to any number of flips: the probability of not obtaining HH in n independent flips is (3/4)^n, which gives the formula for p(n) stated above.
Now consider that Joe is flipping a pair of coins 25 times. The probability that he will get at least one double head (HH) in his 25 throws is: p(25) = 1-(3/4)^25, which is approximately equal to 0.999247, a number close to unity.
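For readers who want to reproduce the number, here is a minimal sketch of the formula in Python. The helper name p_at_least_once is an assumption made for this example; exact fractions are used so that no rounding enters before the final conversion to a decimal.

```python
from fractions import Fraction


def p_at_least_once(p_single, n_trials):
    """Probability of at least one success in n_trials independent trials,
    each succeeding with probability p_single: 1 - (1 - p_single)^n_trials."""
    return 1 - (1 - p_single) ** n_trials


# Joe's 25 flips of a pair of coins; a single flip gives HH with probability 1/4.
p25 = p_at_least_once(Fraction(1, 4), 25)
print(f"{float(p25):.6f}")  # 0.999247
```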
What about this scenario - 25 people are each given a pair of coins and each one of them is allowed a single flip of their respective pair of coins. What is the probability that at least one of them will get a double head? The answer is exactly the same as above, i.e., 0.999247. The 25 different flips by the 25 different people are exactly as independent of each other as the 25 consecutive throws of Joe are independent (unless Joe's psychic or spiritual power starts to interfere with the outcome of his consecutive coin flips). In a nutshell, it is preposterous to assume that one person's consecutive coin flips are somehow correlated or that the separate coin flips by different people are more independent than one person's flips.
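To see in which direction correlation would push the answer if it did exist, here is a deliberately crude toy model. It is purely an assumption for illustration and not something proposed in the articles under discussion: with probability rho all the trials share one common outcome (perfect positive correlation), and otherwise they are independent. The names p_any_success and rho are hypothetical.

```python
def p_any_success(p_single, m_trials, rho):
    """Probability of at least one success among m_trials trials under a toy model.

    Toy assumption: with probability rho every trial copies a single shared
    outcome; with probability 1 - rho the trials are independent.
    """
    independent = 1 - (1 - p_single) ** m_trials  # the formula used in the text
    fully_correlated = p_single                   # all trials succeed or fail together
    return rho * fully_correlated + (1 - rho) * independent


for rho in (0.0, 0.5, 1.0):
    print(rho, p_any_success(0.25, 25, rho))
```

With rho = 0 this reduces to 1 - (3/4)^25, whoever does the flipping; any rho above zero only lowers the probability in this model, down to 1/4 when rho = 1.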
Let us now calculate the probability of getting 6666666666 (10 6's) at least once in n rolls of 10 dice. Using the same logic as used for the coin flip experiment, the formula is:
p(n) = 1 - [(6^10-1)/6^10]^n
As Aparthib proposed in his article, let us consider the case of 10 dice being rolled by each of 6X6X6X6X6X6X6X6X6X6 (i.e. 6^10 = 60,466,176) different people. The probability that at least one of them will roll 6666666666 is:
p(6^10) = 1 - [(6^10-1)/6^10]^(6^10)
        = 1 - [60466175/60466176]^60466176
       ~= 0.63212056
       ~= less than two out of three
       ~= much less than "certain"
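The figure above can be reproduced without a high-precision calculator. The sketch below is one way to do it, assuming ordinary double-precision floats: it works with logarithms (math.log1p and math.expm1) because 1/6^10 is tiny and raising a number so close to 1 to a huge power directly invites rounding error. It also prints 1 - 1/e for comparison, since [1 - 1/N]^N approaches 1/e for large N.

```python
import math

N = 6 ** 10  # 60,466,176 people, each with a 1-in-N chance of rolling ten 6's

# log of the probability that nobody rolls 6666666666: N * log(1 - 1/N)
log_q = N * math.log1p(-1.0 / N)

# probability that at least one person does: 1 - exp(log_q)
p = -math.expm1(log_q)

print(f"{p:.8f}")                 # 0.63212056
print(f"{1 - math.exp(-1):.8f}")  # 0.63212056 (the large-N limit 1 - 1/e)
```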
Let us now see what happens if we increase the number of people doing this experiment 36-fold. That is, 10 dice are given to each of 6X6X6X6X6X6X6X6X6X6X6X6 different people (which is equal to 6^12 = 2,176,782,336). This is still a realistic number, as it is smaller than the present population of the world and may be somewhat greater than the number of believers in the design argument :). Now the probability that at least one of them will come up with 6666666666 in his or her roll is:
p(6^12) = 1 - [(6^10-1)/6^10]^(6^12)
       ~= 0.9999999999999998  [I used my Windows XP calculator; rounded to 16 decimal places]
       ~= more like "almost certain"
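A probability this close to 1 sits at the edge of what ordinary floating-point numbers can represent, so it is easier to look at the complementary probability that nobody rolls ten 6's. The following sketch, assuming double-precision floats, computes that miss probability through logarithms; it comes out at roughly 2.3 x 10^(-16), close to e^(-36), which agrees with the figure quoted above to about 16 decimal places.

```python
import math

N = 6 ** 10  # number of equally likely outcomes of one roll of 10 dice
M = 6 ** 12  # 2,176,782,336 people, one roll each

# log of the probability that none of the M rolls is 6666666666
log_miss = M * math.log1p(-1.0 / N)  # very close to -36
p_miss = math.exp(log_miss)          # roughly 2.3e-16

print(f"log of miss probability: {log_miss:.6f}")
print(f"miss probability:        {p_miss:.3e}")
print(f"at least one success:    1 - {p_miss:.3e}")
```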
Once again, if the same experiment is done by Joe alone, except that he has to roll his 10 dice about 2.2 billion times, the probability that Joe will get 10 6's in at least one roll is the same as the above number. There is no more correlation or independence in Joe's separate rolls than in the rolls of 2.2 billion people of the world. They are all independent rolls. Correlations ain't got nothin to do with these calculations.
Mainul Ahsan

Published at Mukto-mona 
