Tuesday, November 15, 2016

Benford's law

I happened to stumble upon the Wikipedia page on Benford's law, which states that
"In many naturally occurring collections of numbers, the leading significant digit is likely to be small". In other words, when you gather a collection of numbers, more of them start with "1" than with "2", more start with "2" than with "3", and so on. The Wikipedia article gives some examples that seem to obey the law, such as the distribution of physical constants and the populations of 237 countries.

Here we shall take a different data set and see how the numbers turn out. I took Sachin Tendulkar's scores from Cricinfo and kept only the matches in which he scored, leaving out the matches where he scored nothing or did not bat.

Here is the distribution of the first digit across all of those scores, as counts:

firstDigit
 1   2   3   4   5   6   7   8   9
127  63  58  45  31  38  19  29  22

And in percentage terms:

firstDigit
   1     2     3     4     5     6     7     8     9
29.40 14.58 13.43 10.42  7.18  8.80  4.40  6.71  5.09
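
A tally like the one above could be produced along the following lines. This is a minimal Python sketch and not the code used for this post: the helper names `first_digit`, `tally_first_digits` and `to_percentages` are hypothetical, and the short `sample_scores` list is made up purely to show the mechanics (it is not Tendulkar's data). The `post_counts` values are copied from the count table above.

```python
from collections import Counter


def first_digit(n: int) -> int:
    """Return the leading decimal digit of a positive integer."""
    if n <= 0:
        raise ValueError("leading digit is only defined here for positive integers")
    return int(str(n)[0])


def tally_first_digits(scores):
    """Count how many scores start with each digit 1..9, skipping zeros."""
    counts = Counter(first_digit(s) for s in scores if s > 0)
    return {d: counts.get(d, 0) for d in range(1, 10)}


def to_percentages(counts):
    """Convert a digit -> count mapping into digit -> percentage (2 decimals)."""
    total = sum(counts.values())
    return {d: round(100 * c / total, 2) for d, c in counts.items()}


# Made-up scores, only to illustrate the mechanics (not real data).
sample_scores = [15, 0, 110, 23, 7, 148, 91, 36]
print(tally_first_digits(sample_scores))

# Counts copied from the table in the post; this recomputes its percentage row.
post_counts = {1: 127, 2: 63, 3: 58, 4: 45, 5: 31, 6: 38, 7: 19, 8: 29, 9: 22}
print(to_percentages(post_counts))
```

Zero scores are skipped because a score of 0 has no leading significant digit, which matches the choice to leave out non-scoring matches.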

This distribution is broadly in line with Benford's law!

Under Benford's law, the probability that the leading digit is d (for d = 1 to 9) is
    p(d) = log10(1 + 1/d)
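
To see how close the scores come to this formula, the expected percentages can be printed next to the observed ones. The sketch below is only an illustration: `benford_probability` is a hypothetical helper name, and the `observed` values are copied from the percentage table earlier in this post.

```python
import math


def benford_probability(d: int) -> float:
    """Benford's law in base 10: p(d) = log10(1 + 1/d) for d = 1..9."""
    if not 1 <= d <= 9:
        raise ValueError("d must be a leading digit between 1 and 9")
    return math.log10(1 + 1 / d)


# Observed percentages, copied from the table in the post.
observed = {1: 29.40, 2: 14.58, 3: 13.43, 4: 10.42, 5: 7.18,
            6: 8.80, 7: 4.40, 8: 6.71, 9: 5.09}

for d in range(1, 10):
    expected = 100 * benford_probability(d)
    print(f"{d}: expected {expected:5.2f}%  observed {observed[d]:5.2f}%")
```

The nine probabilities sum to 1, since the terms log10((d + 1)/d) telescope to log10(10).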

 
As a different exercise, I wanted to see whether Benford's law is also obeyed in other bases.

I converted the scores to octal numbers and generalized the probability formula to work in any base, and the law still applies.
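
For a base b, the generalized formula is p(d) = log_b(1 + 1/d) for d = 1 to b - 1. Here is a minimal Python sketch of that idea, assuming plain positive integer scores. The function names `leading_digit` and `benford_probability` are hypothetical and are not the code behind this post.

```python
import math


def leading_digit(n: int, base: int = 10) -> int:
    """Return the leading digit of a positive integer written in `base`."""
    if n <= 0 or base < 2:
        raise ValueError("need a positive integer and a base of at least 2")
    # Drop trailing digits until only the most significant one is left.
    while n >= base:
        n //= base
    return n


def benford_probability(d: int, base: int = 10) -> float:
    """Benford's law in an arbitrary base: p(d) = log_base(1 + 1/d)."""
    if not 1 <= d <= base - 1:
        raise ValueError("d must be between 1 and base - 1")
    return math.log(1 + 1 / d, base)


# A score of 100 is 0o144 in octal, so its leading octal digit is 1.
print(oct(100), leading_digit(100, 8))

# Expected percentages for the octal leading digits 1..7.
for d in range(1, 8):
    print(d, round(100 * benford_probability(d, 8), 2))
```

Taking the leading digit by repeated integer division avoids formatting each score as a string in the target base.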

By the way, if you convert the scores to binary, there is only p(1), and it is always 100%, because the leading digit of any positive binary number is 1!

