A few years ago, I read about Benford’s law, which describes the natural distribution of leading digits in sets of naturally occurring numbers. The rule is most often used to tell data sets with fabricated numbers apart from those with real numbers.

Today, I ended up with two sets of sixteen numbers, and was curious how the leading digits were spread out.

The first data set had values between 561 and 8224. The second had values between 39 and 576. The second set was a function of the first. The leading digit frequencies were as follows:

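| First digit | Set 1 | Set 2 | Total | Share of total | Benford’s law |
|---|---|---|---|---|---|
| 1 | 3 | 8 | 11 | 34.4% | 30.1% |
| 2 | 5 | 4 | 9 | 28.1% | 17.6% |
| 3 | 4 | 1 | 5 | 15.6% | 12.5% |
| 4 | 0 | 0 | 0 | 0% | 9.7% |
| 5 | 1 | 2 | 3 | 9.4% | 7.9% |
| 6 | 0 | 1 | 1 | 3.1% | 6.7% |
| 7 | 1 | 0 | 1 | 3.1% | 5.8% |
| 8 | 1 | 0 | 1 | 3.1% | 5.1% |
| 9 | 1 | 0 | 1 | 3.1% | 4.6% |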

I was impressed with how front-loaded that table is, and how closely it tracks Benford’s law. There doesn’t seem to be any reason for “1” or “2” to be more common than, say, “4” as a leading digit in either set, but “1” and “2” (two of the nine possible leading digits, or 22%) accounted for at least half of the leading digits in each set (8 of 16 in Set 1 and 12 of 16 in Set 2) and for 62.5% of them overall (34.4% and 28.1% respectively).
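For anyone who wants to repeat the comparison, here is a minimal Python sketch. It assumes the usual statement of Benford’s law, that digit d leads with probability log10(1 + 1/d), and uses the per-digit counts from the table, since the underlying numbers are not listed here. The names `set1`, `set2`, and `benford` are my own choices for illustration.

```python
import math

# Leading-digit counts for digits 1 through 9, copied from the table
# (the raw sixteen-number sets themselves are not reproduced here).
set1 = [3, 5, 4, 0, 1, 0, 1, 1, 1]
set2 = [8, 4, 1, 0, 2, 1, 0, 0, 0]


def benford(d):
    """Expected share of leading digit d under Benford's law."""
    return math.log10(1 + 1 / d)


# Combine both sets and count how many numbers there are in all.
total = [a + b for a, b in zip(set1, set2)]
n = sum(total)

# Print observed versus expected share for each leading digit.
for d in range(1, 10):
    observed = total[d - 1] / n
    print(f"{d}  observed {observed:5.1%}  expected {benford(d):5.1%}")

# Share of numbers that start with 1 or 2, per set and combined.
share_set1 = (set1[0] + set1[1]) / sum(set1)
share_set2 = (set2[0] + set2[1]) / sum(set2)
share_total = (total[0] + total[1]) / n
print(f"1 or 2 leads: set 1 {share_set1:.1%}, set 2 {share_set2:.1%}, "
      f"combined {share_total:.1%}")
```

With only 32 numbers, this is an eyeball comparison rather than a statistical test; a much larger sample would be needed to say anything firm about how well the data fit.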
