A few years ago, I read about Benford’s law, which describes the natural distribution of leading digits in many sets of naturally occurring numbers. A common use of this rule is to help differentiate between data sets with fabricated numbers and those with real numbers.

Today, I ended up with two sets of sixteen numbers, and was curious how the leading digits were spread out.

The first data set had values between 561 and 8224. The second had values between 39 and 576. The second set was a function of the first. The leading digit frequencies were as follows:

First Digit | Set 1 Frequency | Set 2 Frequency | Total Frequency | Share of Total | Benford’s Law |
---|---|---|---|---|---|
1 | 3 | 8 | 11 | 34.4% | 30.1% |
2 | 5 | 4 | 9 | 28.1% | 17.6% |
3 | 4 | 1 | 5 | 15.6% | 12.5% |
4 | 0 | 0 | 0 | 0% | 9.7% |
5 | 1 | 2 | 3 | 9.4% | 7.9% |
6 | 0 | 1 | 1 | 3.1% | 6.7% |
7 | 1 | 0 | 1 | 3.1% | 5.8% |
8 | 1 | 0 | 1 | 3.1% | 5.1% |
9 | 1 | 0 | 1 | 3.1% | 4.6% |
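For anyone who wants to reproduce the last two columns, here is a minimal sketch. It assumes the usual statement of Benford’s law, where the expected share of leading digit d is log10(1 + 1/d). The names `counts` and `benford` are my own, and the counts are simply copied from the table.

```python
import math

# Counts copied from the table: leading digit -> (Set 1, Set 2).
counts = {1: (3, 8), 2: (5, 4), 3: (4, 1), 4: (0, 0), 5: (1, 2),
          6: (0, 1), 7: (1, 0), 8: (1, 0), 9: (1, 0)}


def benford(d):
    """Expected share of leading digit d under Benford's law."""
    return math.log10(1 + 1 / d)


# Both sets together hold 32 numbers (16 each).
total = sum(a + b for a, b in counts.values())

for d, (a, b) in counts.items():
    observed = (a + b) / total
    print(f"{d}: observed {observed:.1%}, Benford {benford(d):.1%}")
```

With only 32 numbers, each single value moves the observed share by about three percentage points, so the comparison is suggestive rather than a formal test.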

I was impressed with how front-loaded that table is, and how closely it tracked with Benford’s law. There doesn’t seem to be any reason for “1” or “2” to be more common than, say, “4” as a leading digit in either set, but “1” and “2” (two of the nine possible leading digits, or 22%) accounted for at least half of the leading digits in each set (8 of 16 in Set 1 and 12 of 16 in Set 2), and for 62.5% of them overall (34.4% and 28.1% respectively).
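A quick arithmetic check of how much of each set starts with “1” or “2”, again using counts from the table. The `leading_digit` helper is a hypothetical illustration of how the tallies could be made; the raw numbers themselves are not listed in this post, so it is only applied to the range endpoints mentioned earlier.

```python
import math


def leading_digit(n):
    """Return the first decimal digit of a positive integer."""
    while n >= 10:
        n //= 10
    return n


# Range endpoints quoted in the post, just to show the helper at work.
endpoints = [561, 8224, 39, 576]
endpoint_digits = [leading_digit(n) for n in endpoints]

# Share of values starting with 1 or 2; counts come from the table.
set1_share = (3 + 5) / 16        # Set 1
set2_share = (8 + 4) / 16        # Set 2
combined_share = (11 + 9) / 32   # both sets together

# Benford's law expectation for a leading 1 or 2: log10(2) + log10(3/2).
benford_share = math.log10(3)

print(f"Set 1: {set1_share:.1%}, Set 2: {set2_share:.1%}, "
      f"combined: {combined_share:.1%}, Benford: {benford_share:.1%}")
```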

October 26, 2009 at 4:06 pm

This is the coolest thing I’ve seen all day. Science wins again!

October 28, 2009 at 2:46 pm

I wondered:

What are the practical repercussions of the data not following Benford’s Law? If I had some data on how long it takes me to walk to work and it didn’t follow Benford’s Law, what would it mean to me?

In lieu of doing any actual research on the topic, I’ve decided: since non-Benfordian number sets are often fabricated, my walk-to-work time must have been fabricated. But it can’t be, right? Because I didn’t fabricate it. There is only one logical conclusion: if you discover that you are consistently generating non-Benfordian number sets, you are a robot.

Congratulations on not being a robot, Matt.
