I must disagree. At least in the case of Benford's Law, one man was really over-analyzing and trying to extract some judgment from his data and found connections where there should be none, in my view.
Apparently, in numerical data, the leading digits of the data elements should follow a particular distribution: '1' appears more often than '2', '2' more often than '3', and so on down to '9'. A leading '1' should turn up approximately 30% of the time and a leading '9' less than 5%.
Anyways, the thought intrigued me, as I never would have considered '1' a more popular leading digit in data (regardless of units) than '9'. The math is easily explained by Wikipedia (as always), but what really draws me is the data crunching (which Wikipedia cannot easily show)! I decided to use the GDP per Capita of nearly all the countries in the world (some are not available; North Korea is an obvious example), which is readily available from Mathematica's data, and wapow! It generated several noteworthy graphs:
This code was comparatively easier than that of my former pursuits (Dandelion Cellular Automata, GDP per Capita related to Latitude, Zipf's Law, etc.) so I got creative with coloring the graphs above!
Back to what matters:
The Code
I extracted some CountryData. Using Map, I found the "GDPPerCapita" for all countries in CountryData[]. This processing took very little time, fortunately! There are 239 countries in Mathematica.
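For anyone following along, a rough sketch of that extraction is below. The names countries and gdpRaw are placeholders of mine, not necessarily what the original notebook used, and newer Mathematica versions may hand back values with units attached rather than bare numbers.

```mathematica
(* Sketch only: `countries` and `gdpRaw` are placeholder names. *)
countries = CountryData[];   (* every country CountryData knows about *)

(* Look up the "GDPPerCapita" property for each country. *)
gdpRaw = Map[CountryData[#, "GDPPerCapita"] &, countries];

Length[gdpRaw]   (* one entry per country, missing values included *)
```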
Following the data extraction of the GDP, we face the trouble of "Missing[NotAvailable]". We weed out the little buggers with Select and, *drumroll* (this took quite a bit of trouble to find), StringFreeQ, to which we pass "Available", the string portion that never appears in the numerical data we are looking for. There are now 231 countries in our GDPPerCapita data list.
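Here is a small sketch of that filter on made-up values (the list sampleGDP is an assumption for illustration, not real data). ToString turns each entry into text so that StringFreeQ can check it; testing each entry with NumericQ instead would be another plausible route.

```mathematica
(* Made-up sample: three numbers and one missing entry. *)
sampleGDP = {46000.5, Missing["NotAvailable"], 1234, 987.2};

(* Keep only entries whose text form does not contain "Available". *)
cleanGDP = Select[sampleGDP, StringFreeQ[ToString[#], "Available"] &]
```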
With the data purified, all we need now is the lead digit. Suffice it to say that there's a bit of Flooring and a use of IntegerLength, but altogether it is simple to do with Map.
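One way the Floor and IntegerLength combination might look is sketched below. The helper name leadDigit and the sample values are my own inventions, and the sketch assumes positive inputs of at least 1 (true of any GDP per Capita figure).

```mathematica
(* leadDigit is a hypothetical helper: shift the decimal point so that
   only the first digit sits before it, then drop the rest with Floor. *)
leadDigit[x_] := Floor[x/10^(IntegerLength[Floor[x]] - 1)]

sampleGDP = {46000.5, 1234, 987.2, 5300};   (* made-up values *)
digits = Map[leadDigit, sampleGDP]          (* {4, 1, 9, 5} *)
```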
Finally, I just Tally the lead digits, SortBy the First element, Map again to get the first value of each sublist, then BarChart it!
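A sketch of that pipeline on a made-up list of lead digits follows; the variable names are placeholders. Tally returns {digit, count} pairs, so this version assumes the counts are what gets charted and picks them out with Last, then divides by the total to get proportions.

```mathematica
digits = {1, 3, 1, 2, 9, 1, 2};            (* made-up lead digits *)

tallied = SortBy[Tally[digits], First];    (* {digit, count} pairs, ordered by digit *)
counts = Map[Last, tallied];               (* {3, 2, 1, 1} *)
freqs = N[counts/Total[counts]];           (* proportions that sum to 1 *)

BarChart[freqs, ChartLabels -> Map[First, tallied]]
```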
Short and Simple!
If interested in the BarChart generator:
You guessed it!
Map it with another parameter for BarChart: ChartStyle -> #. Again, the # signifies the entry point for the various elements to loop through, which here are the entries of ColorData["Gradients"].
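Roughly, and assuming the proportions are stored in a list I am calling freqs (a placeholder name), that could look like this:

```mathematica
(* Observed lead-digit proportions for digits 1 through 9. *)
freqs = {0.25974, 0.160173, 0.125541, 0.142857, 0.103896,
   0.0692641, 0.0519481, 0.0519481, 0.034632};

(* One bar chart per named gradient; # is filled with each gradient name in turn. *)
charts = Map[BarChart[freqs, ChartStyle -> #] &, ColorData["Gradients"]];
```

The trailing semicolon suppresses the output; evaluating charts on its own shows the whole list of graphs.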
Voila!
The actual Bar Graph:
The data works out so that, in numerical order from 1-9, the proportions are: {0.25974, 0.160173, 0.125541, 0.142857, 0.103896, 0.0692641, 0.0519481, 0.0519481, 0.034632}. Benford's Law roughly holds! 1 comes in at about 26% (a little under the expected 30%), 9 is < 5%, and the GDP per Capita figures of almost all the countries in the world (minus 8 rebels) decrease in line with the Benford Distribution (except 4, which beats 3...)
A Benford Distribution should follow approx: {0.301, 0.176, 0.125, 0.0969, 0.079, 0.067, 0.058, 0.051, 0.046}
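Those expected figures come from Benford's formula, P(d) = log10(1 + 1/d). A quick sketch for setting them against the observed proportions quoted above (benford, observed and difference are my own variable names):

```mathematica
(* Expected share of each leading digit d = 1..9 under Benford's Law. *)
benford = N[Log10[1 + 1/Range[9]]];

(* Observed lead-digit proportions from the GDP per Capita data. *)
observed = {0.25974, 0.160173, 0.125541, 0.142857, 0.103896,
   0.0692641, 0.0519481, 0.0519481, 0.034632};

(* Positive where a digit is over-represented, negative where it falls short. *)
difference = observed - benford
```

The fourth entry of the difference is the largest positive one, which is the '4' anomaly again.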