Sunday, October 7, 2018

Lies, Damned Lies, and Statistics

From time to time I’m reminded that I did spend (waste?) several years going to graduate school, including – in my view – a few hideous hours learning statistics. It’s not that I hate math; I just never was much good at it. My undergrad minimum requirement was algebra, and I completed the minimum. Solve for x.

Nevertheless, the stats training still comes in useful, from time to time. While wandering through my social media today, I stumbled across a report from CBS, ranking states by their rates of obesity. You can view the slideshow HERE. I studied it with passing interest, noting for example that my home state of Colorado is the least obese (22.6%). Washington, DC was included right behind Colorado at #50, and so on, down to #1 West Virginia, tipping the scales at 38.1%.

While I was looking at this report I noticed a trend which led me to ask the following question: What is the relationship between this data and the percentage of people who voted for The Donald? 

Why?

It’s not important why. Science is about questions, not excuses.

You can find the percentage of people voting for Trump on Wikipedia, HERE.

The relationship of obesity rates to Trump voters, like any such relationship, can be expressed as a correlation, which basically reveals how closely a change along the X-axis tracks a change along the Y-axis. As any scientist will tell you (and no media editor understands), correlations are not necessarily causal. In other words, a strong correlation does not by itself mean that X causes Y, or vice versa.

With that in mind, let’s look at the data. Before we do, I need to confess to a small fudge: The CBS data allowed for ties. For example, Tennessee and Nebraska were tied at #14 (32.8% obese). I disallowed ties, and gave the first state named by CBS the better ranking, that is, Tennessee #15, Nebraska #14. I don’t think it matters that much, but if you’re so anal that you must have it exactly right, go pull the data and run it yourself, smart ass.
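For the curious, here is a minimal sketch of how that tie-breaking could be done in Python. It is not the spreadsheet I actually used. The function name `rank_without_ties` is made up for illustration, and the four states are just the ones quoted above, so the ranks it prints are positions within this tiny sample, not the real CBS numbers.

```python
def rank_without_ties(rows):
    """Rank states from most obese (rank 1) to least obese, with no ties.

    rows: list of (state, obesity_rate) pairs in the order the source lists them.
    When two states share a rate, the one listed first gets the larger rank
    number, mirroring the Tennessee #15 / Nebraska #14 fudge described above.
    """
    indexed = list(enumerate(rows))
    # Sort by rate, highest first; among equal rates, later-listed states come first.
    ordered = sorted(indexed, key=lambda item: (-item[1][1], -item[0]))
    return {state: rank for rank, (_, (state, _rate)) in enumerate(ordered, start=1)}


# Only the rates quoted in this post; Tennessee is listed before Nebraska.
rows = [
    ("West Virginia", 38.1),
    ("Tennessee", 32.8),
    ("Nebraska", 32.8),
    ("Colorado", 22.6),
]
ranks = rank_without_ties(rows)
print(ranks)
```

In this sample Nebraska lands one spot ahead of Tennessee, which is the same ordering I used for the full list.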

The correlation coefficient (R) falls between -1 and 1, where 0 means no linear relationship and 1 means X and Y move in perfect lockstep along a straight line (a negative value means Y falls as X rises). In my experience, there’s a kind of background correlation where everything is related to everything else by about R=.20. A correlation above .50 usually indicates some relationship. A .70 or above is generally strong.
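If you want to see what goes into that number, here is a rough sketch of the standard Pearson correlation in plain Python. I am assuming Pearson is the flavor of R in play here, and the numbers in the example are invented toy values, not the obesity or voting data. The function name `pearson_r` is my own.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much X and Y vary together, and how much each varies on its own.
    co_var = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list never changes")
    return co_var / math.sqrt(var_x * var_y)


# Toy data only: Y is always double X, so the points sit on a straight line.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 6, 8, 10]
print(pearson_r(toy_x, toy_y))        # perfect positive relationship
print(pearson_r(toy_x, toy_y[::-1]))  # same points reversed: perfect negative
```

Note that the toy Y changes twice as fast as X and the result is still a perfect correlation. R cares about whether the points fall on a line, not whether the changes are the same size.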

The relationship of state rankings of obesity x percent of population voting for Trump?

R=.73 (0.728687783, to be picky about it).

Ta-Dah! That’s pretty good. I’ve seen worse results in some of my students’ master’s theses. Here’s what it looks like in a graph.



Okay, so now you want to know what it all means. Well, it means Americans are fat and Trump is allegedly President. I’m sure you could spend a few thousand words sorting through these results to draw some conclusions, and you might even make an interesting research paper out of it, but I’ll leave that to you. I offer the data up merely to demonstrate that I have a lot of time on my hands.

If you seriously want to review the raw data, email me and I’ll be happy to send it along.
