What's in a graph

I have a confession to make. I am a graph addict. A data junkie. I make no apologies for this, it is just the truth.

I realized this a few weeks ago when I came across a website that charts all the causes of death across age and separates it by gender or ethnicity. This results in a glorious, interactive, stacked area graph that I can get lost in for hours. Ok, maybe not hours. But I seriously spent 20 minutes playing with it the first time I clicked the link. I think I'm in love.

Which leads me to this month's post. I want to share my love of graphs by teaching you where to look for the interesting tidbits of information. If you only focus on what the author is pointing at, you'll miss out on lots.

Sometimes you can find good things, like an intriguing difference that might spawn more research questions. But sometimes you can discover an error in the research method or analysis that makes you doubt the study's validity. This type of critical thinking takes years, no, decades, to fine-tune. In fact, there's a great article by the same people who made the death chart, with rules to follow when creating any graph type imaginable. It's worth checking out.

So without further ado, I present to you, my newest research project: Things That Make Me Happy.

As you can see, I have graciously charted out my level of happiness with various activities for you. Clearly, everyone is yearning to know these important details about me, like whether I prefer watching TV, yoga or running. But let's not be too hasty; we'll take it step by step.

1. Check the y-axis

A lot of graphs you'll see around are bar graphs like this and it's absolutely critical that you check the y-axis (the vertical one). The categories on the bottom are important too, but you can't understand what you're looking at until you at least know what data is being measured.

Here, we're measuring my happiness, using how often I smile as a proxy. I know, smiles per 10 min is an odd measurement. But I had already made up the numbers without thinking about units and didn't want to have to change it!

But that right there shows how important looking at the y-axis is. Weird units or (the absolute worst) no y-axis information at all are big red flags that something's wrong with this data. So yeah, I totally did that on purpose.

2. Look at the x-axis

Now that you know what we're measuring, check out what categories the data is separated by on the horizontal axis. Here's where you can start thinking about positive and negative controls. These are generally listed first in the graph, before the test groups.

In this example, I used going to the dentist as the negative control. You'd expect this group to be consistently low, which it is. This demonstrates that you can measure low numbers, as in I'm not just a perpetually happy person who never stops smiling.

Likewise, eating ice cream is the positive control, which is consistently high in happiness. I mean, who doesn't like ice cream?? Crazy people, that's who! This shows that the test is able to measure high levels of smiling and can see differences between low and high.

3. Evaluate the error bars

I purposefully haven't talked about the test groups up to this point. I know you're all anxious to know the results, but it's key to have established the model first. Otherwise you can't trust that you really know me!

So now we evaluate the quality of the data by looking at how consistent it is. That is, each time I go to yoga, do the researchers measure about the same number of smiles? That's shown by the error bars: the tees sticking out of the tops of the bars. Big error bars = big variability.

Don't believe me? Good, you're being skeptical, my favourite trait! Look at that same data, as a dot plot.

In this type of graph, each time the happiness is measured, it appears as a single dot. So you can see that I ate eight ice cream cones and the data was between about 7 and 11 smiles/10 min. That's pretty consistent.

Now look at watching TV. Eight measurements again, but the error bar is much longer due to the wide spread of data points. It almost looks like that group could be split into two groups. Maybe I wasn't watching the same program each time or not all of the episodes were good? This is why error bars are important.
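To make the link between the dots and the error bars concrete, here is a minimal sketch in Python. The numbers are invented for illustration (they are not the values behind my graphs), and I'm assuming the error bar is the sample standard deviation; some graphs use the standard error instead, so always check the figure legend.

```python
from statistics import mean, stdev

# Hypothetical smiles-per-10-min values, invented for illustration only.
ice_cream = [7, 8, 9, 9, 10, 10, 11, 8]      # tight cluster
watching_tv = [2, 3, 3, 4, 9, 10, 10, 11]    # looks like two clusters


def summarize(values):
    """Return (bar height, error bar length) as (mean, sample standard deviation)."""
    return mean(values), stdev(values)


for name, values in [("ice cream", ice_cream), ("watching TV", watching_tv)]:
    height, error = summarize(values)
    print(f"{name}: bar = {height:.1f}, error bar = +/- {error:.1f}")
```

Both groups have eight measurements, but the spread-out TV values give a much longer error bar than the tightly clustered ice cream values.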

4. Find out the sample size

In the dot plot, you can easily see how many times each category was measured. Not so in the bar graph. That's why it's usually listed somewhere what the sample size (called n) is for that experiment.

However, it can be misleading. For this study, I'd say n = 4-8, but you can see that just one group has only 4 dots: running. I really dislike running. Or maybe I don't! You can't really tell with so few measurements. There's one time that I really liked it, so maybe that's the true value and the other times I was just in a bad mood to start with. You won't know for sure unless you increase those numbers.

Spoiler alert: I actually hate running. And the data is completely fabricated.
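Here is a rough sketch of why more measurements help, using the standard error of the mean (the standard deviation divided by the square root of n). The values are made up to mimic a group with three low scores and one high one, and `standard_error` is just a helper name I picked for this example.

```python
from math import sqrt
from statistics import stdev


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


# Hypothetical values: three low scores and one high one (invented numbers).
running_4 = [1, 2, 2, 8]
# The same pattern measured twice as often (still invented numbers).
running_8 = running_4 * 2

print(f"n = 4: SEM = {standard_error(running_4):.2f}")
print(f"n = 8: SEM = {standard_error(running_8):.2f}")
```

With the same pattern of values, doubling n shrinks the uncertainty about the average, which is why a group with only 4 dots deserves extra suspicion.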

5. Mathematicize the statistics

Not a word. I'm just trying to make boring stats sound cool.

The last, very important step is to see if the differences you see are actually significant. They could look different enough to be interesting, but not actually be statistically significant. Or they could be statistically significant, but not pass what my PhD supervisor called "the bloody obvious" test. If it's not bloody obvious that there's a difference, it's probably not physiologically relevant.
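If you're curious what a significance test actually does, here is one simple option sketched in Python: an exact permutation test. To be clear, this is not necessarily the test behind the asterisks in my graphs, and the data below is invented for illustration. The idea is to shuffle the group labels every possible way and count how often the shuffled difference in averages is at least as big as the real one.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_total = len(pooled)

    def avg(values):
        return sum(values) / len(values)

    observed = abs(avg(group_a) - avg(group_b))
    total = 0
    at_least_as_extreme = 0
    # Try every way of relabelling n_a of the pooled values as "group A".
    for indices in combinations(range(n_total), n_a):
        chosen = set(indices)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(n_total) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(avg(a) - avg(b)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Hypothetical smiles-per-10-min values, invented for illustration only.
dentist = [1, 1, 2, 2, 1, 3, 2, 1]
yoga = [7, 8, 8, 9, 9, 10, 8, 7]

p = permutation_p_value(dentist, yoga)
print(f"dentist vs yoga: p = {p:.5f}")
```

A small p-value means the difference would rarely show up by chance if the labels didn't matter. It still says nothing about whether the difference passes the "bloody obvious" test, so look at the size of the gap as well.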

Most graphs try to keep it simple by only showing the important comparison; here, that would be whether each group is higher than the negative control (dentist). If so, that bar gets a pretty little asterisk, as you can see in the first picture I posted.

The problem is that this method leaves out a lot of information. For example, does yoga make me less happy than ice cream? Another way of showing the data is messy, but complete and thus not biased in what the reader is being shown.

The lines connect the bars that have a statistically significant difference. Hard to interpret, right? That's why I prefer the first way, with details in the text about what other differences were seen.

So there you have it. I love ice cream and yoga, hate running and the dentist, and have mixed feelings about TV. Everything you could have ever wanted to know about me. You are welcome.