Doin' the sciencey stuff

I'd like to do a bit of a series on how to go from hypothesis to publication. This is the most fundamental part of science. While it sounds easy (formulate hypothesis, do experimenty stuff, write it up, publish, repeat), there's a lot of art and technique involved. Yes, lab techniques such as pipetting need to be learned, but I have literally taught high school students to do that. The main thing I learned during grad school is how to think about a problem and how to design an experiment that can clearly and accurately answer my question. So let's start there.

Of course, you've done your reading of the existing scientific literature and have identified an important gap in knowledge. To keep things simple, we'll say that you and others have determined that vegetable seeds put in the ground grow into a plant. (This is about all I know about botany, which makes it a good subject to tackle here!) However, no one has identified which components of the ground are required for growth, and you're interested specifically in calcium.

So you get a bunch of seeds (all from the same plant, to minimize the variable of different stocks) and plant them in 3 identically sized pots with small differences in the soil: regular potting soil (control), soil with low or no calcium (since I'm guessing it would be hard to eliminate it altogether), and soil with extra calcium. Then you water and sun them identically for a month and measure how high they've grown. This will allow you to determine whether calcium is necessary and also whether the amount in potting soil is sufficient for healthy growth. Those two words, necessary and sufficient, are key to determining what is required for an event to happen.

To an inexperienced eye, this seems like a well-thought-out experiment. IT'S NOT. There are several major issues with it:
1. Bad time points for observation
2. Too few variables measured
3. Not repeated enough
4. Samples aren't independent

First of all, this is actually lumping together several experiments. There are several stages to plant growth: sprouting, root development, seedling growth and final plant growth (according to the all-knowing Google). By only measuring the final size, you could completely miss the possibility that calcium is only required for sprouting, in which case the lack of growth has nothing to do with the rest of the process.

The better experiment would be to first alter the timeline. Instead of only measuring the outcome after a month, start 4 sets of identical pots (so 12 pots total). Stop 3 (one of each soil) after a couple of days to see how they sprouted, another 3 after initial root and seedling growth, and the remaining 6 at two later time points to follow plant growth. That way, if sprouting is affected by calcium levels, the experiment can be altered to see whether calcium also affects the other stages. If necessary, all the seeds can be sprouted in just water or in regular soil, then transplanted to the 3 different pots to view root and seedling growth independent of sprouting.
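To make the time-course layout concrete, here is a minimal Python sketch that lists one pot per soil type per harvest stage. The harvest days (3, 10, 20 and 30) are made-up placeholders, since the right timing depends on the plant; the stage and soil names are just labels.

```python
from itertools import product

# Soil conditions from the design above (labels only).
SOILS = ["regular (control)", "low calcium", "extra calcium"]

# Hypothetical harvest days: "a couple of days" for sprouting, then
# root/seedling growth, then two later points for plant growth.
HARVEST_DAYS = {
    "sprouting": 3,
    "roots/seedling": 10,
    "growth 1": 20,
    "growth 2": 30,
}

def time_course_layout(soils, harvest_days):
    """Return one (stage, day, soil) entry per pot: every soil at every stage."""
    return [
        (stage, day, soil)
        for (stage, day), soil in product(harvest_days.items(), soils)
    ]

layout = time_course_layout(SOILS, HARVEST_DAYS)
for stage, day, soil in layout:
    print(f"day {day:>2}  {stage:<15} {soil}")
print(len(layout), "pots")  # 4 stages x 3 soils = 12
```

Each stage gets its own full set of soils, so a difference at day 3 can't be confused with a difference that only shows up at day 30.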

Furthermore, always measure multiple variables at each time point to get a complete picture of what's happening. Plant height is only one component of a healthy plant. You might miss out on how full the leaves are, how expansive the root system is, how soon they flower, etc., all of which are important aspects of plant growth. The more data you can collect from a single experiment, the better. It will reduce how many times you have to repeat things, thus saving money and energy. But even more importantly, it helps guard against bias in your results. If you're only looking for one thing, you're only ever going to find one thing. An essential part of science is trying to identify and reduce bias, because no one is exempt from it! One of these days, I'll dedicate a whole post to the many ways in which good scientists try to counteract their biases. Decide which variables you'll examine before the experiment starts, too; that way you're not relying on chance observations, which can also bias results.
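One way to lock in your variables ahead of time is to write down the measurement sheet before planting and refuse any record that doesn't match it. This is a minimal Python sketch of that idea; the variable names, the Observation class and the example values are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical variables, fixed BEFORE the experiment starts:
# height, leaf fullness, root system and flowering, as suggested above.
MEASURED_VARIABLES = ("height_cm", "leaf_count", "root_length_cm", "flowered")

@dataclass
class Observation:
    pot_id: str
    day: int
    values: dict
    notes: str = ""  # unplanned, ad hoc observations go here, not in `values`

    def __post_init__(self):
        planned = set(MEASURED_VARIABLES)
        recorded = set(self.values)
        missing = planned - recorded
        unplanned = recorded - planned
        if missing:
            raise ValueError(f"{self.pot_id} day {self.day}: missing {sorted(missing)}")
        if unplanned:
            raise ValueError(f"{self.pot_id} day {self.day}: unplanned {sorted(unplanned)}")

# Made-up example values, for illustration only.
example = Observation(
    pot_id="P01",
    day=3,
    values={"height_cm": 0.4, "leaf_count": 0, "root_length_cm": 1.1, "flowered": False},
)

try:
    Observation(pot_id="P02", day=3, values={"height_cm": 0.6})
except ValueError as err:
    print(err)  # the record is rejected because three planned variables are missing
```

Rejecting incomplete records forces every variable to be measured at every time point, while the separate notes field keeps surprises on record without quietly turning them into the main result.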

Finally, no matter how many seeds you plant in each soil type, only doing an experiment once is not OK! You need a sample size (n) of at least 3 (i.e., 3 seeds grown in each soil), repeated on at least two separate occasions. Minimum. These repetitions are critical because sometimes crap happens that you don't realize. Maybe the first time you ran the experiment, your summer student forgot to water one set of pots for half the time. Or something out of your control happened that caused false results. Repetition evens out these one-off problems so you can determine the true effects of what you're testing.

Which ties into the fourth issue: lack of independence between the samples. If all the seeds are in one pot and something happens to that pot, your entire data set is messed up. Or, worse (since with good technique all samples should be treated identically), one bad seed dies and somehow poisons the other seeds in the pot without you knowing. Cough up the extra few bucks and give each seed its own area.
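Putting the replication and independence points together, it's worth counting pots before you buy supplies. This hypothetical Python sketch builds the full plan: 3 soils, the 4 harvest time points from the time course, n = 3 seeds per group, two independent runs, and exactly one seed per pot. The pot ID format is just an assumed labelling scheme.

```python
from itertools import product

SOILS = ["control", "low_ca", "high_ca"]
TIME_POINTS = 4        # harvest stages from the time-course design
SEEDS_PER_GROUP = 3    # n = 3, the minimum suggested above
INDEPENDENT_RUNS = 2   # separate occasions, at least two

def full_design(soils, time_points, n, runs):
    """One row per pot; each pot holds exactly one seed so samples stay independent."""
    pots = []
    for run, tp, soil, rep in product(
        range(1, runs + 1), range(1, time_points + 1), soils, range(1, n + 1)
    ):
        pots.append({
            "run": run,
            "time_point": tp,
            "soil": soil,
            "replicate": rep,
            "pot_id": f"R{run}-T{tp}-{soil}-{rep}",
        })
    return pots

design = full_design(SOILS, TIME_POINTS, SEEDS_PER_GROUP, INDEPENDENT_RUNS)
print(len(design))  # 2 runs x 4 time points x 3 soils x 3 seeds = 72
```

That's 72 pots, a lot more than the 3 you started with, but each one is an independent data point. Labelling every pot with its run, time point, soil and replicate also makes it easy to check whether one run behaved differently from the other.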

Great! So you've successfully grown some plants and observed some differences. Now to write up the paper and get your Nobel prize, right? Ha! Yeah, there's a lot more involved yet. On to the next post!