And statistics! This is a response to part of the discussion that started in this thread on Physicist-Retired's seed about the tactics of anthropogenic global warming deniers. The contention arose at some point that, because of the highly random nature of weather, drawing long-term climate inferences from the day-to-day recording of temperature and carbon dioxide concentration is pointless. Herein I attempt to show, using some very basic statistics, that this is absolute bollocks.

To do this, I set up a spreadsheet in Excel that mimics a cycle of seasons, of sorts. A cosine function cycles every 365 "days" between a maximum temperature and a minimum temperature. To mimic the randomness of daily temperatures, a random number generator produces a value between -7.5 and +7.5 degrees to add to the cyclical temperature. I also included a small addition to the temperature each day, calibrated to add up to a couple of degrees over the course of 50 years in the model - this mimics the roughly one degree Celsius (about 2 degrees Fahrenheit) of warming that climate scientists report having observed over the past 50 years. These numbers are all added up to give a daily temperature, which is what I'm plotting in the graphs from here on.
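For anyone who would rather follow along in code than in a spreadsheet, here is a minimal sketch of the same recipe in Python. It is not the original spreadsheet: the function name, the 60-degree midpoint, the 20-degree seasonal swing, and the uniformly distributed noise are all my assumptions, chosen so the simulation starts at a summer peak in the 80s.

```python
import math
import random

def simulate_temperatures(years=50, mean_temp=60.0, amplitude=20.0,
                          noise=7.5, total_warming=2.0, seed=0):
    """Daily temperatures: seasonal cosine + random jitter + slow warming.

    mean_temp, amplitude, and the uniform noise distribution are assumed
    values for illustration; only the 365-day cycle, the +/-7.5 range, and
    the couple of degrees over 50 years come from the description above.
    """
    rng = random.Random(seed)               # seeded so runs are repeatable
    n_days = years * 365
    drift_per_day = total_warming / n_days  # tiny daily addition
    temps = []
    for day in range(n_days):
        # Day 0 sits at the top of the cosine, i.e. the hottest time of year.
        seasonal = mean_temp + amplitude * math.cos(2 * math.pi * day / 365)
        jitter = rng.uniform(-noise, noise)
        temps.append(seasonal + jitter + drift_per_day * day)
    return temps

temps = simulate_temperatures()
print(len(temps), "simulated days")
```

Setting `noise=0.0` turns off the randomness, which is handy for checking that the seasonal curve and the drift behave as intended.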

Here we have a chart representing one "month" - a 30-period interval at the start of the simulation. I reckon this would be the month of August. The temperatures look pretty much entirely random, within a few degrees of 80something. That makes sense for one month. I included a line of best fit through the data, as well as an important statistical measure called the R-squared value. The R-squared, or coefficient of determination, measures how much of the variation in the data is explained by the line of best fit. It ranges from zero to one; zero means the line explains none of the variation (there is no linear relationship between the variables being tested, in this case time and temperature) and one means that the relationship is perfect. Here it is 0.07 - quite close to zero, which is what we should expect, since at this scale the data is essentially random.
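The spreadsheet computes the line of best fit and its R-squared for you, but the arithmetic is short enough to write out. The sketch below is an ordinary least-squares fit against the day number, with R-squared computed as one minus the ratio of leftover variation to total variation; the function name is mine, not anything from the spreadsheet.

```python
def linear_fit_r_squared(ys):
    """Fit y = intercept + slope * x with x = 0, 1, 2, ... and return
    (slope, intercept, r_squared). Needs at least two points and some
    variation in ys, otherwise the divisions below fail."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left over after the line has done its best...
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in enumerate(ys))
    # ...compared with the total variation around the mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

print(linear_fit_r_squared([1, 3, 5, 7]))  # a perfect line: R-squared of 1
print(linear_fit_r_squared([0, 1, 0, 1]))  # a zigzag: R-squared of about 0.2
```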

Next, I made a plot over a whole year, 365 periods. Call it "1950," seeing as it's the start of the simulation. As you can see, there's a definite pattern to it: the band of random temperatures follows a quite precise curve - the effect of the seasonal variation programmed into the simulation. Real weather, of course, is more chaotic but as this is just a simulation for the purpose of demonstrating the statistical methods, it'll do. One thing worth noticing is that the R-squared value is still very low - because we are using the wrong kind of equation. The computer is still trying to draw a straight line through the data, when what it needs is a curve. Now, normally you could (if stupid OpenOffice had trigonometric best-fit functions, grr) simply fit it to a different kind of function - namely, a sine or cosine function - but because we are modeling climate, and climate doesn't necessarily repeat itself, we can use a different tool to show the trend more clearly here, called the moving average.

A moving average is calculated by averaging the most recent few periods of a series such as this one. For instance, if I were to set a 15-day moving average, the moving average for day 16 would be the average of the temperatures for days 2 through 16 - 15 days in total. If I add a 15-day moving average (in red) to the year's worth of data, it looks like this:

As you can see, a lot of the noise from the randomness has been removed by applying the moving average, showing the pattern much more clearly. However, it's still a little jagged, and since I know (because I programmed it that way) that this is based on a *smooth* curve, I know that this jagged line can't be terribly close to the true pattern. When the moving average is increased, however, to 30 days, we get this:

Much smoother. However, we can start to see one problem of the moving average more clearly: it lags the actual data. Adding more days to the moving average does make it smoother, but it also causes the indicator to lag behind the actual pattern.
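Both the calculation and the lag are easy to see in code. Below is a sketch of a trailing moving average (each day is averaged with the days before it, as described above), applied to a noise-free cosine so the lag isn't hidden by randomness. The function name is mine, and I use a 31-day window in place of 30 only so that the lag comes out as a whole number of days.

```python
import math

def trailing_moving_average(values, window):
    """Average of each value and the (window - 1) values before it.
    Entries without a full window behind them are None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages, running = [], 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        averages.append(running / window if i >= window - 1 else None)
    return averages

# Days 1 to 16: the 15-day average for day 16 covers days 2 through 16.
print(trailing_moving_average(list(range(1, 17)), 15)[15])  # 9.0

# Noise-free seasonal curve (assumed shape) that peaks on days 0, 365, and 730.
seasonal = [math.cos(2 * math.pi * day / 365) for day in range(731)]
for window in (15, 31):
    smoothed = trailing_moving_average(seasonal, window)
    # Look for the smoothed peak in the neighbourhood of the true one.
    peak = max(range(300, 450), key=lambda d: smoothed[d])
    print(f"{window}-day average peaks on day {peak}, {peak - 365} days late")
```

The smoothed peak trails the true one by half the window, (window - 1) / 2 days: 7 days for the 15-day average and 15 days for the 31-day one. That is the price of the extra smoothness.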

Moving up the timescale rather drastically, here is the graph of all 50 years of simulated temperatures, with the climate change added in:

It's total chaos. If you look closely, you can see the peaks get a little higher and the troughs get a little less low, but good old R-squared there reminds us that a straight line through the raw data is worthless for drawing conclusions (if you can't see, it's to the left and says "0.002something"). If, however, we use a very long-scale moving average, we get this:

Bingo. R-squared is sitting pretty at 0.82 - not an excellent value, but a solid one. The trend has been made much more apparent in this version, though it is still choppy. Another thing worth noting is that while the strength of the fit has gone way up, the scale on the left has become much narrower - in line with our programmed-in long-term change of 2 degrees Fahrenheit or so.
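The whole experiment fits in one short script, if you'd like to try it yourself. This is a sketch under assumptions, not a reproduction of my spreadsheet: the 60-degree midpoint, 20-degree seasonal swing, uniform noise, and the 365-day window for the "very long-scale" moving average are all guesses made for illustration, so the R-squared values it prints will not match the ones in the charts exactly.

```python
import math
import random

def simulate(years=50, mean_temp=60.0, amplitude=20.0, noise=7.5,
             total_warming=2.0, seed=1):
    """Seasonal cosine + uniform noise + slow linear warming (assumed values)."""
    rng = random.Random(seed)
    n = years * 365
    return [mean_temp + amplitude * math.cos(2 * math.pi * d / 365)
            + rng.uniform(-noise, noise) + total_warming * d / n
            for d in range(n)]

def moving_average(values, window):
    """Trailing averages, starting at the first day with a full window."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        if i >= window - 1:
            out.append(running / window)
    return out

def r_squared(ys):
    """R-squared of a straight-line fit of ys against 0, 1, 2, ...
    For a one-variable linear fit this equals the squared correlation."""
    n = len(ys)
    mean_x, mean_y = (n - 1) / 2, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

temps = simulate()
smoothed = moving_average(temps, 365)  # a full year, so the seasons cancel out
print(f"raw data R-squared:           {r_squared(temps):.4f}")
print(f"365-day average R-squared:    {r_squared(smoothed):.4f}")
print(f"range of the smoothed series: {max(smoothed) - min(smoothed):.2f} degrees")
```

A window of exactly one year is a convenient choice because every average then contains one complete seasonal cycle, leaving only the slow warming and a much-reduced dose of noise.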

The point is not, of course, to prove absolutely here that climate change is happening as fast as my rough little model says it is; real climate science is based of course on actual data, of which there is a lot. The purpose here is simply to demonstrate how even very basic statistical tools can take an initially chaotic dataset and glean from it underlying patterns that aren't immediately apparent. Happy mathing!