One of the most frequently asked questions about tornadoes during 2008 was “How many tornadoes have there been this year?” As with many of the questions we get, it sounds like a simple question that should have a simple answer but, unfortunately, it’s not.
First, it’s important to understand how tornado information is collected. Reports come into local National Weather Service Forecast Offices from a wide variety of sources (trained spotters, emergency personnel, the media, the general public). Those reports get put out relatively quickly and form the basis of the preliminary, or so-called “rough”, log of tornadoes. As time goes on, more reports may come in, increasing the count, but the local Warning Coordination Meteorologist (WCM) may also find that some of the reports are of the same tornado, lowering the count. Within a couple of months of the end of the month, the WCM submits a list of tornadoes. The National Climatic Data Center and the WCM at the National Weather Service’s Storm Prediction Center then go through that list, make sure that tornadoes that cross from one office’s area of responsibility into another’s are counted only once, and do some quality control. Eventually, this produces the final, or “smooth”, log of tornadoes. Asking the question “How many tornadoes have there been this year?” right after an event leaves us with only the preliminary data available. Answering someone’s question with “Ask us again in a few months” is not usually seen as satisfactory by the questioner.
The approach we’d like to take is to look at the historical relationship between preliminary reports and final reports. In general, our naive expectation would be that there might be increases from the preliminary log to the final log on days with very few, if any, preliminary reports, and decreases on days with long-track tornadoes that might be reported many times. Which effect wins out, and by how much?
In the spring of 2008, we started looking at this question in detail, beginning with monthly and annual totals. The ratio of the final total to the preliminary total for each month from January 1998 to November 2008 with at least 30 preliminary reports is shown in red in the figure below. The ratio for a running annual total, covering the twelve months starting with each month, is shown in black.
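For readers who want to try this on their own data, here is a minimal sketch of the two ratios just described. The function names and the counts are made up for illustration, and the running ratio reflects our reading of "running annual total" as the sum over the twelve months starting with each month; it is not the code used to make the figure.

```python
def monthly_ratios(prelim, final, min_prelim=30):
    """Final/preliminary ratio for each month, or None when the month
    has fewer than min_prelim preliminary reports."""
    return [f / p if p >= min_prelim else None
            for p, f in zip(prelim, final)]


def running_ratios(prelim, final, window=12):
    """Ratio of summed final counts to summed preliminary counts over
    each run of `window` consecutive months, indexed by starting month."""
    ratios = []
    for start in range(len(prelim) - window + 1):
        p_total = sum(prelim[start:start + window])
        f_total = sum(final[start:start + window])
        ratios.append(f_total / p_total if p_total > 0 else None)
    return ratios


if __name__ == "__main__":
    # Hypothetical monthly counts, NOT real tornado data.
    prelim = [40, 10, 100, 200, 50, 30, 20, 60, 80, 25, 35, 45, 90]
    final = [44, 12, 110, 210, 55, 33, 22, 66, 88, 30, 38, 50, 72]
    print(monthly_ratios(prelim, final))
    print(running_ratios(prelim, final))
```

Summing before dividing, rather than averaging the monthly ratios, keeps quiet months from having the same weight as active ones.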
The result was surprising. A rather sudden change took place in early 2006. Prior to that, the final log tended to have about 10% more tornadoes than the preliminary log. The ratio had, perhaps, trended slightly downward, but starting in March 2006, it dropped precipitously, such that for 2007 and 2008, there were 20% fewer tornadoes in the final log than in the preliminary log. This, obviously, was a huge change and meant that our intuition of what the preliminary log meant in terms of the number of tornadoes was wrong.
We then looked at the differences on a day-by-day basis. The figure below shows the change from preliminary to final tornado count as a function of the preliminary count. As we might expect, there’s not much difference on the “small” days; some of those have more in the final than in the preliminary, and some have fewer. For “big” days, the final log tends to have fewer. We broke things down into the periods 1998-2005 and 2006-2008. Linear regression lines are shown as dashes for each of the two periods. For the early period, for 100 preliminary reports, the final total, on average, was about 10 lower. For the later period, it was about 30 lower. Only once in that three-year period did a preliminary count of at least 40 end up with more in the final count. In the earlier period, that was a common event. The effect on the 2008 totals was that, instead of adding about 200 tornadoes from the preliminary to the final totals, you had to subtract more than 400!
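The per-period regression can be sketched as an ordinary least-squares fit of the daily change (final minus preliminary) against the preliminary count. This is an illustration only: the record layout, the function names, and the sample days are assumptions, and the article does not say exactly how the dashed lines were fitted.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_period(days, first_year, last_year):
    """Fit change-vs-preliminary for days whose year is in the period.

    Each day is a (year, preliminary_count, final_count) tuple.
    """
    selected = [(p, f - p) for year, p, f in days
                if first_year <= year <= last_year]
    prelim = [p for p, _ in selected]
    change = [c for _, c in selected]
    return fit_line(prelim, change)


if __name__ == "__main__":
    # Hypothetical daily records, NOT real tornado data.
    days = [
        (1999, 5, 6), (2001, 20, 21), (2003, 60, 55), (2004, 100, 91),
        (2006, 4, 4), (2007, 30, 24), (2008, 70, 50), (2008, 110, 76),
    ]
    for first, last in [(1998, 2005), (2006, 2008)]:
        slope, intercept = fit_period(days, first, last)
        print(first, last, round(slope, 3), round(intercept, 2))
```

Fitting the change rather than the final count makes the slope read directly as "tornadoes gained or lost per preliminary report".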
We admit we’re not entirely sure what happened in March 2006. A number of small changes in software for collecting reports were implemented about then, but it’s not clear if that’s enough to explain what happened. We plan to monitor the situation and use the regression from the recent data to estimate the final count, based on the preliminary count.
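One way such an estimate could work is sketched below: apply a fitted line for the daily change to each day’s preliminary count and add up the results. The slope of -0.3 with a zero intercept is only a stand-in loosely based on the "about 30 lower per 100 preliminary reports" figure for the recent period; the actual regression coefficients are not given here, and the function names and daily counts are hypothetical.

```python
def estimate_final(prelim_count, slope, intercept):
    """Estimated final count for one day: the preliminary count plus the
    change predicted by the regression line, never below zero."""
    predicted_change = slope * prelim_count + intercept
    return max(0.0, prelim_count + predicted_change)


def estimate_total(daily_prelim, slope, intercept):
    """Sum of the per-day estimates over a list of preliminary counts."""
    return sum(estimate_final(p, slope, intercept) for p in daily_prelim)


if __name__ == "__main__":
    # Assumed coefficients and made-up daily counts, for illustration only.
    slope, intercept = -0.3, 0.0
    daily_prelim = [3, 12, 45, 100, 7]
    print(estimate_total(daily_prelim, slope, intercept))
```

Because the correction depends on how the preliminary reports are spread across days, the same preliminary annual total can give different estimates in an outbreak-heavy year than in a year of many small days, provided the fitted line has a nonzero intercept.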
This is another example of how challenging it can be to deal with what seems to be a simple dataset. Caveat lector!