Warning Verification Pitfalls Explained – Part 1: Getting started

Several years ago (in 2005), we hosted our first user group workshop on severe weather technology for NWS warning decision making.  This workshop was held at NWS headquarters in Silver Spring, MD, and was attended by NWS management, severe weather researchers, technology specialists, and a “user group” of meteorologists from the field, that is, from local and regional NWS offices.  The main objectives of this workshop, and of the second one that followed in 2007, were to review the current “state of the science and technology” of NWS severe weather warning assistance tools, to identify gaps in the present methodologies and technologies, to gain expert feedback from the field (including “stories” from the front lines), to discuss near-term and long-term trends in R&D, and to let field forecasters and R&D scientists help set the direction for new technological advances.  Our invitations sought enthusiastic attendees who were interested in setting an aggressive agenda for change.  We invited Dr. Harold Brooks (NSSL) to give a seminar at the workshop about how the NWS might go about improving the system it uses to verify severe weather warnings.  Many of the ideas I will present were born out of Harold’s original presentation, and I will build upon them.

So, to start, let’s look at how NWS warnings are verified today.  As many of you know, the NWS transitioned to what is now known as polygon-based warnings about four years ago.  Essentially, this means that warnings are now supposed to be drawn as polygons that represent the swath in which the forecaster thinks severe weather will occur while the warning is in effect, without regard to geo-political boundaries.  In the past, warnings were county-based, even though severe storms don’t really care about county lines!  It’s better to call the system “storm-based warnings”, since after all, counties are just differently-shaped polygons.


But what really changed was not the shape of the warnings, but how warnings were verified.  No longer was it required that each county covered by a warning receive at least one storm report.  Now, only one storm report is required to verify a single polygon warning.  This sounded attractive, since it meant that if a small storm-based warning touched small portions of several counties, there was no need to find a report in each of those county segments, reducing the workload required to gather such information.  But, as I will show, there are flaws in that logic.

How are forecasts verified for accuracy?  One of the simplest ways to do this is with a 2×2 contingency table.  Each of the four cells in the table is explained as follows:  A verified forecast is called a HIT (cell A), and represents when an event did happen when the forecast said it would happen.  An unverified forecast is called a FALSE ALARM (cell B), when a forecast was issued but an event did not happen.  An event that went unforecast is called a MISS (cell C).  And finally, wherever and whenever events did not happen, when there was a forecast of no event (or no forecast at all), that is called a CORRECT NULL (cell D).
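
The four cells can be sketched in a few lines of Python.  This is only a minimal illustration, assuming each forecast/observation pair has already been reduced to two yes/no values; the function name classify and the sample pairs are hypothetical and are not taken from any NWS verification software.

```python
from collections import Counter


def classify(forecast: bool, observed: bool) -> str:
    """Return the 2x2 contingency table cell for one forecast/observation pair."""
    if forecast and observed:
        return "HIT"            # cell A: event forecast, event happened
    if forecast and not observed:
        return "FALSE ALARM"    # cell B: event forecast, nothing happened
    if not forecast and observed:
        return "MISS"           # cell C: no forecast, event happened
    return "CORRECT NULL"       # cell D: no forecast, nothing happened


# Hypothetical (forecast, observed) pairs, for illustration only.
pairs = [(True, True), (True, False), (False, True), (False, False), (True, True)]

# Tally how many pairs fall into each cell of the table.
table = Counter(classify(f, o) for f, o in pairs)
print(table)
```

Tallying the cells this way gives the counts A, B, C, and D from which accuracy measures can be derived.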


Also from this table, one can derive a number of accuracy measures.  The first is called Probability Of Detection (POD), which is the ratio of correctly forecast events (HIT) to all observed events (HIT + MISS).  Another is the False Alarm Ratio (FAR) or Probability Of False Alarm (POFA), which is the ratio of false forecasts (FALSE ALARM) to all forecasts of an event (HIT + FALSE ALARM).  Finally, one can combine POD and FAR into the Critical Success Index (CSI), which is the ratio of HIT to the sum of all HIT, MISS, and FALSE ALARM.  CSI can be written both as a function of A, B, and C, and, through algebraic manipulation, as a function of POD and FAR:  CSI = A / (A + B + C) = 1 / [1/POD + 1/(1 − FAR) − 1].
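
As a quick check on these definitions, here is a short Python sketch that computes the three measures from the cell counts and confirms that the two ways of writing CSI agree.  The function names and the sample counts (30 hits, 20 false alarms, 10 misses) are made up for illustration; they are not real NWS statistics.

```python
import math


def pod(a: float, c: float) -> float:
    """Probability Of Detection: HIT / (HIT + MISS)."""
    return a / (a + c)


def far(a: float, b: float) -> float:
    """False Alarm Ratio: FALSE ALARM / (HIT + FALSE ALARM)."""
    return b / (a + b)


def csi(a: float, b: float, c: float) -> float:
    """Critical Success Index: HIT / (HIT + FALSE ALARM + MISS)."""
    return a / (a + b + c)


def csi_from_pod_far(pod_value: float, far_value: float) -> float:
    """CSI rewritten in terms of POD and FAR (assumes POD > 0 and FAR < 1)."""
    return 1.0 / (1.0 / pod_value + 1.0 / (1.0 - far_value) - 1.0)


# Hypothetical cell counts: A = hits, B = false alarms, C = misses.
a, b, c = 30, 20, 10

p = pod(a, c)      # 30 / 40
f = far(a, b)      # 20 / 50
s = csi(a, b, c)   # 30 / 60

print(f"POD = {p:.2f}, FAR = {f:.2f}, CSI = {s:.2f}")

# The two forms of CSI should match to within floating-point error.
assert math.isclose(s, csi_from_pod_far(p, f))
```

Note that cell D (CORRECT NULL) does not appear in any of the three measures, and that the sketch assumes at least one HIT so that none of the denominators is zero.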


In the next post, I will explain how NWS warnings are verified today, and how the NWS uses the 2×2 table and the above measures to derive its metrics.

Greg Stumpf, CIMMS and NWS/MDL

Experimental Warning Thoughts – Intro

Hello readers!

This is the first post in an open-ended series dedicated to my thoughts about short-fused warnings for severe convective weather, namely tornadoes, wind, and hail.  My purpose for these blog posts is to express some of my observations and ideas for improving the end-to-end integrated United States severe weather warning system.  That system includes the decision-making process at the National Weather Service (NWS) weather forecast office (WFO) level, the software and workload involved in creating hazardous weather information, the dissemination of that information as data and products, and the usage and understanding of the information by a wide customer base.  That customer base includes the general public, emergency managers and other government officials, and the private sector, which can add specific value to the NWS information for customers with special needs.

Many of these thoughts will originate with me, but others will come from my colleagues, who will be acknowledged whenever possible.  In fact, I see no reason why guests should not be welcome to post here as well.  I also hope that these blog posts will generate discussion, either here in the form of comments, or elsewhere on various fora and email list servers.  It is my desire that the information and dialog generated here will be considered by weather services management toward a goal of continually improving the way weather information services are provided by the government and its public and private partners.  Some of what I will present might be controversial, and I reserve the right to change my opinion at any time when a convincing argument is presented, since after all, I’m not perfect.  I’m also not much of a writer, but there are a lot of us out there in the blog world today, so why not?

Some think the NWS warning services are working wonderfully.  One could state fabulous accuracy numbers, such as a very low number of missed severe weather events and some very respectable lead times on major tornado events.  One could show that the severe storm mortality rate is quite low in the U.S., or at least that it was before 2011.  Do we chalk up this year’s record death counts to bad luck and an unfortunate juxtaposition of tornadoes and people, or is it possible there is continued room for improvement in hazard information?  I’m going to argue that our perception of how accurate our warnings are rests on some flawed premises in the way warnings are verified.  I’m also going to present some concepts that were born out of discussion and experimentation at the NOAA Hazardous Weather Testbed (HWT) with many colleagues at the National Severe Storms Laboratory (NSSL) and the NWS, and that show some promise toward improvement.

So, without further ado, I will follow this with another post, and see how things proceed from here.

Greg Stumpf, CIMMS and NWS/MDL