Verification

One of the many struggles with forecasting is verification, especially of “rare” events. In the severe storms world, we have storm reports, and it has been shown that this database has serious flaws over time. Reports are conditional on a storm hitting something or someone, so they follow population density and highways. Many inaccessible areas may have experienced hail or high winds with no observations nearby to record them. In the Plains this is a big challenge.

Over the course of the HWT EFP, we have debated the so-called practically perfect methodology. At its core it is a Kernel Density Estimation (KDE) technique that uses a Gaussian smoother (120 km) and a radius-of-influence technique (40 km) to map individual storm reports to a grid and produce probabilities of severe weather.
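For the curious, here is a minimal sketch of that idea in Python. The report locations and domain are made up, the grid spacing stands in for the 40 km radius of influence, and the 120 km Gaussian follows the description above; because scipy's Gaussian filter is normalized, smoothing a 0/1 report grid yields values that can be read directly as probabilities.

```python
# Minimal sketch of a "practically perfect" field from storm reports.
# Grid spacing and sigma follow the post; report locations are invented.
import numpy as np
from scipy.ndimage import gaussian_filter

DX = 40.0          # grid spacing (km); one cell ~ the 40 km radius of influence
SIGMA_KM = 120.0   # Gaussian smoothing length scale (km)

# Toy domain: 50 x 50 cells (2000 km x 2000 km) with a few report locations.
grid = np.zeros((50, 50))
reports_xy = [(25, 25), (26, 25), (27, 26), (30, 28)]  # (row, col) of reports
for i, j in reports_xy:
    grid[i, j] = 1.0   # any cell containing (within ~40 km of) a report

# KDE step: smooth the binary report grid with a 120 km Gaussian kernel.
# The kernel sums to 1, so the output of a 0/1 field stays in [0, 1]
# and can be interpreted as a probability of severe near each point.
practically_perfect = gaussian_filter(grid, sigma=SIGMA_KM / DX)

print(practically_perfect.max(), practically_perfect.mean())
```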

Given that storm reports are hardly an ideal, independent, unbiased dataset, we look to other, less biased data. So what are the alternatives? Can we use severe storm warnings? How about data derived from the radars, like Maximum Estimated Hail Size or rotation tracks? How about satellite-derived data about vegetation?

Just about all of these datasets have their own problems. With the radars, we are observing rotation or hail aloft, not necessarily at the ground. This is still valuable information. But how do we switch from spotty storm reports to continuous tracks? Will the same KDE smoothing approaches be necessary?

For the warnings, it is clear that meteorology alone is not driving them. If there is a chance a storm could be severe over a highly populated area at a critical time, the edge goes to issuing a warning rather than not. This is not all bad, since we would all like to err on the side of safety: better safe than sorry.

Using radar data, we still have to verify that what the radar detects is actually occurring at the ground and that the phenomenon is as strong or as large as indicated aloft. And that requires doing verification on the observations themselves. The SHAVE folks at NSSL-OU are trying to do exactly that, as are some other NWS-associated folks, though at the moment their name escapes me.

Satellite data also offer some help with tracks of severe storms, provided there is damage, say from large hail stripping vegetation bare or from tornadoes. Collecting such fine-resolution data will take a dedicated effort, but in the end it helps build more complete knowledge about storms and more understanding of the successes and failures of the forecasts, and it may well end up making for better forecasts.


Thanks to TAMU for soundings

One of the observation components of this year's EFP has been an intercomparison between the Vaisala RS92 and InterMet radiosondes to help verify the Microwave Radiometer that we (Dave Turner at NSSL) have on the roof of the National Weather Center. We were lucky to have Don Conley from Texas A&M bring his observations class on the road and visit the HWT to conduct some local and mobile radiosonde intercomparisons. They made two mobile deployments, one in Concordia, KS, on Wednesday and one in Altus, OK, on Friday, to help verify the models we are using for convection initiation.

They drove from Norman to Concordia and were able to make 3 launches (4, really) and 2 trips to Walmart (first for helium, then to return said helium). Many thanks to the City of Concordia and the airport manager for allowing them to use a hangar for these balloon launches in very strong winds (which caused the failure of the very first balloon launch). They got to a great spot just east of 2 very long and robust Horizontal Convective Rolls (HCRs), both of which produced CI along the front-HCR intersection.

I haven’t heard the stories from today, but I do know they got to Altus after lunch at Meers (for the Meers burger, obviously) and got off two launches, again in an environment characterized by HCRs. These are great tests of the instruments, great experiences for the students, and excellent learning opportunities for the rest of us.

They (and you) should know that these soundings make their way to the SPC (something that is usually done upon request at TAMU) and prove valuable. These types of partnerships, sometimes ad hoc but almost always mutually beneficial, are what make the HWT a vibrant place for forecasting, research, research forecasting, and forecasting research; and now with observations!

You can find the mobile and local soundings here.

Again, thanks Don, Mike D., Mike C., and the whole Observations Class (send me your names and we can make you famous* by writing them on here!).

*Fame not guaranteed.


Severe winds

Well, today the severe weather forecast area stretched from eastern CO into western KS. Although the hi-res models were producing some intense precipitation cores, it was highly unlikely the atmosphere would follow suit. The deep, dry boundary layer would be capable of supporting some serious downdrafts, but without much moisture, wet downbursts were unlikely. Dry thunderstorms were the more plausible convective mode, and all available evidence hinted at a line that would fizzle later in the evening.

Severe reports from KS showed up just prior to 0000 UTC, right along with an OK mesonet wind gust to 58 mph. Normally when you get a downdraft, it brings cooler air to the surface. With a deep and dry boundary layer, though, there can be little in the way of temperature change. This is exactly what occurred, with the temperature actually rising 4°F during the big wind gust!

[Mesonet meteogram: GOOD.met]

I went out on a limb and also proceeded to forecast a 20 percent chance of heat bursts. A few models hinted at that possibility tonight (in the next few hours, anyway). Verification in the morning!
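For reference, here is a rough illustration of the mechanism: air dragged to the surface warms dry-adiabatically, so a parcel that starts with a higher potential temperature than the ambient surface air arrives warmer, which is the heat-burst temperature rise noted above. The starting level and temperatures below are illustrative, not the observed values from tonight.

```python
# Sketch: dry-adiabatic warming of a descending downdraft parcel, to show
# why a gust in a deep dry boundary layer can arrive warmer than ambient.
# Starting level and temperatures are illustrative, not the observed case.

RD_CP = 287.0 / 1004.0  # R_d / c_p for dry air

def poisson_temp(t_k, p_from_hpa, p_to_hpa):
    """Temperature after dry-adiabatic displacement (Poisson's equation)."""
    return t_k * (p_to_hpa / p_from_hpa) ** RD_CP

# Parcel dragged down from 700 hPa where it was, say, 10 C (283.15 K):
t_sfc_parcel = poisson_temp(283.15, 700.0, 970.0)
print(f"parcel at surface: {t_sfc_parcel - 273.15:.1f} C")  # ~37.7 C

# If the ambient surface air is cooler than that, the gust shows up as a
# temperature RISE at the station: the heat-burst signature in a meteogram.
```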


Moisture bias?

After much discussion today about moisture, the evidence of a bias is accumulating. Here is a comparison between the MWR on the roof (1.6 cm at 00 UTC 5/17 and 2 cm at 00 UTC 5/18), the IPW from the GPS system (gpsmet.noaa.gov), and the radiosonde launch from Norman.
GPS:

Here is the sounding:

[Sounding: OUN]

Note the PW is 0.5″ with a characteristic decrease immediately off the surface. The dew point at the NWC increased from 53 to 55°F in the hour after release.
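For anyone wanting to reproduce a PW number from a sounding, it is the standard integral of specific humidity over pressure; the toy profile below is invented to mimic the moist-surface, rapid-drying structure described above.

```python
# Sketch: precipitable water from a sounding by integrating specific
# humidity over pressure. The profile values below are illustrative only.
import numpy as np

G = 9.81        # m s^-2
RHO_W = 1000.0  # kg m^-3, density of liquid water

def precipitable_water_mm(p_hpa, q_kgkg):
    """PW (mm) = (1 / (rho_w * g)) * integral of q dp, top to bottom."""
    p_pa = np.asarray(p_hpa, dtype=float) * 100.0
    q = np.asarray(q_kgkg, dtype=float)
    dp = -np.diff(p_pa)                      # positive layer thicknesses
    q_bar = 0.5 * (q[:-1] + q[1:])           # layer-mean specific humidity
    pw_m = np.sum(q_bar * dp) / (RHO_W * G)  # meters of liquid water
    return pw_m * 1000.0                     # mm

# Toy sounding: moist near the surface, drying rapidly just above it:
p = [970, 900, 850, 700, 500, 300]                 # hPa
q = [0.008, 0.005, 0.0035, 0.0015, 0.0006, 0.0001] # kg/kg
print(f"PW = {precipitable_water_mm(p, q):.1f} mm")  # ~13.5 mm, about 0.53 in
```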

Below is the MWR PW time series with the last vertical dotted line at the right being 0000 UTC on 5/18 (the beginning of the chart is 5/17 at 00 UTC):

[MWR PW time series: Snapshot 2012-05-17 20-09-38]

The MWR agrees well with the GPS system and the MWR is fresh off a calibration. I have no idea what would cause this discrepancy with the radiosonde data but clearly something is amiss.

Let me back up to DVN on the 16th at 0000 UTC for a counterexample:

And now the corresponding GPS data from Rock Island:

Note the close correspondence for a few balloon launches, specifically at 5/16 00 UTC! Yet that decrease above the surface looks suspect. It is possible the cloud layer above 700 mb is playing a role, but that I don’t think we can diagnose.


Where did all the moisture go?

The models stole it all!

One of the additions this year is an observational program (brought to you by NSSL) where we have a Microwave Radiometer (MWR) offering vertical profiles of moisture and temperature over the lowest 4 km AND a new radiosonde intercomparison to go along with it (a Vaisala RS92 sonde and a new IMET sonde). The goal is to use the soundings to compare against the MWR and thus offer a calibration dataset, but also to see how well the moisture retrievals compare to actual observations in clear-sky conditions.

Since we have had two sonde launches so far this week, we can see how well the IMET and Vaisala measurements compare and also how the MWR compares to both of them. So far the two sondes are very close. The comparison with the MWR was at first horrible, until the MWR was re-calibrated: sensor drift in one of the eight channels was to blame, at least for the low-level structure of the moisture (most noticeable in the RH field). For the next point, we have to understand that the majority of the information content in the moisture channels is contained below 4 km and only amounts to about 1.6 pieces of information on average. This means that within the lowest 4 km we have at most 2 effective measurement points for moisture. The vertical profiles therefore tend to be smooth, more like an average, which is why agreement between the MWR and moist layers aloft is essentially low.
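A quick way to see what ~1.6 pieces of information does to a profile is to smooth a synthetic "true" profile with a broad kernel standing in for the retrieval's averaging kernel; the kernel width below is my assumption, chosen to be consistent with roughly two independent levels over 4 km.

```python
# Sketch: why ~1.6 effective pieces of information yield smooth retrievals.
# We mimic a low-information retrieval by smoothing a "true" profile with a
# broad averaging kernel; the sharp elevated moist layer largely vanishes.
import numpy as np
from scipy.ndimage import gaussian_filter1d

z = np.arange(0.0, 4.01, 0.1)          # height (km), the MWR-relevant layer
q_true = 8.0 * np.exp(-z / 2.0)        # smooth background mixing ratio (g/kg)
q_true += 4.0 * np.exp(-((z - 2.5) / 0.2) ** 2)  # sharp moist layer at 2.5 km

# Broad kernel (~2 km) standing in for the retrieval's averaging kernel,
# consistent with ~2 independent measurement points over the lowest 4 km:
q_retrieved = gaussian_filter1d(q_true, sigma=2.0 / 0.1)

k = np.argmin(np.abs(z - 2.5))
print(f"moist layer at 2.5 km: true {q_true[k]:.1f} g/kg, "
      f"retrieved {q_retrieved[k]:.1f} g/kg")  # retrieved is much lower
```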

OK has been pretty low on moisture the last couple of weeks. We spent some time forecasting in the Iowa area the other day where model precipitable water was in excess of 1″, but in reality it was only around 0.5″. This posed no problem for the models, which generated storms along a cold front associated with an upper low. Some warnings resulted from an MCS arcing through IA that evening, with a few wind and hail reports, but generally good CI and SVR forecasts.

At issue was this anomalous model moisture and if/how it would play a role in the forecast. So many models had storms, though, that it was hard to discount storms even in this lower-moisture environment. Even looking back at the verifying soundings from DVN, ILX, and DTX, it was evident that the CAPS ensemble control member was 2-3 g/kg too moist through the depth of the boundary layer. How can we have errors that large and yet have some skill in the convective forecasts? The models' simulated storms were a bit on the high side in terms of reflectivity, lasted a bit longer, and were a bit larger, but still had a similar enough evolution. I guess you could consider this a good thing, but it should really drive home the point that better, more dense moisture observations are needed.

We really need to see WHY these errors occur and diagnose what is contributing to them. In this case, the control member was an outlier in terms of the overall ensemble. Why? Was it the initial conditions, lateral boundary conditions, the perturbations applied to the ensemble members, or some combination thereof that set the stage for these differences in convection? Or was it the interplay between the various model physics and all previous factors? We will need to dig deep on this case to get any kind of reasonable, well constrained answer.

In order to address these issues, at least partially, we need observations of moisture within the PBL. In fact, we could benefit even from knowing the boundary layer depth: it is at least plausible to retrieve that field and then derive the PBL moisture. Such is the goal of the MWR type of profiler: to derive the lowest-layer moisture structure, addressing at least some of our issues. Regardless, these high-resolution models REQUIRE observations to verify both their processes and their statistics in order to make improvements.


End of week 1

I am really behind on the blog posts. Last week had some challenges, especially for severe storms down in south Texas. We had a few days where cutoff lows approached south Texas, providing sufficient vertical shear and ample moisture and instability. The setup was favorable, but our non-meteorological limitation was the border with Mexico: we don't have severe storm reports in Mexico, nor do we have radar coverage there. Forecasting near a border like this also imposes a spatial-specificity problem. In most cases there is room for error, room for uncertainty, especially when making longer (16 hr) spatial forecasts of severe weather. On one particular day the ensemble probabilities were split: one ensemble in the US with extension into Mexico, one ensemble in Mexico with extension into the US, and a third farther northwest, split unevenly across the border into the US.

So the challenge quickly becomes all about specificity: where do you put the highest probabilities and where are the uncertainties large (i.e., which side of the border)? The evolution of convection quickly comes into question too. As you anticipate where the most reports might be (where will storms be when they are in the most favorable environment), you also have to account for if/when storms will grow upscale, how fast the resulting system will move, and whether it will remain favorable for generating severe weather.

We have discussed this in previous experiments as such: “Can we reliably and accurately draw what the radar will look like in 1, 3, 6, 12, etc. hours?” This aspect in particular is what makes high-resolution guidance valuable: it is precisely a tool that offers a picture of what the radar will look like. Furthermore, an ensemble of such guidance offers a whole set of “what if” scenarios. The idea is to cover the phase space so that the ensemble has a higher chance of encompassing the observations. This is why the ensemble mean tends to verify better (for some variables) than any individual member of an ensemble.
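That last point is essentially Jensen's inequality, and a few lines of synthetic data make it concrete (the member count matches our 27; everything else here is invented):

```python
# Sketch: for squared error, the ensemble mean cannot verify worse than the
# average member (Jensen's inequality). Synthetic data for illustration.
import numpy as np

rng = np.random.default_rng(42)
truth = np.sin(np.linspace(0, 6, 200))                  # "observed" field
members = truth + rng.normal(0.0, 0.5, size=(27, 200))  # 27 noisy members

rmse = lambda f: np.sqrt(np.mean((f - truth) ** 2))
mean_of_member_rmse = np.mean([rmse(m) for m in members])
rmse_of_ensemble_mean = rmse(members.mean(axis=0))

print(f"avg member RMSE:    {mean_of_member_rmse:.3f}")   # ~0.5
print(f"ensemble-mean RMSE: {rmse_of_ensemble_mean:.3f}") # much smaller
```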

Utilizing all these members of an ensemble can become overwhelming. In order to cope with this onslaught (especially when you have 3 ensembles with 27 total members), we create so-called spaghetti diagrams. These typically involve proxy variables for severe storms; by proxy variables I mean model output that can be correlated with the severe phenomenon we are forecasting for. This year we have been looking at simulated reflectivity, hourly-maximum (HM) storm updrafts, HM updraft helicity, and HM wind speed. Further, given the number of ensembles, we have so-called “buffet diagrams”, where each ensemble is color-coded but each and every member is depicted. We have also focused heavily on probabilities for each of the periods we have been forecasting for.
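As a concrete illustration of how such probabilities might be derived from a proxy variable, here is a toy exceedance-probability calculation on a fake updraft helicity field. The 75 m²/s² threshold is a commonly used severe proxy value, but the fields and smoothing length here are placeholders, not our actual processing.

```python
# Sketch: point probabilities from an ensemble proxy variable -- the
# fraction of members whose hourly-max updraft helicity (UH) exceeds a
# threshold, then smoothed. Fields and values are illustrative only.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(7)
n_members, ny, nx = 27, 60, 60
uh = rng.gamma(shape=2.0, scale=15.0, size=(n_members, ny, nx))  # fake UH (m2/s2)

UH_THRESH = 75.0  # a commonly used severe proxy threshold (assumption here)

# Raw probability: fraction of members exceeding the threshold at each point.
prob = (uh >= UH_THRESH).mean(axis=0)

# Spatial smoothing acknowledges that a storm a few grid points away in one
# member still counts as "close enough" in an outlook sense.
prob_smooth = gaussian_filter(prob, sigma=3.0)
print(prob.max(), prob_smooth.max())
```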

In this case all the probabilities are somewhat uncalibrated. Put another way, the exact values of the probabilities do not correspond directly to what we are forecasting for, nor have we worked out how to map them from the model world to the real world. For one ensemble we do have calibrated guidance, but not for the other 2. It turns out you need a long dataset to perform calibration for rare-event forecasting like severe weather.
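To make "calibration" concrete, here is a bare-bones reliability-based scheme: bin an archive of forecast probabilities, record the observed frequency in each bin, and use that as the mapping. This is a sketch of the general idea, not the method behind the calibrated ensemble mentioned above.

```python
# Sketch: reliability-based calibration -- bin archived forecast
# probabilities and map each bin to its observed event frequency.
# The arrays below stand in for a (long) forecast/verification archive.
import numpy as np

def calibration_curve(p_fcst, occurred, n_bins=10):
    """Observed relative frequency per forecast-probability bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    obs_freq = np.full(n_bins, np.nan)
    for k in range(n_bins):
        in_bin = (p_fcst >= edges[k]) & (p_fcst < edges[k + 1])
        if in_bin.sum() > 0:
            obs_freq[k] = occurred[in_bin].mean()
    return edges, obs_freq

def calibrate(p_new, edges, obs_freq):
    """Replace a raw probability with its bin's observed frequency."""
    k = np.clip(np.digitize(p_new, edges) - 1, 0, len(obs_freq) - 1)
    return obs_freq[k]

# Example with synthetic, deliberately overconfident forecasts:
rng = np.random.default_rng(0)
p_raw = rng.uniform(0, 1, 10000)
occurred = rng.uniform(0, 1, 10000) < 0.5 * p_raw  # events happen half as often
edges, freq = calibration_curve(p_raw, occurred)
print(calibrate(np.array([0.8]), edges, freq))  # ~0.4, not 0.8
```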

Let's come back to the forecasts. Given that each ensemble had a different solution, it was time to examine whether we could discount any of them given what we thought was the likely scenario. We decided to remove one of the ensembles from consideration; the factors behind that decision were an area of convection somewhat displaced from the observations prior to forecast time, and a similar enough evolution of convection otherwise. We decided to put some probabilities in the Big Bend area of Texas to account for early and ongoing convection. As it turned out, this was a relatively decent decision.

This process took about 2 hours, and we didn't really dig into the details of any individual model with complete understanding. Such are the operational time constraints. There was much discussion on this day about evolution: part of it was the upscale growth (which occurred), but also whether that convection produced any severe reports. Since the MCS that formed was almost entirely in Mexico, we won't know whether severe weather occurred. Just another day in the HWT.


Getting started

Today was an interesting day, as we made a joint decision on the domain where we would collectively issue our forecasts. It was decided the clean-slate CI and severe domain would be in south Texas. According to the models, this area could see multiple types of severe weather (pulse severe outside the stronger flow, and a more organized threat farther north in and behind the frontal zone, where the flow and shear were stronger) as well as multiple triggers for CI (the cold front moving south, the higher terrain from Mexico into NM, and potentially the sea breeze near Houston).

It was increasingly clear that adding value by moving from coarse to high temporal resolution is difficult because of how accurate we are requiring the models to be. The model capability may be good, simulating the correct convective mode and evolution, but getting that result at the right time and in the right place will still determine the goodness of the forecast. So no matter the kind of spatial or temporal smoothing we apply to derive probabilities, we are still at the mercy of processes in the model that can be early or late and thus displaced, or displaced and increasingly incorrect in timing. This is not new, mind you, but it underscores the difference between capability and skill.

In the forecast setting, with operational timeliness requirements, there is little room for capability alone. This is not to say that such models don't get used; it just means that they have little utility. Operational forecasters are skilled with the available guidance, so you can't just put models with unknown skill in their laps and expect them to have immediate high impact (value). The strengths and weaknesses need to be evaluated first. We do this in the experiment by relating the subjective impressions of participants to objective skill-score measures, one example of which is sketched below.
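One such objective measure, often applied to high-resolution forecasts, is the fractions skill score (FSS); here is a sketch, with synthetic fields, threshold, and displacement standing in for real model output and observations.

```python
# Sketch: the fractions skill score (FSS), a neighborhood measure often
# used to score high-resolution precipitation/reflectivity forecasts.
# Fields and threshold below are synthetic placeholders.
import numpy as np
from scipy.ndimage import uniform_filter

def fss(fcst, obs, threshold, window):
    """FSS: 1 = perfect, 0 = no skill, computed on event-coverage fractions."""
    f_frac = uniform_filter((fcst >= threshold).astype(float), size=window)
    o_frac = uniform_filter((obs >= threshold).astype(float), size=window)
    num = np.mean((f_frac - o_frac) ** 2)
    den = np.mean(f_frac ** 2) + np.mean(o_frac ** 2)
    return 1.0 - num / den if den > 0 else np.nan

rng = np.random.default_rng(1)
obs = rng.gamma(2.0, 5.0, size=(100, 100))
fcst = np.roll(obs, shift=5, axis=1)  # same storms, displaced east

# Skill improves as the neighborhood window grows past the displacement:
for w in (1, 5, 15, 31):
    print(w, round(fss(fcst, obs, threshold=20.0, window=w), 3))
```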

And we do critically evaluate them. But let me be clear: probabilities never tell the whole story. The model's depiction of physical processes can be just as important in generating forecaster confidence in its solutions, because those details can be used as evidence to support or refute processes that can be observed. Finding clues for CI is rather difficult because the boundary layer is the least well observed part of the atmosphere. We have surface observations, which can be a proxy for boundary layer processes, but not everything that happens in the boundary layer happens at the surface.

A similar situation holds for the severe weather component. We can see storms by interrogating model reflectivity, but large reflectivity values are not highly correlated with severe weather. We don't necessarily even know whether the rotating storms in the model are surface-based, which would pose a higher threat than, say, strong elevated storms. Efforts to use additional fields as conditional proxies alongside the severe variables are underway (a toy example follows below). These take time to evaluate and refine before we can incorporate them into probability fields, but such methods can provide evidence that a particular region is or is not favored for severe weather.
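A toy version of such a conditional proxy: count rotating-storm points only where a surface-based indicator is also satisfied. The fields and thresholds here are hypothetical placeholders, not the actual fields under evaluation.

```python
# Sketch: a conditional proxy -- count rotating storms (UH) only where a
# surface-based indicator (here, model SBCAPE) suggests the storm is not
# elevated. Fields and thresholds are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(3)
uh = rng.gamma(2.0, 15.0, size=(60, 60))       # hourly-max UH (m2/s2)
sbcape = rng.gamma(2.0, 400.0, size=(60, 60))  # surface-based CAPE (J/kg)

raw_hits = uh >= 75.0
conditional_hits = raw_hits & (sbcape >= 500.0)  # keep surface-based threats

print("raw proxy coverage:        ", raw_hits.mean())
print("conditional proxy coverage:", conditional_hits.mean())
```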

Coming back to our forecast for today, there was evidence for both elevated storms and surface-based organized storms, and evidence to suggest that the cold front might not be the initiator of storms even though it was in close proximity. We will verify our forecasts in the morning and see if we can make some sense out of all the data, in the hopes of finding some semblance of signal that stands out above the noise.


2012 HWT-EFP

Today is the first official day of the Hazardous Weather Testbed Experimental Forecast Program's Spring Experiment. We will have two official desks this year: Severe and Convection Initiation. Both desks will be exploring the use of high-resolution, convection-permitting models in making forecasts. On the severe side these include the total severe storms probabilities of the Day 1 1630 convective outlook, plus 3 forecast periods similar to the enhanced thunder product (20-00, 00-04, and 04-12 UTC); on the CI side they will make forecasts of CI and convection coverage for 3 four-hour periods (16-20, 20-00, and 00-04 UTC).

We have 3 ensembles that will be used heavily: the so-called Storm Scale Ensemble of Opportunity (SSEO; 7 members, including the NSSL-WRF, the NMM-B nest, and the hi-res window runs with 2 time-lagged members), the AFWA ensemble (Air Force, 10 members), and the SSEF (CAPS, 12 members).

We will be updating throughout the week as events unfold (not necessarily in real time) and will try to put together a week in review. Let the forecasting begin.
