Certainty, doubt, and verification

Today’s forecast on CI focused on the area from northeast KS southwest along a front down towards the TX-OK panhandles. It was straightforward enough. How far southwest will the cap break? Will there be enough moisture in the warm sector near the frontal convergence? Will the dryline serve as a focus for CI, given the dry slot developing just ahead of it along the southern extent of the front and a transition zone (reduced moisture zone)?

So we went to work mining the members of the ensemble, scrutinizing the deterministic models for surface moisture evolution, examining the convergence fields, and looking at ensemble soundings. The conclusion from the morning was two moderate risk areas: one in northeast KS and another covering the triple point, dryline, and cold front. The afternoon forecast backed off the dryline–triple point area given the observed dry slot and the dry 1800 UTC sounding from LMN.

The other issue was that the dryline area was so dry and the PBL so deep that convective temperature would be reached but with minimal CAPE (10-50 J/kg). The dry LMN sounding was assumed to be representative of the larger mesoscale environment. This was wrong: the 00 UTC sounding at LMN indicated an increase in moisture of 6 g/kg aloft and 3 g/kg at the surface.

Another aspect of this case was our scrutiny of the boundary layer and the presence of open-cell convection and horizontal convective rolls. We discussed, again, that at 4 km grid spacing we are close to resolving these types of features. We are close because the scale of the rolls, which scales with the boundary layer depth, must be larger than about 7 times the grid spacing to be resolved. So on a day like today, when the PBL is deep, the rolls should be close to resolvable. On the other hand, additional diffusion is needed in light wind conditions, and when this is absent the scale of the rolls collapses to the scale of the grid. In order to believe the model we must take these considerations into account. In order to discount the model, we are unsure what to look for beyond indications of “noise” (e.g. features barely resolved on the grid, rolls at scales close to 5 times the grid spacing).
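A quick back-of-envelope sketch of this resolvability argument. The 7-times and 5-times grid-spacing thresholds are the ones discussed above; the assumption that roll wavelength is about 3 times the PBL depth is a commonly cited aspect ratio, not a number measured in this case:

```python
# Compare an assumed HCR wavelength against the resolvability thresholds
# discussed in the text. All inputs are illustrative.
def roll_scales(pbl_depth_km, dx_km, aspect_ratio=3.0):
    wavelength = aspect_ratio * pbl_depth_km  # roll spacing scales with PBL depth
    resolved = 7 * dx_km                      # need > ~7*dx to be resolved
    noise = 5 * dx_km                         # near ~5*dx, treat as grid-scale noise
    return wavelength, resolved, noise

# Deep-PBL day on a 4 km grid:
wl, res, nz = roll_scales(pbl_depth_km=4.0, dx_km=4.0)
# wl = 12 km against a 28 km "resolved" threshold: marginal at best,
# which is why a deep PBL only brings the rolls *close* to resolvable.
```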

The HCRs were present today as per this image from Wichita:

ict_110608_roll

However, just because HCRs were present does not mean I can prove they were instrumental in CI. So when we saw the forecast today for HCRs along the front, and storms subsequently developed, we had some potential evidence. Given the distance from the radar, it may be difficult, if not impossible, to prove that HCRs intersected the front and contributed to CI.

This brings up another major point: in order to really know what happened today we need a lot of observational data. Major field project data. Not just surface data, but soundings, profilers, and low-level radar data. On the scale of The Thunderstorm Project, only for numerical weather prediction. How else can we say with any certainty that the features we were using to make our forecast were present and contributing to CI? This is the scope of data collection we would require for months in order to get a sufficient number of cases to verify the models (state variables and processes such as HCRs). Truly an expensive undertaking, yet one where a number of people could benefit from one data set and the field of NWP could improve tremendously. And let's not forget about forecasters, who could benefit from having better models, better understanding, and better tools to help them.

I will update the blog after we verify this case tomorrow morning.

Wrong but verifiable

The fine-resolution guidance we are analyzing can get the forecast wrong yet probabilistically verify. It may seem strange, but the models do not have to be perfect; they just have to be smooth enough (tuned, bias-corrected) to be reliable. The smoothing is done on purpose to account for the fact that the discretized equations cannot resolve scales smaller than 5-7 times the grid spacing. It is also done because the models have little skill below 10-14 times the grid spacing. As has been explained to me, this is approximately the scale at which the forecasts become statistically reliable. A reliable 10 percent probability forecast, for example, will verify 10 percent of the time.
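That reliability notion can be sketched numerically. The forecast/observation pairs below are made up for illustration, not HWT data:

```python
# A forecast system is "reliable" if events assigned probability p
# occur a fraction p of the time. Group forecasts into probability
# bins and compare mean forecast probability with observed frequency.
from collections import defaultdict

def reliability_table(forecast_probs, outcomes, bin_width=0.1):
    bins = defaultdict(lambda: [0, 0, 0.0])  # bin -> [count, events, prob sum]
    n_bins = int(1 / bin_width)
    for p, obs in zip(forecast_probs, outcomes):
        b = min(int(p / bin_width), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += obs
        bins[b][2] += p
    # bin -> (mean forecast probability, observed event frequency)
    return {b: (s / n, hits / n) for b, (n, hits, s) in sorted(bins.items())}

# Ten forecasts of 10%: a reliable system sees the event exactly once.
table = reliability_table([0.1] * 10, [1] + [0] * 9)
# The 0.1 bin shows mean forecast 0.10 vs observed frequency 0.10.
```

Note that nothing in this check rewards putting the probabilities in the right place at the right time, which is exactly why a forecast can be "wrong but verifiable."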

This makes competing with the model tough unless we have skill at deriving not only similar probabilities, but also placing those probabilities in close proximity, in space and time, to observations. Re-wording this: draw the radar at forecast hour X probabilistically. If you draw those probabilities to cover a large area you won't necessarily verify. But if you know the number of storms, their intensity, and their longevity, and place them close to what was observed, you can verify as well as the models. Which means humans can be just as wrong but still verify their forecast well.

Let us think through drawing the radar. This is exactly what we are trying to do, in a limited sense, in the HWT for the Convection Initiation and Severe Storms Desks over 3 hour periods. The trick is the 3 hour period, over which the models and forecasters can effectively smooth their forecasts. We isolate the areas of interest and try to use the best forecast guidance to come up with a mental model of what is possible and probable. We try to add detail to that area by increasing the probabilities in some areas and removing them from others. But we still feel we are ignoring certain details. In CI, we feel like we should be trying to capture episodes. An episode is where CI occurs in close proximity to other CI in a certain time frame, presumably because of a similar physical mechanism.

By doing this we are essentially trying to provide context and perspective but also a sense of understanding and anticipation. By knowing the mechanism we hope to either look for that mechanism or symptoms of that mechanism in observations in the hopes of anticipating CI. We also hope to be able to identify failure modes.

In speaking with forecasters over the last few weeks, there is a general feeling that it is very difficult both to accept and to reject the model guidance. The models don’t have to be perfect in individual fields (correct values or low RMS error) but rather just need to be relatively correct (errors can cancel). How can we realistically predict model success or model failure? Can we predict when forecasters will get this assessment incorrect?

Timing

It is remarkably difficult to predict convection initiation. It appears we can predict, most times (see yesterday's post for a failure), the area under consideration. We have attempted to pick the time period, in 3 hour windows, and have been met with some interesting successes and failures. Today had two such examples.

We predicted a time window from 16-19 UTC along the North Carolina/South Carolina/Tennessee area for terrain-induced convection and along the sea breeze front. The terrain-induced storms went up around 18 UTC, nearly 2 hours after the model was generating storms. The sea breeze did not initiate storms, but further inland in central South Carolina there was one lone storm.

The other area was in South Dakota/North Dakota/Nebraska for storms along the cold front and dryline. We picked a window between 21-00 UTC. It appears storms initiated right around 00 UTC in South Dakota, with little activity in North Dakota as the dryline surged into our risk area. Again the suite of models, including the updated models, had suggested quick initiation starting in the 21-22 UTC time frame.

In both cases we could isolate the areas reasonably well. We even understood the mechanisms by which convection would initiate, including the dryline, the transition zone, and where the edge of the deeper moisture resided in the Dakotas. For the Carolinas we knew the terrain would be a favored location for elevated heating in the moist air mass along a weak, old frontal zone. We knew the sea breeze could be weak in terms of convergence, and we knew that only a few storms would potentially develop. What we could not adequately do was predict the timing of the lid removal associated with the forcing mechanisms.

It is often observed in soundings that the lid is removed via surface heating and moistening, via cooling aloft, or both processes. It is also reasonable to suspect that low level lifting could be aiding in cooling aloft (as opposed to cold advection). Without observations along such boundaries it is difficult to know what exactly is happening along them, or even to infer that our models correctly depict the process by which the lid is overcome. We have been looking at the ensemble of physics members which vary the boundary layer scheme, but today was the first day we attempted to use them in the forecast process.

It was successful in terms of incorporating them, but as far as achieving understanding goes, that will have to come later. It is clear that understanding the various structures we see, and relating them to the times of storm initiations, will be a worthwhile effort. Whether this will be helpful to forecasting, even in hindsight, is still unknown.

When too much is not enough

Going into HWT today, I was thinking about and hoping for a straightforward (e.g. easy) forecast for storms. I was hoping for one clean-slate area: an area where previous storms would not be an issue, where storms would take their time forming, and where the storms that did form would be at least partially predicted by the suite of model guidance at our disposal. That is the last time I think that.

The issues for today were not particularly difficult, just complex. The ensemble we work with was doing its job, but relatively weak forcing for ascent in an unstable environment was leading to convection initiation early and often. The resulting convection produced outflow boundaries that triggered more convection. This area of convection stretched across NM, CO, KS, and NE. All of this convection made it difficult to rely on these forecasts when making subsequent forecasts of what might occur this evening in NE/SD/IA along the presumed location of a warm front.

We ended up trying to sum up not only the persistent signals from the ensemble, but also every single deterministic model we could get our hands on: the 12 UTC NAM, 15 UTC SREF, RUC, HRRR, NASA WRF, NSSL WRF, NCAR WRF, etc. We could find significant differences with observations in all of these forecast models (not exactly a rare occurrence), which justified putting little weight on the details and attempting to figure out, via pattern recognition, what could happen. We were not very confident in the end, knowing that no matter what or when we forecast, we were destined to bust.

Ensemble-wise, they did their job in providing spread, but it was still somehow not enough. Perhaps it was not the right kind or the right amount of spread. We will find out tomorrow how well (or poorly) we did on this quite challenging forecast. In the end, though, we had so much data to digest and process that the information we were trying to extract became muddied. Without clear signals from the ensemble, how does a forecaster extract the information and process it into a scenario? Furthermore, how can the forecaster apply that scenario to the current observations to assess whether it is plausible?

I will leave you with the current radar and ask quite simply: What will the radar look like in 3 hours?

displayN0R

UPDATE: Here is what the radar looked like 3 hours later:

Nothing like our forecast for new storms. But that is the challenge when you are making forecasts like these.


This week in CI

The week was another potpourri of convection initiation challenges, ranging from evening convection in WY/SD/ND/NE to afternoon convection in PA/NY and back over to OK/TX/KS for a few days. We encountered many events similar to the previous week's, struggling with the timing of convective onset. But we can consistently place good categorical outlooks over the region, and have consistently anticipated the correct location of first storms. I think the current perception is that we identify the mechanisms, and thus the episodes, of convection, but timing the features remains a big challenge. The models tend not to be consistent (at least in the aggregate) for at least two reasons: no weather event is identical to any other, and the process by which CI occurs can vary considerably.

The processes that can lead to CI were discussed on Friday and include:
1. a sufficient lifting mechanism (e.g. a boundary),
2. sufficient instability in the column (e.g. CAPE),
3. instability that can be quickly realized (e.g. low level CAPE or weak CIN or low LCL or small LFC relative to the LCL),
4. a deep moist layer (e.g. reduced dry air entrainment),
5. a weakening cap (e.g. cooling aloft).
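The five ingredients above can be encoded as a simple checklist. All thresholds in this sketch (500 J/kg, -50 J/kg, 1000 m, 1.5 km) are illustrative placeholders, not values used in the experiment:

```python
# Hypothetical CI ingredient checklist; cape/cin in J/kg (cin negative),
# a negative cap_trend means the cap is weakening (e.g. cooling aloft).
def ci_ingredients(has_lift, cape, cin, lfc_minus_lcl_m, moist_depth_km, cap_trend):
    checks = {
        "lifting mechanism": has_lift,
        "sufficient instability": cape >= 500,
        "quickly realized instability": cin > -50 and lfc_minus_lcl_m < 1000,
        "deep moist layer": moist_depth_km >= 1.5,
        "weakening cap": cap_trend < 0,
    }
    return checks, all(checks.values())

# A dryline environment with plenty of CAPE but a strong cap and
# shallow moisture fails the checklist:
checks, go = ci_ingredients(True, 2500, -120, 1500, 1.0, +0.5)
```

The point of writing it this way is that a single errored input flips the verdict, which is how small model errors get amplified into CI or no CI.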

That is quite a few ingredients to consider quickly. Any errors in the models can then be amplified to either promote or hinder CI. In the last 2 weeks, we had similar simulations along the dryline in OK/TX where the models produced storms where none were observed. The model produced only a few longer-lasting storms, but it also produced what we have called CI failure: storms that initiate but do not last very long. Using this information we can quickly assess that it was difficult for the model to produce storms in the aggregate. How we use this information remains a challenge, because storms were produced. It is quite difficult to verify the processes we are seeing in the model and thus either develop confidence in them or determine that the model is just prolific in developing some of these features.
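One way to make the "CI failure" idea concrete is to split model storm objects by lifetime. The object lifetimes and the 30-minute threshold below are illustrative assumptions, not the experiment's definitions:

```python
# Separate sustained CI from short-lived "CI failure" objects, given
# storm-object lifetimes (in minutes) from some object-tracking step.
def split_ci_outcomes(lifetimes_min, min_lifetime=30):
    sustained = {s for s, t in lifetimes_min.items() if t >= min_lifetime}
    failures = set(lifetimes_min) - sustained
    return sustained, failures

# A run with many short-lived objects suggests the model initiates
# storms it cannot sustain:
sustained, failures = split_ci_outcomes({"A": 95, "B": 20, "C": 10, "D": 45})
```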

What is becoming quite clear is that we need far more output fields to adequately scrutinize the models. However, given the self-imposed time constraints, we need a data visualization system that can handle lots of variables, perform calculations on the fly, and deal with many ensemble members. We have been introduced to the ALPS system from GSD, and it seems up to the challenge, given the rapid visualization and unique display capabilities for which it was designed (e.g. large ensembles).

We also saw more of what the DTC is offering in terms of traditional verification, object-based verification, and neighborhood object-based verification. There is just so much to look at that it is overwhelming day to day. I hope to look through this in great detail in the post-experiment analysis. There is a lot of information buried in that data that is useful now (e.g. day to day) and will be useful later (e.g. aggregate statistics). This is truly a good component of the experiment, but there is much work to be done to make it immediately relevant to forecasting, even though the traditional impact is post-experiment. Helping every component fill an immediate niche is always a challenge. And that is what experiments are for: identifying challenges and finding creative ways to help forecasting efforts.

Tornado Outbreak

I am posting late this week. It has been a wild ride in the HWT. The convection initiation desk has been active and Tuesday was no exception. The threat for a tornado outbreak was clear. The questions we faced for forecasting the initiation of storms were:
1. What time would the first storms form?
2. Where would they be?
3. How many episodes would there be?

This last question requires a little explanation. We always struggle with the criteria that denote convection initiation. Likewise we struggle with how to define the multiple areas and multiple times at which deep moist convection initiates. This type of problem is “eliminated” when you issue a product for a long enough time period. Take the convective outlook, for example. Since the risk is defined for the entire convective day, you can account for the uncertainty in time by drawing a larger risk area and subsequently refining it. But as you narrow your time window (from 1 day to 3 hours or even 1 hour) the problems can become significant.

In our case, the issue for the day was compounded because the dryline placement in the models was significantly east of the observed position by the time we started making our forecast. We attempted to account for this and as such had to adopt a feature-relative perspective of CI along the dryline. However, the mental picture you assemble of the CI process (location, timing, number of episodes, number of storms) is tied not just to the boundaries you are considering, but to the presumed environment in which they will form.

The feature-relative environment would then necessarily be in error, because we simply do not have enough observations to account for the model error. We did realize that the shallow moisture shown on morning soundings was not going to be the environment in which our storms formed. Surface dew points were higher, staying near 68 °F in the warm sector. We later confirmed this with soundings at LMN, which showed the moist layer increasing in depth with time.

So we knew we had two areas of initial storm formation: one in the panhandle of OK and into KS, along the cold front to the west and the triple point to the east; the other along the dryline in OK and TX. We had to decide how far south storms would initiate. As we were figuring all of this out, we had to look at the current satellite imagery, since that was the only tool accounting for the correct dryline placement, and estimate how far east the dryline might travel or mix out in order to make the forecast.

Sure enough, the warm sector had multiple cloud streets ahead of the dryline. Our 4 km model suite is not really capable of resolving cloud streets, but we still needed to make our forecast roughly 1-2 hours before CI. So in a sense we were not making a forecast so much as a longer, more uncertain nowcast (probably not abnormal given the inherent unpredictability of warm season convection). Most people put the first storm in KS and would end up being quite accurate in placement. Some of us went ahead of the dryline in west-central OK and were also correct.

There was one more episode in southern OK and then another in TX later on. This case will require some careful analysis to verify the forecast beyond subjective assessments. Today we got to see some of the potential objective methods via the DTC, which showed MODE plots of this case. The object identification of reflectivity via neighborhoods, along with merging and matching, was quite interesting and should foster vigorous discussion.

Last but not least, the number of models we interrogated continued to increase, yet we felt confident in understanding this wide variety of models using all of the visualization tools, including the more rapid web-based plots and the sub-hourly convectively active fields. We are getting quite good at distilling information from this very large dataset. There are so many opportunities for quantifying model skill that we will be busy for a long time.

It was interesting to be under the threat of tornadoes and in the forecast path of them. It was quite a day, especially since the remnant of the hook echo moved over Norman, showering the area with debris picked up by the Goldsby tornado. The NWC was roughly 3-5 miles from the dissipation point of that tornado.

Quick Post

I have blogged here before about the scales of CI, but this weekend was a great example.
Saturday:

tlx

These storms formed in close proximity to the dryline. The southernmost supercell went up pretty quickly, while the others to the north and west went up much more slowly and remained small; only the storm closest to the supercell developed into a supercell itself. The contrast is obvious. Even after breaking the cap, the storms remained small for an hour or so, and a few remained small for two.

Today, we saw turkey towers along the dryline for quite a while (2 hours-ish) in OK, and then everything went up. But it is interesting to see the different scales, even at the “cloud scale,” where things tend to be uneven and random, skinny and wide, slow and fast. It makes you wonder what the atmospheric structure is, especially when our tools tell us the atmosphere is uncapped but the storms just don’t explode.

Looks like a pretty active southern Plains week is just beginning, as evidenced by the 43 tornado reports today and the 20 yesterday.


Relative skill

During the Thursday CI forecast we decided to forecast for a late show down in western Texas, where the models were indicating storms would develop. The models had a complex evolution of the dryline and moisture return from SW OK all the way down past Midland, TX. There was a dryline that would be slowly moving southeast, and what we thought could be either a moisture return surge or a bore of some kind moving northwest. CI was indicated by most models with large spatial spread (from Childress down to Midland) and some timing spread (from 03 to 07 UTC) depending on the model. More on this later.

Mind you, none of these models had correctly predicted the evolution of convection earlier in the day along the TX panhandle-western OK border. In fact the dryline in the models was well into OK, snaked back southwest, and remained there until after 23 UTC. The dryline actually made it to the border area, but then retreated after the storms formed southwest of SW OK in TX, probably because of the outflow from these storms propagating westward. This signal was not at all apparent in the models, maybe because of the dryline position. The storms that did form had some very distinct behavior: storms that formed on the north side of the first initiation episode moved north, not east as in the models. The southern storms were big HP supercells, slowly moving east-northeast and continually developing in SW OK and points further SW into TX (though only the first few storms were really big; the others were small, in close proximity to the big storms – a scale enigma). We had highlighted the areas to the south in our morning forecast, along with an area in KS to the north, but left a slight risk of CI in between. So while our distinct moderate risk areas would sort of verify in principle (being two counties further east than observed), we still did not have the overall scenario correct.

That scenario being: storms developing in our region and moving away, with the possibility of secondary development along the dryline a bit later. Furthermore, we expected the storms to the north to develop in our moderate risk area and move east. In fact the OK storms moved into our KS region just before our northern KS moderate area verified well with an unanticipated arcing line of convection. This was a sequence of events that we simply could not have anticipated. We have discussed many times the need to “draw what the radar will look like in 3 hours”. This was one of those days where we could not have had any skill whatsoever in accomplishing that task.

Drawing the radar in 1 or 2 or 3 hours is exactly what we have avoided doing, given our 3 hour forecast product. We, and the models, simply do not have that kind of skill at the scales required to add value. This is not so much a model failure, or even a human failure. It is an operational reality that we simply don’t have enough time to efficiently and quickly mine the model data and extract enough information to make such a forecast product. More on this later.

Back to the overnight convection. Once these SW OK supercells had been established, we sought other model guidance, notably the HRRR from 16 or 17 UTC. By then it had picked up on the current signal and was showing an evolution similar enough to observations. This forecast would end up being the closest solution, but to be honest it was still not that different, way down in TX, from the ensemble, which was a 24-30 hour forecast. They all said the same thing: the dryline boundary and moisture surge would collide, and CI would ensue within 2 hours into a big line of convective storms that would last all night and make Friday's forecast very difficult.

Sure enough, these boundary collisions did happen. From the surface stations' point of view, winds in the dry air had been blowing SW all day, with temperatures in the upper 80s to low 90s. After 02 UTC, the winds to the NW had backed and were now blowing W, with some blowing W-NW. At the dryline, winds were still SW but weakening. Ahead of the dryline, they were SE and weak. By 0400 UTC, the moisture surge had intensified from weak SE winds to strong SE winds, with the dew point at CDS increasing from 34 to 63 °F in that hour. On radar, the boundaries down in Midland were very distinct, with a clear separation:

maf

You can see CI already ongoing to the north, where the boundaries have already collided, and the zipper effect in progress further southwest, though it took nearly 2 more hours there.

maf2

Also note the “gravity waves” that formed in the upper-level frontal zone within a region of 100 knots of vertical shear back in NM. Quite a spectacular event. Let me also note that the 00 UTC ensemble and other models DID NOT pick up on this event until 3 hours later than shown by the last radar image. Spin-up may have played a significant role in this part of the forecast. As you can see, the issues we face are impressive on a number of levels and across spatial and temporal scales. We verified our forecast of this event with the help of the ensemble, the HRRR, and the NSSL WRF. To reiterate the point of the previous post: it is difficult to know when to trust the models. But in this case we put our faith in the models and it worked out, whereas in the previous forecast we put our faith in the models and had some relative skill, but not enough to add value.