Not so fast

What can I say? We verified our Tuesday forecasts and felt pretty good: we captured the elevated convection over KS and OK at about the right time (the first lightning strike was 45 minutes before our period start time), and we rocked it out in CO during the day.

Which brings us to Wednesday. A slight risk in OK, with a triple point from a warm front, stationary front, and dryline in place. The dryline was forecast to be nice and strong, with horizontal convective rolls (HCRs) present on the north side interacting with the dryline circulation, and more HCRs in the dry air behind the dryline farther south. The end result was a series of CI failures (indicated exclusively by accumulated precipitation) along the dryline, but eventually classic right-moving supercells (one or two dominant and long-lived) were present in the ensemble. The HCRs were detected via some of the unique output variables we are experimenting with, particularly vertical velocity at 1.1 km AGL.

The complicating factor was that not all members had supercells, perhaps one in three. We also had a situation where CI was very sensitive to the distribution of mixing ratio in the warm sector. It appeared we had two different behaviors along the dryline southward, while a definite moisture pool dominated to the north. This pool was in the area where the dryline bent back westward and where the HCRs directly interacted with the dryline/warm front boundary. There was very little vertical velocity in the warm sector. I am not sure why this was the case, assuming the PBL heights were not much lower than 1.1 km.

But let's be serious. There was not a whole lot of over-forecasting by the models. Storms did attempt to form; they just didn't last very long or get very strong. Nor could we really call them storms, having met no definition of CI other than some coherent weak reflectivity. In this case it appears the strongest forcing (the dryline and HCRs) was separated from where the realizable instability was present. We analyzed where, when, and how the instability could be realized (in great detail) in the model. We could not verify the models against observations because we do not have very many sounding sites, nor do we have frequent launches. What we could not do is pinpoint where this forecast was going wrong.

The KOUN sounding is presented below:

[OUN sounding]

Note that this sounding has nearly 3000 J/kg of CAPE, 43 knots of deep-layer shear, and strong (31 knot) 0-1 km shear: an ideal sounding for supercells, and possibly tornadoes. But Norman is well away from where the dryline was set up in western OK, and there are no sounding sites in SW OK or NW OK. If we look at LMN, which is north of OUN and happens to be north of the warm front:

[LMN sounding]

We see very little instability because of the cooler surface temperature and dew point and an elevated capped layer. Modifying this sounding with the surface conditions at OUN would indicate strong instability and no cap using the virtual parcel. It is unlikely to be this easy, as there is probably some mesoscale variability that has not been sampled in this area.
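
To make the parcel argument above concrete, here is a minimal sketch of that kind of modification, assuming MetPy is available and using entirely hypothetical profile values (not the actual LMN or OUN data); note it uses MetPy's default lifted parcel rather than the virtual parcel correction mentioned above:

```python
# Hedged sketch: hypothetical sounding, MetPy parcel tools assumed available.
import numpy as np
from metpy.calc import cape_cin, parcel_profile
from metpy.units import units

# Hypothetical cool, capped profile, except that the lowest-level temperature
# and dew point have been replaced with warmer, moister OUN-like surface values.
p  = np.array([970., 925., 850., 700., 500., 300., 200.]) * units.hPa
T  = np.array([ 29.,  19.,  16.,   8., -12., -40., -55.]) * units.degC
Td = np.array([ 21.,  15.,  10.,  -2., -25., -55., -70.]) * units.degC

# Lift the modified surface parcel and integrate CAPE and CIN
prof = parcel_profile(p, T[0], Td[0])
cape, cin = cape_cin(p, T, Td, prof)
print("Modified-sounding CAPE:", cape, " CIN:", cin)
```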

What is clear is that the forecasters were very much willing to buy into a reasonable solution. What we lacked was a solid reason not to believe the models. I am assuming we should first believe the models, and that is perhaps not the best starting point. So let's reverse that thought: what reasons did we use to believe the models? I won't speak for the group, but we should address this question when we review this event tomorrow.

Another point: assume the model is not a poor representation of the observations. What if it was very close? How could we recognize the potentially small errors which could lead to the development of storms or the lack thereof? These are really fundamental questions that still need to be addressed.

On a day like today it would have been highly valuable to have soundings in key storm-scale locations: near the dryline in the warm sector to the east, and to the immediate north between the dryline and warm front. At the very least we could have ascertained whether the model was depicting the stratification correctly, along with the moisture content, instability, and inhibition. It would also have been great to have a fine-scale radar to measure the winds around the dryline and look for HCRs.

Opportunities

Every Monday we get a new group of participants, and this week we spent time discussing all the issues from their perspective, from aviation, airport, and airplane concerns to towering cumulus coming off the mountains. We then discussed the forecast implications, from verification, to model data mining, to practical use of forecast data, and again learned the lesson that forecasters already know: you only have so much time before your forecast is due, and it had better contain the answer!

Our forecast has evolved into a 3 hour categorical product of convection initiation for which we determine the domain and the time window. We then go a step further and individually forecast a location where we think the first storm will be in our domain during our time period. We then forecast the time we think it will occur and our uncertainty in that time. Then we assign our confidence that CI will occur within 25 miles of our point. It might sound easy, but it takes some serious practice to spin up 10 people on observations, current and experimental guidance, AND lots of discussion about the scenario or scenarios at play. We have a pretty easygoing group, but we do engage in negotiations about where to draw our consensus categorical risk and over what time period. It is a great experience to hear everyone's interpretation of the uncertainty in any number of factors at play.
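
Verifying the "within 25 miles of our point" part of the product comes down to a great-circle distance check between the forecast point and the first observed storm. A minimal sketch, with hypothetical coordinates:

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in statute miles between two lat/lon points."""
    r_miles = 3958.8  # mean Earth radius in miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r_miles * math.asin(math.sqrt(a))

# Hypothetical forecast point and first observed storm location
forecast_point = (35.2, -98.6)   # lat, lon of a forecaster's CI point
first_storm    = (35.5, -98.2)   # lat, lon of the first observed storm

dist = haversine_miles(*forecast_point, *first_storm)
print(f"First storm was {dist:.1f} miles away; hit within 25 miles: {dist <= 25.0}")
```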

Monday was a great practice day with a hurried morning forecast and a terrain-induced CI forecast in the afternoon. Tuesday we were off to a good start with all products nominal, 10+ forecasters, and action back out on the Plains with plenty of uncertainty. The highlights from today included the introduction of ensemble soundings via the BUFKIT display (thanks, Patrick Marsh!). This garnered a lot of attention and will be yet another new, exciting, and valuable visualization tool. The aviation forecasters shared tremendous insights about their experiences and even showed a movie of what they face as airplane traffic gets shuffled around thunderstorms. It was a glimpse of exactly the sorts of problems we hope to address with these experimental models.

These problems are all associated with CI, on every scale (cloud, storm, and squall-line scales). The movie highlighted the issues of where and when new convection would fill in the gaps, or simply occupy more available airspace, or block an airport arrival, or when convection would begin to fizzle. Addressing these issues is part of the challenge, and developing guidance relies almost exclusively on how we define convection initiation in the models and observations. We have some great guidance, and it is clear that as we address more of the challenges of generic CI we will require even more guidance to account for the sheer number of possibilities of where (dx, dy, and dz), when, how, and if CI occurs.

As an example, we issued our first nighttime elevated convection forecast. As it turns out, we could be verifying this by observing rain in OK tonight. Our experimental guidance was inadequate, as we have very little data aloft except soundings from the fine-resolution models. So we looked at more regular models while using what was available from the fine-resolution models, like reflectivity and CI points. This highlights a unique operational challenge that we all face: data overload and time-intensive information extraction. The forecast verification for tonight should be quite revealing and should provide more insight than I am prepared to discuss this evening.

When seeing is believing

The HWT is examining some fairly sophisticated model simulations over a big domain. One question that frequently arises is: Can we trust the model over here if it is wrong over there?

What does “wrong” mean in these somewhat new models? Wrong in the sense that convection is absent, or wrong in the sense that convection is too widespread? Perhaps a particular feature is moving too slowly or too fast. Can you really throw out the whole simulation if a part is “wrong”? Or do you just need time to figure out what is good and bad and extract what you can? After all, the model is capable of detail that is not available anywhere else. That includes observations.

So Thursday and Friday we discussed how wrong the models have been: the features missed, the features misrepresented, the features absent. Yet each day we were able to extract important information, and we were careful about what we should believe. On Friday, though, it was a different story. The NSSL WRF simulated satellite imagery was spot on. Fourteen hours into the simulation, the upper low and its attendant surface cold front were almost identical to what was observed.

Our domain was northern AR, southern MO, and western TN and MS. The models were not in agreement, mind you. The different boundary layer schemes clustered into two groups: all of the schemes went for the northern AR initiation, while a second group, the TKE-based schemes, also went for the southern part of the cold front. Another signal I was paying attention to was post-frontal convergence that was showing up. I made note of it, and while I never went back to check all the simulations, I wanted to keep that threat in the forecast. As it turns out, the TKE schemes hit on all of these features. The northern storms initiated close to the model consensus, the southern storms initiated as well, and so did the secondary episode behind the front (at least from the radar perspective).

The second domain of the day was Savannah, GA, in the afternoon. This was an event involving convection possibly moving in from the west, the sea breeze front penetrating far inland along the east, a sea breeze from the west FL and Gulf coast penetrating even farther inland, and a highly organized boundary layer sandwiched in between. The models had little in the way of 30 dBZ 1 km reflectivity at hourly intervals. The new CI algorithms, however, showed that CI was occurring along all of the aforementioned features:
1. along the sea breezes,
2. in the boundary layer along horizontal convective rolls,
3. along the intersections of 1 and 2,
4. and finally along the outflow entering our domain.

We went for it and there was much rejoicing. We watched all afternoon as those storms developed along radar fine lines, and along the sea breeze. This was a victory for the models. These storms ended up reaching severe levels as a few reports came in.

As far as adding value on days like this, I am less certain. Our value was in extracting information, and there is still much value to add. At this stage, we are still learning. It is impossible to draw what the radar will look like in 3 hours (unless there is nothing there), but I think as we assemble the capabilities of these models, we will be able to visualize what the radar might look like. As our group discussed, convection in the atmosphere appears random, but only because we have never seen the underlying organization.

It is elusive because our observing systems do not see uniformly. We see vertical profiles, time series at a location, and snapshots of clouds. We see wind velocity coming toward or away from radars. We see bugs caught in convergence lines (radar fine lines). So these models provide a new means to see. Maybe we see things we know are there. Maybe we are seeing new things that we don't even know to look for; since we cannot explain them, we are not looking for them. We expect to see more cool stuff this week.

Thanks to all the forecasters this week who both endured us trying to figure out our practical, time-limited forecast product and taught us how to interrogate their unique tools and visualizations. We begin anew tomorrow with a whole new crop of people, a little better organized, with more new stuff on display, and more complex forecasts to issue.

I know it when I see it and other discussions

We had many discussions over the last two days. One was regarding CI definitions. Of the variety of opinions we heard, a storm was defined by:
1. whether lightning occurred,
2. a coherent, continuous thunderstorm that eventually reached a significant low-level reflectivity threshold (40-45 dBZ) within 30 minutes (see the sketch after this list),
3. any combination of 1 or 2 which also produced severe weather (i.e. it was just a storm, but a severe storm).
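
Definition 2 in particular maps cleanly onto a grid-point test over a time series of low-level reflectivity. Here is a rough sketch of that idea, using hypothetical array shapes and the 40 dBZ / 30 minute numbers from the list above; it is not the algorithm actually run in the experiment:

```python
import numpy as np

def ci_mask(refl, threshold_dbz=40.0, persistence_steps=6):
    """Flag grid points where low-level reflectivity reaches the threshold and
    stays at or above it for `persistence_steps` consecutive output times
    (e.g. six 5-minute outputs, roughly 30 minutes), having never reached
    the threshold before that window (new convection, not ongoing storms).

    refl: array of shape (ntimes, ny, nx) of reflectivity in dBZ.
    Returns a (ny, nx) boolean mask of CI points.
    """
    exceed = refl >= threshold_dbz                      # (nt, ny, nx)
    nt = exceed.shape[0]
    ci = np.zeros(exceed.shape[1:], dtype=bool)
    for t0 in range(nt - persistence_steps + 1):
        sustained = exceed[t0:t0 + persistence_steps].all(axis=0)
        never_before = ~exceed[:t0].any(axis=0)         # empty slice -> all True
        ci |= sustained & never_before
    return ci
```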

These variations on the theme are exactly what we were considering for the experiment, the experimental algorithms from the model, and the forecast verification we had played with prior to the experiment.

Another conversation involved what forecasters would use from the experimental suite of variables. The variables would need to be robust, easy to interpret (i.e. quick to understand), and clear. This is a tough sell from the research side of things, but it is totally understandable from a forecaster's perspective. Forecasters have limited time, in an environment of data overload, in which to extract (or mine) information from the various models. They have very specific goals too, from nowcasting (1-2 hours; especially on days like today where models miss a significant component of what is currently happening), to forecasting (6-24 hours), to long-lead forecasting (1-8 days).

We also spent some time discussing the Tuesday short wave trough in terms of satellite data and radiosonde data. I argued that many people believe satellite data and its assimilation are much more important now (in data volume, coverage, and quality control) than radiosondes. It was mentioned that radiosondes are very important on the mesoscale, especially for the 0-24 hour and possibly 48 hour forecasts. Still more opinions were expressed that some forecasters have questioned the need for twice-a-day soundings. Opinions in the HWT ranged from soundings being important, to soundings needing to be launched more often, to soundings needing to be launched more often and at different times. It is plausible that some of our NWP difficulty may be due to launching soundings at transition times of the boundary layer.

I am of the opinion that if model suites are run four times per day then soundings should be launched at least four times per day, especially now that cycling data assimilation is common practice. This would return our field to the 1950s, when soundings were launched four times per day at 03, 09, 15, and 21 UTC.

Lastly, we discussed the issue of what happens when a portion of the forecast domain is totally out to lunch, like today, where the NM convection was not represented. I think I will talk about that tomorrow once we verify our forecast in the OK area from today. Stay tuned.

It’s complicated

As expected, it was quite a challenge to pick domains for days 2 and 3. Day 2 was characterized by three potential areas of CI: Ohio to South Carolina, Minnesota and Iowa, and Texas. We were trying to determine how to deal with pre-existing convection: whether it was in our domain already or would be in our domain during our assumed CI time. As a result, we determined that the Ohio to South Carolina domain was not going to be as clean a slate as Texas or Minnesota. So we voted out SC.

We were left with Texas (presumed dryline CI) and Minnesota (presumed warm front/occlusion zone). Texas was voted in first, but we ended up making the MN forecast in the afternoon. Data for this day did not flow freely, so we used whatever was available (NSSL-WRF, operational models, etc.).

The complication for TX was an uninitialized short wave trough emanating from the subtropical jet across Mexico and moving northward. This feature was contributing to a north-to-south band of precipitation and eventually triggered a storm in central and eastern OK, well to the east of our domain. The NSSL WRF did not produce the short wave trough and thus evolved eastern TX much differently than what actually occurred, despite having the subtropical jet in that area. So we were gutsy in picking this domain with this short wave passing through our area. We were still thinking that the dryline could fire later on, but once we completed our spatial confidence forecast (a bunch of 30 percents and one 10 percent) and our timing confidence (roughly +/- 90 minutes), it was apparent we were not very confident.

This was an acceptable challenge as we slowly began to assemble our spatial forecast, settling on a 3 hour period in which we restrict ourselves to worrying only about new, fresh convection by spatially identifying regions within our domain where convection is already present. This way we don’t have to worry about secondary convection directly related to pre-existing convection. We also decided that every forecaster would enter a spot on the map where they thought the first storms would develop (within 25 miles of their point). This makes the forecast fun and competitive and gets everyone thinking not just about a general forecast but about the scenario (or scenarios if there are multiple in the domain).

The next stop on this day's adventure was MN/IA/the Dakotas. This was challenging for multiple reasons:
1. the short wave trough moving north into OK/KS and its associated short wave ridge moving north-northeast,
2. the dryline and cold front to the west of MN/IA,
3. the cold upper low in the Dakotas moving east-northeast.

The focus was clear and the domain was to be centered on RWF. This time we used a bigger domain in acknowledgement of the complex scenario that could unfold. The models were initiating convection along the warm front, along the cold front in NE on a secondary moisture surge associated with the short wave trough, and with a persistent signal of CI over Lake Superior (which we ignored).

We ended up drawing a rather large slight risk extending down into IA and NE from the main lobe in MN, with a moderate area extending from south-central MN into northern IA. After viewing multiple new products, including simulated satellite imagery (water vapor and band differencing from the NSSL WRF) and the Nearcast moisture and equivalent potential temperature difference, it was decided that CI was probable, with everyone going above 50 percent confidence.

We did quite well in this domain, showing a gap near Omaha where the moisture surge was expected but did not materialize until after our 0-3 UTC time period. Once the moisture arrived … CI. In MN, CI began just prior to 23 UTC, encompassing some of our moderate risk area even down into IA. These “storms” in IA were part of the CI episode but would not be objectively classified as storms from a reflectivity and lifetime perspective, though they did produce lightning.

The verification for Texas was quite bad. Convection formed to the east early, and to the west much later than anticipated, associated with a southern moisture surge into NM from the upper-level low migrating into the area nearly 11 hours after our forecast period started.

As it turns out, we awoke this morning to a moderate risk area in OK, but the NM convection had been totally missed by the majority of the model guidance! The dryline was still in Texas, but now this convection was moving toward our CDS centerpoint and we hoped that it would move east. A review of the ensemble indicated some members had weak signals of this convection, but it became obvious that it was not the same. We did key in on the fact that, even though the models missed the convection in the TX panhandle, they were persistent in producing secondary initiation despite the now-developing convection in southern TX. We outlooked the area around western OK and parts of TX.

In the afternoon, we looked in more detail at the simulated satellite imagery, the Nearcast, and the CIRA CI algorithm for an area in and around Indiana. This was by far the most complicated and intellectually stimulating area. We analyzed the ensemble control member for some new variables that we output near the boundary layer top (roughly 1.2 km AGL): WDT, the number of time steps in the last hour where w exceeded 0.25 m/s, and convergence. We could see some obvious boundaries, as observed, with a unique perspective on warm-sector open-celled convection.
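
For illustration, a diagnostic like WDT boils down to a simple count over the model's sub-hourly output. Here is a hedged sketch with hypothetical array shapes and output cadence (not the actual experiment code):

```python
import numpy as np

def wdt(w_level, threshold=0.25):
    """Count, per grid point, the time steps in the past hour where vertical
    velocity at the diagnostic level (~1.2 km AGL) exceeded `threshold` m/s.

    w_level: array of shape (nsteps, ny, nx) holding w (m/s) at that level
             for every output time in the hour.
    Returns an (ny, nx) integer field; larger counts highlight persistent
    updraft bands such as HCRs or boundaries.
    """
    return (w_level > threshold).sum(axis=0)

# Hypothetical use: 60 one-minute steps on a 500 x 600 grid
w_hour = np.random.normal(0.0, 0.3, size=(60, 500, 600))
wdt_field = wdt(w_hour)
```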

In addition, we used the 3 hour probabilities of CI that have been developed specifically for this purpose, since these match our chosen 3 hour time periods. We have noticed significant areal coverage from the ensemble probabilities, which heavily weight the pre-existing convection CI points. Thus it has been difficult to assign the actual new-CI probabilities, since we can't distinguish the probability fields when two close-proximity CI events are in the area around where we wish to forecast. That being said, we have found them useful in these messy situations. We await a clean day to see how much of a difference that makes.
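
Conceptually, those ensemble CI probabilities amount to counting how many members produce a CI point near each grid box during the 3 hour window. A minimal sketch follows, using a simple square-neighborhood maximum rather than whatever smoothing the real guidance applies, and hypothetical inputs:

```python
import numpy as np
from scipy.ndimage import maximum_filter

def ci_probability(member_ci, radius_pts=10):
    """Neighborhood ensemble probability of CI within a 3-hour window.

    member_ci:  boolean array of shape (nmembers, ny, nx); True where a member
                produced a CI point at any time in the window.
    radius_pts: half-width (in grid points) of the square neighborhood.
    Returns an (ny, nx) probability field in [0, 1].
    """
    size = 2 * radius_pts + 1
    # A grid box "sees" CI if any point in its neighborhood had CI in a member
    hits = np.stack([maximum_filter(m.astype(float), size=size) for m in member_ci])
    return hits.mean(axis=0)

# Hypothetical use: 20 members on a 400 x 500 grid
members = np.random.rand(20, 400, 500) > 0.999
prob = ci_probability(members)
```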

Day 1 in the 2011 HWT EFP

What a great start to the HWT. There were troubles, and troubleshooters. We had plenty of forecasters and plenty of forecast problems. All in all it was quite a challenge.

The convection initiation (CI) team had some great discussion on the CI definition, including all the ways in which CI gets complicated. For example, visually we can identify individual clouds, or cloud areas on satellite. When using radar, we might select areas of high reflectivity that last for, say, 30 minutes. In the NWP models, we rely on quantitative values at a single grid point at two instances in time.

We also have the issue of whether CI is part of a larger episode (close in space and/or time to other storms) or developing as a direct result of previous convection (e.g. ahead of a squall line). In these relative cases, visually identifying new storms might be easily accomplished, but in the model atmosphere (in a grid-point-centric algorithm) new CI points may be all over the place, say as gravity waves or outflow achieve just enough separation to be classified as new (thus CI) even though it might simply be redevelopment. From a probability standpoint, spatial probabilities of CI may thus be larger around existing convection. Does this enhanced probability, ahead of the line, signal actual new storm development?
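
One way to make the "just enough separation" problem concrete is to screen candidate CI points against a distance buffer around pre-existing convection. A hedged sketch with hypothetical grids and an arbitrary buffer distance (not the experiment's actual algorithm):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def isolate_new_ci(ci_points, existing_convection, min_sep_km=30.0, dx_km=3.0):
    """Keep only CI points farther than `min_sep_km` from pre-existing storms.

    ci_points:           (ny, nx) boolean mask of candidate CI grid points.
    existing_convection: (ny, nx) boolean mask of storms already present.
    dx_km:               grid spacing used to convert distance to km.
    Returns a boolean mask of CI points treated as genuinely new.
    """
    # Distance (in km) from every grid box to the nearest existing storm
    dist = distance_transform_edt(~existing_convection) * dx_km
    return ci_points & (dist > min_sep_km)
```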

Trying to establish an apples-to-apples comparison between model and human forecasts of such discrete events is a major challenge. We are testing three model definitions of CI to see their viability from the perspective of forecasters, and we will also evaluate object-based approaches to CI.

Of course we cannot talk about where CI might be without talking about when! When will the first storm form? This gets back to your definition of CI. Should the storm produce lightning to be classified a storm? How about reaching a threshold reflectivity? How about requiring that it last a certain amount of time? The standard classification of storms relies on their mode (ordinary, multicell, supercell), each having a unique evolution with the placement of the updraft and precipitation fallout. But what about storm intensity (however you define it)?

I should also acknowledge that defining all of this can be quite subjective and is relevant to individual users of a CI forecast. So we are definition dependent, but most people know it when they see it. Let's consider two viewpoints: the severe storm forecaster and the aviation forecaster. The severe storm forecaster wants to know where and when a storm may form so they can assess the potential threat, leading to a product (a mesoscale discussion for specific hail, wind, or tornado threats), provided that storm or CI episode is long-lived. The aviation forecaster might be concerned with the sudden appearance of cumulonimbus, which could pose an immediate threat to aircraft. But they are also concerned with the resulting coverage of new storms (diverting traffic, shutting down airports, planning new traffic routes or patterns) and the motion, expansion, and decay of existing storms.

And lastly, it will be important for us to establish what skill the models and forecasters have with respect to CI. This is not a new area of study, but it is one where complexity, vagaries of definitions, and a lack of understanding combine to make it one of the greatest forecast challenges.

As we refine what our forecast will consist of, we will report back on how our forecast product evolved. The more we forecast, the more we learn.