Short-Term Forecasting Methods

This year, the Spring Forecasting Experiment is focusing on the Day 1 time period more than ever before, eschewing the long lead-time forecasts that we have made in previous years in favor of homing in on timing information and allowing participants to delve into the data. Since more data than ever before is available within the drawing tool where participants draw their forecasts, we’re excited to see how participants probe the new data within the various ensemble subsets.

One short-term experimental forecast product being generated on the Innovation Desk this year is the Potential Severe Timing (PST) area, which indicates the 4-hour period in which severe weather is expected to occur within the general area of 15% probability of severe weather. By identifying the timing of the severe event and displaying all of the timing contours on one graphic, we hope the end product will be valuable to emergency managers and broadcasters for their advance planning. Small groups of participants generate these forecasts using subsets of the CLUE and HREFv2 ensembles, meaning that on any given day we’ll ideally have 5 separate sets of PSTs. After the participants split into their small groups and issue their forecasts, we ask them to come back together and brief one another on what their particular ensemble subset was doing. This way, each group can delve into the data from its subset more deeply than if the activity took place as one large group. This briefing period also exposes participants to different lines of reasoning behind the forecasts, and it has thus far sparked several good discussions.

Here are the PSTs from 3 May 2018, or Thursday of last week:

The different ensemble subset groups compose the top row and the left and middle panels of the bottom row, while the bottom right-hand panel shows the forecast from the expert forecaster facilitator on the Innovation Desk. Several different strategies are evident within the panels, including some groups that chose not to indicate timing areas for all of the 15% area of our full-period outlook (shown below).

The reasoning the groups gave for their different areas provided insight into model performance as well as the different forecasting strategies employed by each group. The HREFv2 group decided not to use the NMMB member when generating their forecasts, because its depiction of morning convection was so poor. The HRRRE group had very large areas, which they attributed to the large spread within the HRRRE. The NCAR group decided to discount the guidance in the north of the domain because of erroneous convection there. Instead, they felt more confident in the southern areas, where the ensemble was producing supercells; the group thought that the thermodynamics of the northern area were less conducive to supercellular convection. The group using the CAPS mixed physics ensemble placed their first area based on where they thought convective initiation would occur, indicating that they expected convection to become severe quickly. Their southern PST was placed very late in order to cover any overnight severe threat, but they considered that the overnight activity might be more of a flood threat (which we do not forecast in the Spring Forecasting Experiment). The stochastic physics group (another ensemble run by CAPS), on the other hand, had an ensemble that showed almost no signal in the southern area of interest. It also showed a later signal than the other ensembles, contributing to the spread in the timing of the first PST.

All of these details came out during the discussion of the PSTs, after participants dove into the data from their subensembles. How did the PSTs do? Here’s a snapshot of the PSTs with reports from 18-22 UTC overlaid:

Ideally, all of the reports would fall within the 18-22 UTC contours. This mostly happened for the expert forecaster and fully happened for the HRRRE and Mixed Physics groups, although both of those groups had large areas of false alarm. Here’s a similar image, but showing reports from 22-02 UTC:

In this period, all groups missed the activity in Kansas, although some groups captured most of the reports within a 22-02 UTC window.
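The verification question above (what fraction of the reports in a window fell inside the matching PST contour) can be sketched in a few lines. This is a minimal illustration, not the SFE’s verification code: it assumes each PST is a simple lon/lat polygon with a 4-hour window, and the names `PST`, `in_window`, and `fraction_captured` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PST:
    """Hypothetical Potential Severe Timing area: a polygon plus a UTC window."""
    polygon: List[Tuple[float, float]]  # (lon, lat) vertices, not closed
    start_hour: int                     # UTC hour at which the window opens
    length_h: int = 4                   # PSTs span 4-hour periods


def in_window(hour: float, start: int, length: int) -> bool:
    """True if `hour` (UTC) is in [start, start + length), wrapping past 00 UTC."""
    return (hour - start) % 24 < length


def point_in_polygon(x: float, y: float, poly: List[Tuple[float, float]]) -> bool:
    """Standard ray-casting test: count polygon edges crossed to the right of (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def fraction_captured(reports, pst: PST) -> float:
    """Of the reports (lon, lat, hour) valid in the PST's window, the fraction
    that fall inside its polygon. Returns NaN if no reports are in the window."""
    in_period = [(lon, lat) for lon, lat, hour in reports
                 if in_window(hour, pst.start_hour, pst.length_h)]
    if not in_period:
        return float("nan")
    hits = sum(point_in_polygon(lon, lat, pst.polygon) for lon, lat in in_period)
    return hits / len(in_period)
```

A 22-02 UTC PST would use `start_hour=22`; the modulo in `in_window` handles the wrap past midnight. False-alarm area, the other half of the story for the HRRRE and Mixed Physics groups, would need a separate area-based measure.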

The day after the forecasts, participants go through the reports that have come in, rate each forecast, and choose the group forecast they thought performed best. Who performed best for this case? The HREFv2 group received 3 votes, the HRRRE and CAPS Stochastic Physics groups received 2 votes each, and the CAPS Mixed Physics and NCAR ensemble groups received one vote each. Clearly, the complexity of this case provided plenty of nuances to evaluate, and I would bet that more complex cases such as this are on the way. After all, we’ve only just begun Week 2 of the 2018 SFE!

Sneak Peek Part 3: Modeled vs. Observed reports

I used some educated guesses to develop proxies for severe storms in the model. But how do those modeled reports compare to observed reports? This question, at least the way it is addressed here, yields an interesting result. Let’s go to the figures:


The two images show a bar chart of all the dates on the left and, on the right, the modeled reports (top), the observed reports close to modeled storms (middle), and the natural log of the pixel count of each storm (i.e., its area; bottom). The first image has the modeled storm reports selected, and it should be pretty obvious that I have chosen unwisely (either the variable or the value) for my hail proxy (the reports with a 2 in the string). Interestingly, the area distribution is skewed toward large values; that is, very large objects tend to be associated with modeled severe storms.

Also note that modeled severe storms are most numerous in the ensemble on 24 May, with 27 Apr coming in 6th. 24 May also ranks first in the percentage of that date’s storms that are severe, while the 27 Apr outbreak comes in 15th place (i.e., it had a lot of storms that were not severe).
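The two rankings above (raw count of modeled severe storms versus percentage of a date’s storms that are severe) can diverge sharply on a big outbreak day with many non-severe storms. A minimal sketch of both rankings follows; the per-date counts in the example are made up for illustration and are not the 2011 numbers.

```python
def rank_dates(counts):
    """Rank dates two ways.

    counts maps date -> (n_severe, n_total) for modeled storm objects.
    Returns (dates ranked by severe count, dates ranked by severe percentage).
    """
    by_count = sorted(counts, key=lambda d: counts[d][0], reverse=True)
    by_pct = sorted(
        counts,
        key=lambda d: counts[d][0] / max(counts[d][1], 1),  # guard against empty dates
        reverse=True,
    )
    return by_count, by_pct


# Illustrative numbers only.
example = {
    "24 May": (90, 300),  # many severe storms, 30% of storms severe
    "27 Apr": (60, 600),  # lots of storms, most not severe (10%)
    "10 May": (20, 50),   # few storms, but 40% severe
}
by_count, by_pct = rank_dates(example)
print(by_count)  # ['24 May', '27 Apr', '10 May']
print(by_pct)    # ['10 May', '24 May', '27 Apr']
```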


When we change our perspective and highlight the observed reports that are close to modeled storms, the storm area distribution shifts to the left, toward the smallest storm areas.

Among the modeled storms that verify, 25 May has the most observed reports close by, followed by 27 Apr; 24 May lags behind in 5th place. In a relative sense (percentage of each date’s storms), 27 Apr and 25 May switch places, with 24 May coming in 9th place.

These unique perspectives highlight two subtle but interesting points:
1. Modeled severe storms are typically larger (i.e., well resolved),
2. Observed reports are typically associated with smaller storms.

I believe a few factors are at play here, including the volume and spacing of reports on any particular day and, of course, how well the model performs. 25 May and 27 Apr had lots of reports, so they stand out. There are also all the issues associated with reports in general (timing and location uncertainty). But I think one thing also at work here is that these models have difficulty maintaining storms in the warm sector and tend to produce small, short-lived storms. This is relatively bad news for skill, but perhaps a decent clue for forecasters. I say clue because we really need a larger sample across many different convective modes to draw any firm conclusions.

I should address the hail issue noted above. I arbitrarily selected an integrated hail mixing ratio of 30 as the proxy for severe hail. I chose this value after checking the distributions of the 3 severe variables (hourly max UH > 100 m2 s-2 for tornadoes, hourly max wind > 25.7 m s-1, and hourly max hail > 30). After highlighting UH at various thresholds, it became pretty clear that hail and UH were correlated. So I think we need to find a better variable for relating hail-fall to modeled fields.
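The three proxies described above amount to simple threshold tests on hourly maximum fields. A minimal sketch, assuming each storm object already carries its hourly maxima; the digit codes (1 = tornado/UH, 2 = hail, 3 = wind) are a hypothetical encoding chosen only so that hail is the “2” mentioned earlier. The `hail_given_uh` helper is one quick way to see the hail/UH overlap.

```python
# Thresholds as stated in the post.
UH_THRESH = 100.0    # hourly max updraft helicity, m2 s-2 (tornado proxy)
WIND_THRESH = 25.7   # hourly max 10-m wind, m s-1 (wind proxy)
HAIL_THRESH = 30.0   # hourly max integrated hail mixing ratio (hail proxy)


def proxy_code(uh_max: float, wind_max: float, hail_max: float) -> str:
    """Return a string of proxy digits for one storm object.

    Hypothetical encoding: '1' = UH (tornado), '2' = hail, '3' = wind.
    An empty string means the object is not a modeled severe storm.
    """
    code = ""
    if uh_max > UH_THRESH:
        code += "1"
    if hail_max > HAIL_THRESH:
        code += "2"
    if wind_max > WIND_THRESH:
        code += "3"
    return code


def hail_given_uh(codes) -> float:
    """Fraction of UH-flagged objects that are also hail-flagged.

    A value near 1 suggests the hail proxy mostly duplicates the UH proxy.
    """
    uh_flagged = [c for c in codes if "1" in c]
    if not uh_flagged:
        return float("nan")
    return sum("2" in c for c in uh_flagged) / len(uh_flagged)
```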

Sneak Peek 2: Outbreak comparison

I ran my code over the entire 2011 HWT data set to compare the two outbreaks from 27 April and 24 May amidst all the other days. These outbreaks were not that similar … or were they?


In the first example, I am comparing the model storms that verified against storm reports: 40% for 27 April, only 17% for 24 May, but 37% for 25 May. 25 May also had a lot of storm reports, including a large number of tornado reports. Note that the distribution of UHobj (upper left) is skewed toward lower values. The natural log of the pixel count per object (middle right) is also skewed toward lower values.
[If I further dice up the data set by requiring that UHobj exceed 60, then 27 April has ~12%, 24 May has 7.8%, and 25 May has 4% of the respective storms on those days (not shown).]


In the second example, if I select only objects with UHobj greater than 60, the storm percentages are 25% for 27 Apr, 35% for 24 May, and 8% for 25 May. The natural log of the pixel count per object (middle right) is now skewed toward higher values. The hail and wind parameters (middle left and bottom left, respectively) shift to higher values as well.
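The percentages in the two examples are per-date fractions of storm objects meeting some condition (verified by nearby reports, UHobj above 60, or both). A minimal sketch, assuming each object is a record with hypothetical keys `date`, `uh_obj`, `verified`, and `pixels`; the post does not spell out which denominator each percentage used, so the condition is left to the caller.

```python
import math
import statistics
from collections import defaultdict


def percent_by_date(objects, predicate):
    """Percentage of each date's storm objects for which predicate(obj) is True."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for obj in objects:
        totals[obj["date"]] += 1
        if predicate(obj):
            hits[obj["date"]] += 1
    return {d: 100.0 * hits[d] / totals[d] for d in totals}


def median_log_area(objects, predicate):
    """Median natural log of pixel count over the objects selected by predicate."""
    logs = [math.log(o["pixels"]) for o in objects if predicate(o)]
    return statistics.median(logs) if logs else float("nan")


# Example predicates mirroring the two examples (UHobj threshold from the post).
def verified(o):
    return o["verified"]


def strong_uh(o):
    return o["uh_obj"] > 60
```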

A very interesting interplay exists here, since 24 May did not subjectively verify well (too late, not very many supercells). 27 Apr verified well but had a different convective mode of sorts (linear with embedded supercells). As for 25 May, I honestly cannot recall anything other than the large number of reports that day.

Comments welcome.

Certainty, doubt, and verification

Today’s forecast on CI focused on the area from northeast KS southwest along a front down towards the TX-OK panhandles. It was straightforward enough. How far southwest will the cap break? Will there be enough moisture in the warm sector near the frontal convergence? Will the dryline serve as a focus for CI, given the development of a dry slot present just ahead of the dryline along the southern extent of the front and a transition zone (reduced moisture zone)?

So we went to work mining the members of the ensemble, scrutinizing the deterministic models for surface moisture evolution, examining the convergence fields, and looking at ensemble soundings. The conclusion from the morning was two moderate risk areas: one in northeast KS and another covering the triple point, dryline, and cold front. The afternoon forecast backed off the dryline-triple point given the observed dry slot and the dry sounding from LMN at 1800 UTC.

The other issue was that the dryline area was so dry and the PBL so deep that the convective temperature would be reached, but with minimal CAPE (10-50 J kg-1). The dry LMN sounding was assumed to be representative of the larger mesoscale environment. This was wrong: the 00 UTC sounding at LMN indicated an increase in moisture of 6 g/kg aloft and 3 g/kg at the surface.

Another aspect of this case was our scrutiny of the boundary layer and the presence of open-cell convection and horizontal convective rolls (HCRs). We discussed, again, that at 4-km grid spacing we are close to resolving these types of features. We are close because the scale of the rolls scales with the boundary layer depth, and to be resolved the rolls need to be larger than about 7 times the grid spacing. So on a day like today, when the PBL is deep, the rolls should be close to resolvable. On the other hand, additional diffusion is needed in light-wind conditions, and when it is not applied, the scale of the rolls collapses to the scale of the grid. In order to believe the model we must take these considerations into account. In order to discount the model, we are unsure what to look for besides indications of “noise” (e.g., features barely resolved on the grid, or rolls with scales close to 5 times the grid spacing).
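The rule of thumb above can be turned into a quick check. A minimal sketch, assuming (as a hypothetical parameter, not something the post specifies) that roll wavelength is a fixed multiple of PBL depth, the roll “aspect ratio”; the 7x and 5x factors are the ones quoted above.

```python
RESOLVE_FACTOR = 7.0  # rolls must exceed ~7 grid lengths to be resolved (from the post)
NOISE_FACTOR = 5.0    # rolls near ~5 grid lengths suggest grid-scale noise (from the post)


def roll_wavelength_km(pbl_depth_km: float, aspect_ratio: float) -> float:
    """Hypothetical scaling: roll wavelength = aspect_ratio * PBL depth."""
    return aspect_ratio * pbl_depth_km


def min_resolvable_km(dx_km: float) -> float:
    """Smallest roll wavelength the grid can be expected to resolve."""
    return RESOLVE_FACTOR * dx_km


def assess_rolls(pbl_depth_km: float, dx_km: float, aspect_ratio: float) -> str:
    """Classify modeled rolls as 'resolvable', 'marginal', or 'suspect'."""
    ratio = roll_wavelength_km(pbl_depth_km, aspect_ratio) / dx_km
    if ratio >= RESOLVE_FACTOR:
        return "resolvable"
    if ratio <= NOISE_FACTOR:
        return "suspect"  # close to grid scale: treat as possible noise
    return "marginal"
```

At 4-km spacing, `min_resolvable_km(4.0)` gives 28 km, so under this scaling a deeper PBL pushes the rolls toward that threshold, which is the sense in which a deep-PBL day is “close to resolvable.”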

HCRs were present today, as this image from Wichita shows:


However, just because HCRs were present does not mean I can prove they were instrumental in CI. So when we saw the forecast today for HCRs along the front, and storms subsequently developed, we had some potential evidence. Given the distance from the radar, it may be difficult, if not impossible, to prove that HCRs intersected the front and contributed to CI.

This brings up another major point: in order to really know what happened today, we need a lot of observational data. Major field project data. Not just surface data, but soundings, profilers, and low-level radar data. On the scale of The Thunderstorm Project, only for numerical weather prediction. How else can we say with any certainty that the features we were using to make our forecast were present and contributing to CI? This is the scope of data collection we would require, over months, to get a sufficient number of cases to verify the models (both state variables and processes such as HCRs). It would truly be an expensive undertaking, yet one where many people could benefit from one data set and the field of NWP could improve tremendously. And let’s not forget about forecasters, who could benefit from having better models, better understanding, and better tools to help them.

I will update the blog after we verify this case tomorrow morning.

This week in CI

The week was another potpourri of convection initiation challenges, ranging from evening convection in WY/SD/ND/NE to afternoon convection in PA/NY and back over to OK/TX/KS for a few days. As in the previous week, we encountered many events in which we struggled with the timing of convection onset. But we can consistently place good categorical outlooks over the region, and we have consistently anticipated the correct location of the first storms. I think the current perception is that we can identify the mechanisms, and thus the episodes of convection, but timing the features remains a big challenge. The models tend not to be consistent (at least in the aggregate) for at least two reasons: no weather event is identical to any other, and the process by which CI occurs can vary considerably.

The processes that can lead to CI were discussed on Friday and include:
1. a sufficient lifting mechanism (e.g. a boundary),
2. sufficient instability in the column (e.g. CAPE),
3. instability that can be quickly realized (e.g. low level CAPE or weak CIN or low LCL or small LFC relative to the LCL),
4. a deep moist layer (e.g. reduced dry air entrainment),
5. a weakening cap (e.g. cooling aloft).
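One way to keep the five ingredients above in view during a fast-paced forecast is an explicit checklist. This is a hypothetical sketch: it records the forecaster’s yes/no judgment for each ingredient and reports what is missing; it does not encode any thresholds, which remain a forecaster (or model diagnostic) call.

```python
from dataclasses import dataclass, fields


@dataclass
class CIIngredients:
    """Forecaster judgments for the five CI ingredients listed above."""
    lifting_mechanism: bool             # e.g. a boundary
    column_instability: bool            # e.g. CAPE
    quickly_realized_instability: bool  # e.g. low-level CAPE, weak CIN, low LCL
    deep_moist_layer: bool              # e.g. reduced dry-air entrainment
    weakening_cap: bool                 # e.g. cooling aloft


def missing_ingredients(ing: CIIngredients) -> list:
    """Names of ingredients judged absent, in the order listed."""
    return [f.name for f in fields(ing) if not getattr(ing, f.name)]
```

A single missing entry (say, `weakening_cap`) is a reminder that a small model error in that one ingredient can flip the CI outcome.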

That is quite a few ingredients to consider quickly. Any model errors can then be amplified to either promote or hinder CI. In the last 2 weeks, we had similar simulations along the dryline in OK/TX in which the models produced storms where none were observed. The model produced only a few longer-lasting storms; it also produced what we have called CI failure, where storms initiate but do not last very long. Using this information, we can quickly assess that it was difficult for the model to produce storms in the aggregate. How we use this information remains a challenge, because storms were produced. It is quite difficult to verify the processes we are seeing in the model and thus either develop confidence in them or determine that the model is just prolific in developing some of these features.
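The “CI failure” idea (storms that initiate but do not last) can be made objective with a lifetime test on tracked storms. A minimal sketch; the 35 dBZ storm threshold and 30-minute minimum lifetime are hypothetical values chosen for illustration, not thresholds used in the experiment.

```python
def classify_track(track, dbz_thresh=35.0, min_lifetime_min=30.0):
    """Classify one tracked cell as 'no CI', 'CI failure', or 'sustained'.

    track: list of (minutes since first detection, max reflectivity in dBZ).
    Lifetime is the span between the first and last times at or above
    dbz_thresh (gaps in between are ignored in this simple sketch).
    """
    times = [t for t, dbz in track if dbz >= dbz_thresh]
    if not times:
        return "no CI"
    lifetime = max(times) - min(times)
    return "sustained" if lifetime >= min_lifetime_min else "CI failure"


def ci_failure_fraction(tracks, **kwargs):
    """Fraction of initiating cells that are CI failures (NaN if none initiate)."""
    labels = [classify_track(t, **kwargs) for t in tracks]
    initiated = [lab for lab in labels if lab != "no CI"]
    if not initiated:
        return float("nan")
    return initiated.count("CI failure") / len(initiated)
```

Computed per ensemble member, a high failure fraction is one aggregate signal that the model is struggling to sustain storms even where it initiates them.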

What is becoming quite clear is that we need far more output fields to adequately scrutinize the models. However, given the self-imposed time constraints, we need a data visualization system that can handle lots of variables, perform calculations on the fly, and deal with many ensemble members. We have been introduced to the ALPS system from GSD, and it seems to be up to the challenge, given the rapid visualization and unique display capabilities for which it was designed (e.g., large ensembles).

We also saw more of what the DTC is offering in terms of traditional verification, object-based verification, and neighborhood object-based verification. There is so much to look at that it is overwhelming from day to day. I hope to look through this in great detail in the post-experiment analysis. There is a lot of information buried in that data that is very useful (e.g., day to day) and will be useful (e.g., aggregate statistics). This is truly a good component of the experiment, but there is much work to be done to make it immediately relevant to forecasting, even though its traditional impact comes after the experiment. Helping every component fill an immediate niche is always a challenge. And that is what experiments are for: identifying challenges and finding creative ways to help forecasting efforts.

Relative skill

During the Thursday CI forecast, we decided to forecast a late show down in western Texas, where the models were indicating storms would develop. The models had a complex evolution of the dryline and moisture return from SW OK all the way down past Midland, TX. There was a dryline that would be slowly moving southeast and what we thought could be either a moisture return surge or a bore of some kind moving northwest. Most models indicated CI, with large spatial spread (from Childress down to Midland) and some timing spread (from 03 to 07 UTC) depending on the model. More on this later.

Mind you, none of these models had correctly predicted the evolution of convection earlier in the day along the TX panhandle-western OK border. In fact, the dryline in the models was well into OK, snaked back southwest, and remained there until after 23 UTC. The dryline actually made it to the border area but then retreated after the storms formed southwest of SW OK in TX, probably because the outflow from these storms propagated westward. This signal was not at all apparent in the models, perhaps because of the dryline position. The storms that did form had some very distinct behavior: storms that formed on the north side of the first initiation episode moved north, not east as in the models. The southern storms were big HP supercells, moving slowly east-northeast and continually developing in SW OK and at points farther SW into TX (though only the first few storms were really big; the others were small and in close proximity to the big storms, a scale enigma). We had highlighted the areas to the south in our morning forecast, along with an area in KS to the north, but left a slight risk of CI in between. So while our distinct moderate risk areas would sort of verify in principle (being two counties farther east than observed), we still did not have the overall scenario correct.

That scenario was storms developing in our region and moving away, with the possibility of secondary development along the dryline a bit later. Furthermore, we expected the storms to the north to develop in our moderate risk area and move east. In fact, the OK storms moved into our KS region just before our northern KS moderate area verified well with an unanticipated arcing line of convection. This was a sequence of events that we simply could not have anticipated. We have discussed many times the need to “draw what the radar will look like in 3 hours”. This was one of those days where we could not have had any skill whatsoever in accomplishing that task.

Drawing the radar in 1, 2, or 3 hours is exactly what we have avoided doing, given our 3-hour forecast product. We and the models simply do not have that kind of skill at the scales where it would be required to add value. This is not so much a model failure or even a human failure. It is an operational reality that we simply don’t have enough time to efficiently and quickly mine the model data to extract enough information to make a forecast product. More on this later.

Back to the overnight convection. Once these SW OK supercells had been established, we sought other model guidance, notably the HRRR from 16 or 17 UTC. By then it had picked up on the current signal and was showing an evolution similar enough to the observations. This forecast would end up being the closest solution, but to be honest, way down in TX it was still not that different from the ensemble, which was a 24-30 hour forecast. They all said the same thing: the dryline boundary and moisture surge would collide, and CI would ensue within 2 hours, growing into a big line of convective storms that would last all night and make Friday’s forecast very difficult.

Sure enough, these boundary collisions did happen. From the surface stations’ point of view, winds in the dry air had been blowing SW all day, with temperatures in the upper 80s to low 90s. After 02 UTC, the winds to the NW had veered and were now blowing W, with some blowing W-NW. At the dryline, winds were still SW but weakening. Ahead of the dryline, they were SE and weak. By 0400 UTC, the moisture surge had intensified from weak SE winds to strong SE winds, with the dew point at CDS increasing from 34 to 63 in that hour. On radar, the boundaries down near Midland could be seen very distinctly, with a clear separation:


You can see CI already under way to the north, where the boundaries had already collided; the zipper effect was in progress farther southwest, but it took nearly 2 more hours.


Also note the “gravity waves” that formed in the upper-level frontal zone within a region of 100 knots of vertical shear back in NM. Quite a spectacular event. Let me also note that the 00 UTC ensemble and other models DID NOT pick up this event until 3 hours later than shown by the last radar image. Spin-up may have played a significant role in this part of the forecast. As you can see, the issues we face are impressive on a number of levels and across spatial and temporal scales. We verified our forecast of this event with the help of the ensemble, the HRRR, and the NSSL WRF. To reiterate the point of the previous post: it is difficult to know when to trust the models. In this case we put our faith in the models and it worked out, whereas in the previous forecast we put our faith in the models and had some relative skill, but not enough to add value.

It’s complicated

As expected, it was quite a challenge to pick domains for days 2 and 3. Day 2 was characterized by 3 potential areas of CI: Ohio to South Carolina, Minnesota and Iowa, and Texas. We were trying to determine how to deal with pre-existing convection: whether it was in our domain already or would be in our domain during our assumed CI time. As a result, we determined that the Ohio to South Carolina domain was not going to be as clean-slate as Texas or Minnesota. So we voted out SC.

We were left with Texas (presumed dryline CI) and Minnesota (presumed warm front/occlusion zone). Texas was voted in first but we ended up making the MN forecast in the afternoon. Data for this day did not flow freely, so we used whatever was available (NSSL-WRF, operational models, etc).

The complication for TX was an uninitialized short wave trough emanating from the subtropical jet across Mexico and moving northward. This feature was contributing to a north-south band of precipitation and eventually triggered a storm in central and eastern OK, well to the east of our domain. The NSSL WRF did not produce the short wave trough and thus evolved eastern TX much differently than what actually occurred, despite having the subtropical jet in that area. So we were gutsy in picking this domain despite this short wave passing through our area. We were still thinking that the dryline could fire later on, but once we completed our spatial confidence forecast (a bunch of 30 percents and one 10 percent) and our timing confidence (~+/- 90 minutes), it was apparent we were not very confident.

This was an acceptable challenge as we slowly began to assemble our spatial forecast. We settled on a 3-hour period and restricted ourselves to worrying only about new, fresh convection by spatially identifying regions within our domain where convection was already present. This way we don’t have to worry about secondary convection directly related to pre-existing convection. We also decided that every forecaster would enter a spot on the map where they thought the first storms would develop (within 25 miles of their point). This makes the forecast fun and competitive, and it gets everyone thinking not just about a general forecast but about the scenario (or scenarios, if there are multiple in the domain).
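The first-storm point game lends itself to simple scoring: great-circle distance from each forecaster’s point to the observed first storm, with a hit inside 25 miles. A minimal sketch using the haversine formula on a spherical Earth (mean radius about 3958.8 miles); the function names and ranking scheme are hypothetical, and a timing check against the ~+/- 90-minute confidence mentioned above is included as an optional extra.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius, statute miles


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def rank_first_storm_guesses(guesses, observed, radius_mi=25.0):
    """guesses: {name: (lat, lon)}; observed: (lat, lon) of the first storm.

    Returns [(name, distance_mi, hit)] sorted from closest to farthest.
    """
    scored = []
    for name, (lat, lon) in guesses.items():
        d = great_circle_miles(lat, lon, *observed)
        scored.append((name, d, d <= radius_mi))
    return sorted(scored, key=lambda s: s[1])


def timing_hit(forecast_min, observed_min, tolerance_min=90.0):
    """True if the observed CI time is within +/- tolerance of the forecast time."""
    return abs(observed_min - forecast_min) <= tolerance_min
```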

The next stop on this day’s adventure was MN/IA/Dakotas. This was challenging for multiple reasons:
1. the short wave trough moving north into OK/KS and its associated short wave ridge moving north-northeast,
2. the dryline and cold front to the west of MN/IA,
3. the cold upper low in the Dakotas moving east-northeast.

The focus was clear, and the domain was to be centered on RWF. This time we used a bigger domain in acknowledgement of the complex scenario that could unfold. The model initiated convection along the warm front, along the cold front in NE on a secondary moisture surge associated with the short wave trough, and persistently over Lake Superior (a signal we ignored).

We ended up drawing a rather large slight risk extending down into IA and NE from the main lobe in MN, with a moderate area extending from south central MN into northern IA. After viewing multiple new products, including simulated satellite imagery (water vapor and band differencing from the NSSL WRF) and the Nearcast moisture and equivalent potential temperature difference, we decided that CI was probable, with everyone going above 50 percent confidence.

In this domain we did quite well, including showing a gap near Omaha where the moist surge was expected but did not materialize until after our 0-3 UTC time period. Once the moisture arrived … CI. In MN, CI began just prior to 23 UTC, encompassing some of our moderate risk area even down into IA. These “storms” in IA were part of the CI episode but would not be objectively classified as storms from a reflectivity and lifetime perspective, though they did produce lightning.

The verification for Texas was quite bad. Convection formed to the east early, and to the west much later than anticipated, in association with a southern moisture surge into NM from the upper-level low that migrated into the area nearly 11 hours after the start of our forecast period.

As it turns out, we awoke this morning to a moderate risk area in OK, but the NM convection was totally missed by the majority of model guidance! The dryline was still in Texas, but now this convection was moving toward our CDS centerpoint, and we hoped that the convection would move east. A review of the ensemble indicated that some members had weak signals of this convection, but it became obvious that it was not the same. We did key in on the fact that, despite missing the convection in the TX panhandle, the models persistently produced secondary initiation in spite of the now-developing convection in southern TX. We outlooked the area around western OK and parts of TX.

In the afternoon, we looked in more detail at the simulated satellite imagery, the Nearcast, and the CIRA CI algorithm for an area in and around Indiana. This was by far the most complicated and intellectually stimulating area. We analyzed the ensemble control member for some new variables that we output near the boundary layer top (roughly 1.2 km AGL): WDT (the number of time steps in the last hour in which w exceeded 0.25 m/s) and convergence. We could see some obvious boundaries, as observed, with a unique perspective on warm sector open-celled convection.
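WDT as described is a per-grid-point count over the last hour of model time steps, and convergence is the negative horizontal divergence. A minimal NumPy sketch, assuming the vertical velocity for each time step is available on a 2-D (y, x) grid at the ~1.2 km level; in the experiment these fields were output by the model itself, so this only illustrates the definitions.

```python
import numpy as np


def wdt(w_steps: np.ndarray, threshold: float = 0.25) -> np.ndarray:
    """Count, at each grid point, the time steps in which w exceeded threshold.

    w_steps: array of shape (n_steps, ny, nx) holding vertical velocity (m/s)
    for every model time step in the last hour.
    """
    return np.count_nonzero(w_steps > threshold, axis=0)


def convergence(u: np.ndarray, v: np.ndarray, dx: float, dy: float) -> np.ndarray:
    """Horizontal convergence -(du/dx + dv/dy) in s^-1 on a (ny, nx) grid.

    dx, dy are grid spacings in metres; np.gradient uses centred differences
    in the interior and one-sided differences at the edges.
    """
    dudx = np.gradient(u, dx, axis=1)
    dvdy = np.gradient(v, dy, axis=0)
    return -(dudx + dvdy)
```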

In addition, we used the 3-hour probabilities of CI, which were developed specifically for CI and match our chosen 3-hour time periods. We have noticed significant areal coverage in the ensemble probabilities, which heavily weight the CI points associated with pre-existing convection. Thus it has been difficult to assign the actual new-CI probabilities, since we can’t distinguish the probability fields when two CI events occur in close proximity around where we wish to forecast. That being said, we have found them useful in these messy situations. We await a clean day to see how much of a difference that makes.
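The difficulty above, probabilities inflated by CI points tied to pre-existing convection, can be illustrated with a simple ensemble-neighborhood probability that optionally drops CI points near already-observed convection. A minimal sketch; the 40 km neighborhood radius, the exclusion distance, and the flat x/y (km) coordinates are all hypothetical choices, and this is not necessarily how the experiment’s 3-hour CI probabilities were computed.

```python
import math


def _near(p, q, radius_km):
    """True if points p and q (x_km, y_km) are within radius_km of each other."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= radius_km


def ci_probability(point, members, radius_km=40.0, existing=(), exclude_km=0.0):
    """Fraction of members with at least one CI point within radius_km of point.

    members: one list per member of (x_km, y_km) CI points, already limited to
    the 3-hour window. existing: locations of pre-existing convection; when
    exclude_km > 0, CI points within exclude_km of any of them are ignored.
    """
    if not members:
        return float("nan")
    count = 0
    for ci_points in members:
        fresh = [p for p in ci_points
                 if not (exclude_km > 0 and any(_near(p, e, exclude_km) for e in existing))]
        if any(_near(p, point, radius_km) for p in fresh):
            count += 1
    return count / len(members)
```

Comparing the probability with and without the exclusion at a given point gives a rough sense of how much of the signal comes from convection that is already there.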

Anatomy of a Well Forecast Bow Echo


Above is an example of one of the forecasts from the Spring Experiment models from Friday. This bow echo moved across southwest Missouri early Friday morning, and these images are centered on Joplin, MO (JLN). On the left is the 13-h forecast from the WRF-NMM 4-km model initialized at 00Z 08-May-2009 and valid at 13Z. On the right is the verifying 1-km base reflectivity image with the model wind fields overlaid on the radar. The barbs in each image are the model’s instantaneous 10-m winds in knots (with grid points skipped to reduce clutter). The isotachs are plotted from the WRF “history variables” for maximum U,V 10-m winds (no grid skip). These are the maximum 10-m wind speeds in the model over the past hour ending at 13Z.

Instantaneous 10-m winds in the model at 13Z, near the rotating bow head, are at least 50 knots. The maximum model 10-m winds over the past hour range from 60-70 knots near and north of the weak echo channel and around the comma-head of the bow.
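The difference between the instantaneous barbs and the hourly-maximum isotachs is just a matter of when the wind speed is sampled. A minimal point-wise sketch, assuming u and v (m/s) are available at every model time step over the hour; the conversion uses 1 knot = 1852 m per hour. How the WRF history variables are accumulated internally is not shown here.

```python
import math

KT_PER_MS = 3600.0 / 1852.0  # 1 kt = 1852 m per hour, so 1 m/s is about 1.944 kt


def wind_speeds_kt(u_series, v_series):
    """10-m wind speed in knots at each time step from u, v in m/s."""
    return [math.hypot(u, v) * KT_PER_MS for u, v in zip(u_series, v_series)]


def instantaneous_and_hourly_max_kt(u_series, v_series):
    """(speed at the last time step, maximum speed over all steps), in knots."""
    speeds = wind_speeds_kt(u_series, v_series)
    return speeds[-1], max(speeds)
```

Because the maximum is taken over every step, it can exceed the value at the valid time by a wide margin, which is consistent with 60-70 knot isotachs sitting alongside 50-knot instantaneous barbs.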

This was only one of several exceptional forecasts of this feature from the models being evaluated in this year’s Spring Experiment. To see more output on this case and others, check out the Spring Program website here: