HWT EFP – Page 2 – The Experimental Forecast Program

Short-Term Forecasting Methods

Posted on May 7, 2018 by Burkely Twiest.

This year, the Spring Forecasting Experiment is focusing on the Day 1 time period more than ever before, eschewing the long lead-time forecasts that we have made in previous years in favor of honing in on timing information and allowing participants to delve into the data. Since more data than ever before is available within the drawing tool where participants draw their forecasts, we’re excited to see how participants probe the new data within the various ensemble subsets.

One short-term experimental forecast product being generated on the Innovation Desk this year are Potential Severe Timing (PST) areas, which indicate which 4-hr period severe weather will occur in over the general area of 15% probability of severe. By identifying the timing of the severe event and displaying all of the timing contours on one graphic, the end product is hoped to be valuable for emergency managers and broadcasters for their advance planning. Small groups of participants generate these forecasts around subsets of the CLUE and HREFv2 ensembles, meaning that on any given day we’ll ideally have 5 separate sets of PSTs. After the participants separate into their small groups and issue their forecasts, we ask them to come back together and brief one another on what their particular ensemble subset was doing. This way, each group of participants can delve into the data from their subset more deeply than if the activity were to take place as one large group. This briefing period also allows the participants to be exposed to different lines of reasoning in issuing their forecasts, and has thus far sparked several good discussions.

Here are the PSTs from 3 May 2017, or Thursday of last week:The different ensemble subset groups compose the top row and the left and middle section of the bottom row, while the bottom right hand panel shows the forecast from the expert forecaster facilitator on the Innovation Desk. Several different strategies are evident within the panels, including some groups that chose not to indicate timing areas for all of the 15% area of our full-period outlook (shown below).

The reasoning from the groups for their different areas gave insight into the model performance as well as the different forecasting strategies employed by the different groups of people. The group using the HREFv2 decided not to use the NMMB member when generating their forecasts, because the depiction of morning convection was so poor. The HRRRE group had very large areas, which they attribute to the large spread within the HRRRE. The NCAR group decided to discount the guidance in the north of the domain, because of erroneous convection in the northern domain. Instead, they felt more confident in the southern areas where the ensemble was producing supercells. Their group thought that the thermodynamics of the northern area was less conducive to supercellular convection. The group using the mixed physics ensemble from CAPS placed their first area based on where they thought convective initiation would occur, indicating that they thought convection would quickly become severe. Their southern PST was very late to cover any severe threat overnight, but they considered that it might be more of a flood threat (which we do not forecast for in the Spring Forecasting Experiment). The stochastic physics group (another ensemble run by CAPS), on the other hand, had an ensemble which showed almost no signal in the southern area of interest. It also showed a later signal than the other ensembles, contributing to the spread in the time of the first PST.

All of these details came out during the discussion of the PSTs, after participants dove into the data from their subensemble. How did the PSTs do? Here’s a snapshot of the PSTs with reports from 18-22 UTC overlaid:Ideally, all of the reports would fall into the 18-22 UTC contours, which mostly occurred for the expert forecaster and did occur for the HRRRE and Mixed Physics group, although both groups had large areas of false alarm. Here’s a similar image, but showing reports from 22-02 UTC:At this point in time, all groups missed the activity in Kansas, although some groups captured most of the reports within a 22-02 UTC window.

The day after the forecasts, participants are able to go through and give ratings based on the reports that have come in, and choose the group’s forecast that they thought performed the best. Who performed the best for this case? 3 votes for HREFv2, 2 votes each for the HRRRE and the CAPS Stochastic Physics ensemble, and one vote each for the CAPS Mixed Physics and the NCAR ensemble group. Clearly, the complexity of this case provided plenty of nuances to evaluate, and I would bet that more complex cases such as this are on the way….after all, we’ve only just begun Week 2 of the 2018 SFE!

Springing into SFE 2018

Posted on April 29, 2018 by Burkely Twiest.

Somehow, it’s already that time of year again, when Gulf moisture surges northward, strong upper-level dynamics sweep across the contiguous United States, and forecasters and researchers alike flock to NOAA’s Hazardous Weather Testbed for the annual Spring Forecasting Experiment. The upcoming week looks to be a busy one, with a strong trough poised to move across the United States over the course of the next five days.

This year’s experiment will be quite different than prior experiments, as we’ve listened to the feedback participants have given us. Full details can be found in this year’s Operations Plan, but some highlights include:

Completely redesigned web pages for the forecasts and evaluations, courtesy of our new webmaster Brett Roberts,
The capability of participants to dive into the data, with multiple experimental subsets’ data available within the forecast drawing tool,
Experimental outlooks driven by ensemble subsets, focusing on high temporal resolution forecasts within the Day 1 period,
NEWS-e activities on both the Severe Hazards Desk and the Innovation Desk, allowing all participants to interact daily with the NEWS-e,
Larger Chromebooks, with bigger screens to make looking at all this data easier,
and of course, a new blog site with more features that I hope to explore throughout the experiment!

Of course, we also will be exploring a number of new concepts in SFE 2018, encompassing both forecast methods and ensemble configuration techniques within the CLUE framework (see Clark et al. 2018 for a formal description of the CLUE from previous years). We’ll generate probabilistic forecasts of individual hazards over 4-h windows, a timing graphic that communicates when we expect areas to see severe weather, and hourly probabilistic forecasts of severe weather informed by the NEWS-e. We’ll examine the impact of different physics parameterizations in the FV3 model, the impact of stochastic physics perturbations on the WRF-ARW model, new methods of ensemble subsetting based on sensitivity, implementing the MET scorecard for CAM ensembles, and new object-based visualization techniques. This feels like one of the most varied SFEs we’ve had in a while – there’s sure to be something interesting to look at for everyone!

I know I speak for all of the facilitators when I say that we’re excited for this year’s experiment. Whether you’re travelling to Norman over the next five weeks or following along online, we hope that this year’s experiment will provide plenty of interesting results for real-time analysis. Stay tuned!

SFE 2017 Wrap-Up

Posted on June 14, 2017 by Burkely Twiest.

The 2017 SFE drew to a close a little over a week and a half ago, and on behalf of all of the facilitators, I would like to thank everyone who participated in the experiment and contributed products. Each year, preparation for the next experiment begins nearly immediately after the conclusion of the SFE, and this year was no exception.

This SFE was busier than SFE 2016, in that the Innovation Desk forecast a 15% probability of any severe hazard every day during the experiment – and a 15% verified according to the practically perfect forecasts based on preliminary LSRs. This was despite having a relatively slow final week. Slower weeks typically occur at some point during the experiment, and enhance the operational nature of the experiment. After all, SPC forecasters are working 365 days a year, whatever the weather may be! The Innovation Desk also issued one of their best Day 1 forecasts of the experiment during the final week, successfully creating a gapped 15%. If you read the “Mind the Gap” post, you know the challenges that go into a forecast like this:

This forecast was a giant improvement to the previously-issued Day 2 forecast, which had the axis of convection much too far north:

As for other takeaways from the experiment, the NEWS-e activity introduced an innovative tool for the forecasters, and will likely continue to play a role in future SFEs. Leveraging convection-allowing models at time scales from hours (i.e., NEWS-e, the developmental HRRR) to days (i.e., the CLUE, FVGFS) allows forecasters to understand the current capabilities of those models. Similarly, researchers can see how the models are performing under severe convective conditions and target areas for improvement. A good example of this came from comparing different versions of the FVGFS – two different versions were run with different microphysics schemes, and produced different-looking convective cores. Analyzing the subjective and objective scores post-experiment will allow the developers to improve the forecasts. For anyone interested in keeping up with some of these models, a post-experiment model comparison website has been set up. Under the Deterministic Runs tab, you can look at output for the FVGFS, UK Met Office model, the 3 km NSSL-WRF, and the 3 km NAM from June 5th onward.

Much analysis remains to be done on the subjective and objective data generated during the experiment. Preliminary Fractions Skill Scores (FSSs) for each day:

and aggregated across the days for each hour:

give a preliminary metric of each ensembles’ performance. The FSS looks at the number of gridboxes covered by a phenomenon (in this case, reflectivity) within a certain radius in the forecast and the observations, therefore eliminating the problem of double penalization incurred when a phenomenon is slightly displaced between the forecasts and the observations. The closer the score is to one, the better it is. Now, there are some data drop-outs in this preliminary data, but it still looks as though the SSEO is performing better than most other ensembles. Aggregated scores across the experiment place the SSEO first, with an FSS of .593. The HREFv2, which is essentially an operationalized SSEO with some differences in the members, was second, with an FSS of .592. Other high-performing ensembles include the NCAR ensemble (.580) and the HRRR ensemble (.559). Again, this data is preliminary, and these numbers will likely change as the cases that didn’t run on time in the experiment is rerun.

As for what SFE 2018 will hold, discussions are already underway. Expect to see more of the CLUE, FVGFS, and NEWS-e. A switch-up in how the subjective evaluations are done and revamp of the website is also in the pipeline. Even as the data from SFE 2017 begins to be analyzed, we look forward to SFE 2018 and how we can continue to improve the experiment. Ever onward!

Revisiting the Isochrones

Posted on May 28, 2017 by Burkely Twiest.

One of the innovations introduced last year was the drawing of isochrones, or lines of equal time. These isochrones indicate the start time of the four-hour period where 95% of the reports at a point were expected to occur – each point has one four-hour period. This year, to aid in the drawing of the isochrones, participants now also draw hourly report coverage areas each hour from 18Z to 03Z. The final product looks something like this forecast from 25 May 2017:

The above image says that the area to the east of the 19Z line will see its most severe weather from 1900 UTC – 2300 UTC, and the areas east of the 23Z line will see the peak severe weather from 2300 UTC – 0300 UTC the following day. Ideally, these lines will be displayed on the 15% threat area (the “slight” risk equivalent) to determine the eastern bound of the final line – currently participants and the NSSL desk lead have this forecast as a background when drawing their isochrones.

Evaluating the Forecast Evolution

Posted on May 23, 2017 by Burkely Twiest.

Every year in the SFE, a fundamental problem arises when evaluating the long-range full-period forecasts: how to rate the long-range full period forecasts. On the Innovation Desk, participants are given the chance to issue Day 3 forecasts in addition to their day 1 forecast, if time and the potential warrants. Due to the weekly structure of the SFE (which runs M-F), at best two of these forecasts can be evaluated each week – those issued on Monday for Wednesday, and those issued on Tuesday for Thursday. Luckily, with a relatively active period of severe weather CONUS-wide, the three weeks of the experiment so far have yielded evaluations for four out of the potential six days that have Day 3 forecasts. Two of these forecasts give examples of how the long-range forecasts can change as the day of the event draws nearer, and more guidance becomes available: 10 May 2017 and 18 May 2017.

Mind the Gap

Posted on May 20, 2017 by Burkely Twiest.

Continuing on the last post’s theme of choosing the proper forecasting domain when we have multiple areas of convection to contend with, today’s discussion will focus on the boldest of forecasting moves:

The gap.

The full period forecasts issued by each desk are a group effort, with input from participants guiding the placement of the lines. Prior to issuing the lines, participants consider observations, coarse-scale operational models such as the GFS and the NAM, and fine-scale operational and experimental models, such as the HRRR, FVGFS, and the members of the CLUE ensemble. Convection-allowing models, with grid spacing of ~3 km, provide very realistic-looking radar signatures that can give confidence in specific areas of threat beyond those of the GFS. For a quick example, see the GFS forecast for 18 May 2017 at 0000 UTC:

The echos from the HRRR suggest that these storms would be supercells, given the strong tracks of hourly updraft helicity (as indicated by the black contours) and the individual reflectivity echoes. Images such as these can give forecasters more confidence in the location(s) of convection, particularly when compared to the larger-scale QPF precipitation products that current coarse-resolution models can provide.

So what does this have to do with gapping the forecasts? And what does gapping the forecasts even mean?

Picking Areas during an Active Week

Posted on May 16, 2017 by Burkely Twiest.

This week is gearing up to be the most active week thus far in the SFE, with every day having the chance of severe weather somewhere in the center of the country. Yesterday, we had three separate areas of potential severe weather to consider:

The first area was concentrated across northern Iowa and far southwestern Wisconsin, the second area stretched from central Nebraska south through western Oklahoma, and the third area was in western South Dakota. Since SFE forecasts cover a subset of the contiguous United States, choosing which areas to forecast for is an important part of the forecast process. In this case, the worst severe convection was anticipated within the eastern two areas, and the forecast domain was chosen to encompass as much of those areas as possible.

CAM Guidance in a Mixed-Mode Case

Posted on May 12, 2017 by Burkely Twiest.

Yesterday, 11 May 2017, gave the participants in SFE 2017 many things to consider. A potent upper-level low pressure system was finally evolving eastward, after giving the Experiment interesting weather to forecast all week while sitting over the southwest. As the experiment began, ongoing elevated convection was already producing reports over northeastern Oklahoma, and the participants were eyeing the chance for some severe weather locally.

By 2000 UTC (3PM CDT, near the end of the SFE’s daily activities), cellular convection was initiating all across northern Oklahoma, northern and western Arkansas, and northeast Texas. Many of these storms quickly began to rotate.

All indications from the CAMs were that ample convection would occur over a large area, raising concerns over potential outflow boundaries and convective mode. There were also concerns about elevated vs. surface-based storms, which could produce different hazards.

On all days in the experiment, we look at guidance from CAMs and CAM ensembles alike, and yesterday the paintball plots of reflectivity greater than 40 dBZ from the ensembles produced Jackson Pollock-esque maps by 00 UTC. The Storm-Scale Ensemble of Opportunity (SSEO) had many members showing a linear structure stretching from Arkansas to Texas and individual cells stretching along the warm front near the Arkansas-Missouri border. However, the members were in more disagreement over what was occurring over north-central Oklahoma, showing both linear systems and discrete cells.

The NCAR ensemble showed a relatively similar area of convection, albeit with more coverage by the convective cores. The increased coverage may be partly a function of having more two members in the NCAR ensemble than in the SSEO. The linear structures stretching down the cold front at 0000 UTC were slightly more segmented than some of the SSEO members, and there appeared to be many more linear structures along the warm front than in the SSEO. Some members gave the indication of a double line structure, with linear convection along the cold front behind the main squall line.

Participants also considered the HRRR ensemble, which had linear structures further to the north, convection all across Arkansas and perhaps less focused on the warm front, and more segmented structures from Arkansas to Texas. However, it was also showing somewhat of a double-line structure along the cold front and behind the initial squall line, as was seen in the NCAR ensemble. The HRRR ensemble at this point also showed less convection in north-central Oklahoma than the NCAR ensemble and the SSEO.

These are just snapshots of the reflectivity greater than 40 dBZ at one point in time. Participants also considered the updraft helicity tracks and probabilities, the overall reflectivity fields, and the entire period from 1600 UTC to 1200 UTC the following day. Besides the ensembles, extra focus was given yesterday to the operational and experimental versions of the HRRR, which at 0000 UTC showed very different structures underneath the center of the low.

The operational HRRR (top) has heavier convection with more coverage over north-central Oklahoma, and more stratiform precipitation behind the linear structures in Arkansas. There were also more hints in the operational HRRR of cellular structures along the warm front, but the experimental HRRR (bottom) depicted more intense cells across northeastern Oklahoma.

While the CAM ensembles suggested mixed-mode signals for this hour, the upscale growth of convection was forecast to be mostly complete by this hour. How did the CAMs do?

The radar images tell the tale – there was a small double line structure, with a large squall line stretching into Texas and a second line across northwestern Oklahoma. Cells along the warm front in Missouri and Arkansas did not congeal into a line, and a few cells were present in central Oklahoma.

Prior to 0000 UTC, significant hail and wind reports across much of north-central Oklahoma (including a 4.25″ hail report!), associated with supercells close to the core of the low pressure system. This case study will be an interesting one for CAM developers, encompassing mixed modes, early elevated convection, significant severe weather, and two active fronts. The atmosphere was more than happy to provide us with any type of midlatitude convective weather we could ask for yesterday, offering researchers and forecasters alike new questions and challenges.