Sneak Peek from the past

After the Weather Ready Nation: A Vital Conversation Workshop, I finally have some code and visualization software working, so here is a sneak peek using the software Mondrian and an object identification algorithm that I wrote in Fortran and applied via NCL. Storm objects were defined using a double-threshold, double-area technique: you set a minimum Composite Reflectivity threshold to outline candidate objects, and use a second, higher threshold to ensure each one is a true storm. An area threshold is applied at each reflectivity threshold to restrict storm sizes, essentially filtering out the noise from very small storms.
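My implementation is in Fortran, driven through NCL, but the double-threshold, double-area idea itself is simple. Here is a rough Python sketch using scipy.ndimage; the threshold and area values are placeholders rather than the values actually used.

```python
# Rough sketch of double-threshold, double-area storm-object identification.
# The real algorithm is Fortran/NCL; thresholds and area values here are
# illustrative placeholders, not the values used in the experiment.
import numpy as np
from scipy import ndimage

def identify_storm_objects(refl,
                           low_thresh=35.0,   # dBZ: outlines candidate objects (assumed)
                           high_thresh=50.0,  # dBZ: confirms a true storm (assumed)
                           min_area_low=10,   # grid cells at the low threshold (assumed)
                           min_area_high=2):  # grid cells at the high threshold (assumed)
    """Label composite-reflectivity storm objects on a 2-D grid."""
    labels, nlab = ndimage.label(refl >= low_thresh)
    objects = np.zeros_like(labels)
    kept = 0
    for lab in range(1, nlab + 1):
        mask = labels == lab
        # Area filter at the low threshold removes noise from very small storms.
        if mask.sum() < min_area_low:
            continue
        # The second threshold (with its own area test) confirms a true storm.
        if np.count_nonzero(refl[mask] >= high_thresh) < min_area_high:
            continue
        kept += 1
        objects[mask] = kept
    return objects, kept
```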

We have a few ensemble members from 27 April, generated by CAPS, that I was intent on mining. The volume of data is large, but the number of variables was restricted to a few environmental and storm-centric perspectives. I added in the storm report data from SPC (soon I will have the observed storms as well).

[Slide 1: linked Mondrian views of the identified storm objects]

In the upper left is a bar chart of my cryptic recording of observed storm reports; below that is the histogram of hourly maximum surface wind speed, and below that the integrated hail mixing ratio parameter. The two scatter plots in the middle show (top) the product of CAPE and 0-6 km shear (CASH) versus the hourly maximum updraft helicity, obtained from a similar object algorithm that intersects with the storm, and (bottom) 0-1 km Storm Relative Helicity versus LCL height. The plots on the right show (top) the histogram of model forecast hour, (bottom) the sorted ensemble member spinogram*, and (bottom inset) the log of the pixel count of the storms.

The red highlighted storms have a CASH value greater than 30,000 and object-based updraft helicity (UHobj) greater than 50, so we can see interactively, on all the plots, where these storms fall in each distribution. The highlighted storms represent 24.04 percent of the sample of 2271 storms identified from the 17 ensemble members over the 23-hour period from 1400 UTC to 1200 UTC.
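The brushing itself is done interactively in Mondrian, but the selection amounts to a simple compound threshold. A hypothetical pandas equivalent (the file and column names are assumed) looks like this:

```python
# Hypothetical pandas equivalent of the Mondrian brushing; the file and
# column names (cash, uh_obj) are assumed for illustration.
import pandas as pd

storms = pd.read_csv("storm_objects.csv")   # one row per identified storm object
highlight = (storms["cash"] > 30000) & (storms["uh_obj"] > 50)
print(f"{100 * highlight.mean():.2f} percent of {len(storms)} storms highlighted")
```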

Although the contributions from each member are nearly equivalent (not shown; this cannot be gleaned from the spinogram easily), some members contribute more of their storms to this parameter space (members are sorted from highest to lowest in the spinogram). The peak time for storms in this environment was 2100 UTC, with the three highest hours being 2000-2200 UTC. Only about half of the modeled storms had observed storm reports within 45 km**. This storm environment contained the majority of the high hail values, though the hail distribution has hints of being bimodal. The majority of these storms had very low LCL heights (below 500 m), and most were below 1500 m.

I anticipate using these tools and software for the upcoming HWT. We will be able to do next-day verification using storm reports (assuming storm reports are updated by the WFOs in a timely manner), and I hope to also do a strict comparison to observed storms. I still have work to do in order to approach distributions-oriented verification.

*The spinogram in this case is a bar chart in which the length of each bar is rescaled to 100 percent and the width of the bar represents the sample size. The red highlighting then shows the within-category percentage.
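Mondrian builds the spinogram natively; for readers unfamiliar with the display, a rough matplotlib sketch of the idea (with made-up numbers) is:

```python
# Rough matplotlib sketch of a spinogram: bar width is proportional to the
# sample size, bar height is rescaled to 100 percent, and the red portion is
# the within-category (highlighted) fraction. Data are made up for illustration.
import numpy as np
import matplotlib.pyplot as plt

counts = np.array([220, 180, 150, 140, 120])        # storms per member (hypothetical)
frac_hl = np.array([0.40, 0.32, 0.25, 0.18, 0.10])  # highlighted fraction (hypothetical)

widths = counts / counts.sum()
lefts = np.concatenate(([0.0], np.cumsum(widths)[:-1]))

fig, ax = plt.subplots()
ax.bar(lefts, np.ones_like(widths), width=widths, align="edge",
       color="lightgray", edgecolor="black")   # full bar = 100 percent
ax.bar(lefts, frac_hl, width=widths, align="edge",
       color="red", edgecolor="black")         # highlighted share of each member
ax.set_xlabel("Ensemble members (width proportional to storm count)")
ax.set_ylabel("Within-member highlighted fraction")
plt.show()
```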

**I also had to use a +/- 1 hour time window. An initial attempt to verify the tornado reports against the tornado tracks revealed a bit of spatial error, which will need to be quantified.
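The report matching is just a distance check inside a time window. A simplified sketch, using a flat-earth distance approximation and assumed field names, is below.

```python
# Simplified sketch of matching a model storm object to observed storm reports
# within 45 km and +/- 1 hour. The distance uses a flat-earth approximation,
# which is adequate for a small domain; field names are assumed.
import numpy as np

def has_matching_report(storm_lat, storm_lon, storm_hour,
                        report_lats, report_lons, report_hours,
                        max_dist_km=45.0, time_window_hr=1):
    """Return True if any report falls within the space/time window of the storm."""
    in_time = np.abs(report_hours - storm_hour) <= time_window_hr
    dlat = np.radians(report_lats - storm_lat)
    dlon = np.radians(report_lons - storm_lon) * np.cos(np.radians(storm_lat))
    dist_km = 6371.0 * np.sqrt(dlat**2 + dlon**2)
    return bool(np.any(in_time & (dist_km <= max_dist_km)))
```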

More Data Visualization

As jimmyc touched on in his last post, one of the struggles facing the Hazardous Weather Testbed is how to visualize the incredibly large datasets that are being generated. With well over 60 model runs available to HWT Experimental Forecast Program participants, the ability to synthesize large volumes of data very quickly is a must. Historically we have used a meteorological visualization package known as NAWIPS, the same software that the Storm Prediction Center uses for its operations. Unfortunately, NAWIPS was not designed to handle datasets as large as those currently being generated.

To help mitigate this, we utilized the Internet as much as possible. One webpage that I put together is a highly dynamic CI forecast and observations page. This webpage allowed users to create 3-, 4-, 6-, or 9-panel plots with CI probabilities from any of the 28 ensemble members, the NSSL-WRF, or observations. Furthermore, users had the ability to overlay the raw CI points from any of the ensemble members, the NSSL-WRF, or observations to see how the points contributed to the underlying probabilities. We even enabled users to overlay the human forecasts to see how they compared to any of the numerical guidance or the observations. This webpage turned out to be a huge hit with visitors, not only because it allowed for quick visualization of a large amount of data, but also because it allowed visitors to interrogate the ensemble from anywhere, not just in the HWT.

One of the things we could do with this website was evaluate the performance of individual members of the ensemble. We could also evaluate how varying the PBL schemes affected the probabilities of CI. Again, the website was a great way to sift through a large amount of data in a relatively short amount of time.

Visualization

We all built web displays for various components of the CI desk. I built a few based on object identification of precipitation areas. I counted up the objects per hour for all ensemble members or all physics members (on separate web pages) in order to (1) rapidly visualize the entire membership and (2) add a non-map-based perspective on when interesting things are happening. It also gives a full perspective on the variability in time and on the variability in position and size of the objects.
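The tally behind those count charts is straightforward. Assuming an object identifier like the reflectivity sketch above, a per-member, per-hour count in Python might look like the following; the reader function and thresholds are hypothetical stand-ins for the real NCL workflow.

```python
# Sketch of the per-member, per-hour precipitation-object tally behind the
# count chart. read_precip_grid is a hypothetical reader returning a 2-D
# hourly precipitation field; threshold and area values are placeholders.
import numpy as np
from scipy import ndimage

def count_objects(members, hours, read_precip_grid, thresh=10.0, min_area=10):
    """Return an array of object counts with shape (n_members, n_hours)."""
    counts = np.zeros((len(members), len(hours)), dtype=int)
    for i, member in enumerate(members):
        for j, hour in enumerate(hours):
            precip = read_precip_grid(member, hour)
            labels, nlab = ndimage.label(precip >= thresh)
            sizes = np.bincount(labels.ravel())[1:]        # object sizes in grid cells
            counts[i, j] = int((sizes >= min_area).sum())  # drop tiny objects as noise
    return counts
```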

The goal was to examine the models in multiple ways simultaneously while still being able to investigate the individual members. This, in theory, should be more satisfying for forecasters as they get more comfortable with ensemble probabilities. It could also alleviate data overload by giving a focused look at select variables within the ensemble: variables that already have meaning and implied depth, and information that is easy to extract and reference.

The basic idea, as implemented, was to show the object count chart; mousing over a grid cell calls up a map of the area with the precipitation field, and the upper and rightmost axes call up an animation of either all the models at a specific time or one model at all times. The same concept was applied to updraft helicity.

I applied the same idea to the convection initiation points, only this time there were no objects, just the raw number of points. I had not had time to visualize this prior to the experiment, so we used it as a way to compare two of the CI definitions in test mode.

The ideas were great, but in the end there were a few issues. The graphics worked well in some instances because we started with no precipitation, updraft helicity, or CI points. But if the region already had storms, then interpretation was difficult, at least in terms of the object counts. This was a big issue with the CI points, especially as the counts rose well above 400 for a 400 by 400 km subdomain.

Another display I worked hard on was the so-called pdf generator. The idea was to use the ensemble to reproduce what we were doing by hand, namely putting our CI point on the map where we thought the first storm would be. Great in principle, but automating this was problematic because we could choose our time window to fit the situation of the day. The other complication was that sometimes we had to make our domain small or big depending on how much pre-existing convection was around. This happened quite frequently, so the graphic was less applicable, but it is still very appealing. It will take some refinement, but I think we can make this a part of the verification of our human forecasts.

I found this type of web display to be very useful and very quick. It also allows us to change our perspective from just data mining to information mining, and consequently to think more about visualization of the forecast data. There is much work to be done in this regard, and I hope some of these ideas can be built upon further for visualization and information mining so they can be more relevant to forecasters.