To run this milestone, execute milestone3.m.
For this milestone, Matt Rydzik was elected team leader of team one. Our work continues to be displayed on the server http://kiveo.coaster-net.com. The team is still made up of Allyson Clark, Ben Green, Tyler Ruff, and Matt Rydzik.
The file for this milestone starts off in a similar manner to our other files: the .txt file is read from the Z-drive and split into thirteen different arrays. For this milestone, we also defined three variables, t, z, and dummy, that are used throughout the program.
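As a rough sketch of this setup step (the file name and the column layout shown here are our illustrative guesses, not the program's actual ones):

    % Read the whitespace-delimited text file into a matrix and split it
    % into one array per column; the path and column order are assumptions.
    raw  = load('Z:\kossin_data.txt');          % thirteen numeric columns
    cols = num2cell(raw, 1);                    % one cell per column
    [stormid, stormtime, stormlat, lonraw, intensity] = cols{1:5};
    t = 0; z = 0; dummy = 0;                    % counters used throughout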
After this housekeeping is finished, the program runs through a series of loops. The main loop builds the data set for Weka for every year except 2000 and 2002: the netCDF relative humidity data for 2000 and the sea/land surface temperature data for 2002 are bad, so we skip those years with an if statement. We could remove this check and include 2000 or 2002 if we decided to ignore all relative humidity data or all sea/land surface temperature data, respectively.
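A minimal sketch of that year filter (the year range itself is an assumption here):

    for year = 1988:2004                 % the actual range comes from the data
        if year == 2000 || year == 2002
            continue                     % bad RH (2000) / bad temperature (2002)
        end
        % ... build the Weka rows for this year ...
    end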
Next, a for loop builds an array that separates the storms by name; the intensities and times for each storm are matched up and placed in rows. When the storm identifiers stop matching, the program moves on to the next column.
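In outline, that loop looks something like the following (all variable names are illustrative, not the program's):

    col = 0; prev = NaN;
    for k = 1:numel(stormid)
        if stormid(k) ~= prev            % new identifier: start a new column
            col = col + 1; row = 0; prev = stormid(k);
        end
        row = row + 1;                   % same storm: fill the next row
        stormint(row, col) = intensity(k);   %#ok<SAGROW>
        stormtim(row, col) = stormtime(k);   %#ok<SAGROW>
    end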
Since we are still using Kossin's data, we needed to correct it once more. Kossin's longitudes run from -180 to 180 degrees, but Weka needs longitudes from 0 to 360 degrees, so we created two separate arrays that can be called when needed: stormlon holds the 0 to 360 degree values and stormlonplot holds the -180 to 180 degree values. A for loop checks stormlonplot for jumps of more than six degrees between consecutive points; when it finds one, an if statement replaces that point with the average of the next and previous points in both stormlonplot and stormlon.
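A sketch of this correction, assuming lonraw holds Kossin's -180 to 180 longitudes:

    stormlon     = mod(lonraw, 360);     % 0 to 360 for Weka
    stormlonplot = lonraw;               % -180 to 180 for plotting
    for k = 2:numel(stormlonplot)-1
        if abs(stormlonplot(k) - stormlonplot(k-1)) > 6
            % suspect jump: replace the point with the mean of its neighbors
            stormlonplot(k) = (stormlonplot(k-1) + stormlonplot(k+1)) / 2;
            stormlon(k)     = (stormlon(k-1)     + stormlon(k+1))     / 2;
        end
    end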
Once we had a corrected data set we could begin to create the final array of data. We decided to look at 25 variables: change in intensity from the previous data point and to the next data point; change in latitude from the previous and to the next point; change in longitude from the previous and to the next point; changes in the radius of maximum wind and the radius of 34-knot wind from the previous and to the next point; percent relative humidity; the east-west component of the wind at 925, 850, 700, 500, 400, and 300 millibars; the north-south component of the wind at those same six levels; whether the storm was over land; and the surface temperature at the point. The first ten come from the original data set, while the last fifteen come from the netCDF data.
We gathered the data from Kossin's file first, starting at the second data point and ending at the second-to-last data point for each storm. Because some of the data was taken at erratic intervals, an if statement only processes pairs of data points that are three hours apart. Most columns of the stormdata array are filled the same way: the change in the variable divided by the change in time in hours. The most complicated variable is the change in longitude, because we wanted it in kilometers; we first calculated the change in degrees of longitude and then converted that to kilometers. The m and k values of each point are stored in the usedpoints array, which lets us send whole arrays of data points to the netCDF functions instead of individual points.
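The longitude conversion works roughly as below; the column index and the 111.32 km per degree figure at the equator are our additions for illustration, and the program's actual constants may differ:

    % One degree of longitude spans about 111.32 km at the equator,
    % shrinking with the cosine of latitude.
    dlon_deg = stormlon(k) - stormlon(k-1);            % degrees since last fix
    dlon_km  = dlon_deg * 111.32 * cosd(stormlat(k));  % degrees -> kilometers
    stormdata(m, 5) = dlon_km / 3;                     % rate per hour over 3 h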
To get the data from netCDF, we sent the usedpoints array listing the points we wanted to pull data for. Five separate functions fetch the variables and store them in separate arrays. The functions are all coded fairly similarly, with a few differences for the synoptic-scale variables, where we wanted points around the storm rather than at the storm itself: the u and v wind components and the surface temperature are synoptic variables, while the land and relative humidity data are not. The difficulty with the netCDF data was finding the time we wanted. We had to convert the time in Kossin's data to the netCDF time format, then use a floor function to select the correct time, because the netCDF data comes at six-hour intervals rather than the three-hour intervals of Kossin's data.
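A sketch of the time matching (the netCDF reference epoch shown is an assumption; the reanalysis files define their own):

    % Convert an observation time to hours in the netCDF calendar, then
    % floor to the previous 6-hour analysis (00/06/12/18 UTC).
    obs_hours = (datenum(yr, mo, dy, hr, 0, 0) - datenum(1, 1, 1)) * 24;
    nc_hours  = floor(obs_hours / 6) * 6;      % snap back to the 6-h grid
    tindex    = find(nctime == nc_hours, 1);   % matching time step, if any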
In the u- and v-component functions, as well as the surface temperature function, we first open the netCDF file and look at the variables defined for the year we want. We create variables for the time, latitude, and longitude dimensions from the netCDF object, then look at the data around the storm's position: we find the netCDF grid points closest to the storm in each cardinal direction and average the four values together.
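In outline, that four-point average looks like this (grid vectors assumed sorted ascending; all names are illustrative):

    ilo = find(nclat <= stormlat(k), 1, 'last');    % grid row just south
    ihi = find(nclat >= stormlat(k), 1, 'first');   % grid row just north
    jlo = find(nclon <= stormlon(k), 1, 'last');    % grid column just west
    jhi = find(nclon >= stormlon(k), 1, 'first');   % grid column just east
    uavg = mean([u(ilo,jlo), u(ilo,jhi), u(ihi,jlo), u(ihi,jhi)]);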
The functions for the relative humidity and land data are similar to those for the wind components and surface temperature, but without the positional averaging: we look at the data at the actual location of the storm. Because we wanted to know whether the storm was over land or not, the land value is stored as a binary variable. We also had to make sure to close the netCDF file at the end of each function, because having too many files open negatively affected the program.
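A sketch of the point lookup and cleanup, using MATLAB's current netcdf interface for illustration (our milestone code may use a different netCDF toolbox):

    [~, i] = min(abs(nclat - stormlat(k)));    % nearest grid latitude
    [~, j] = min(abs(nclon - stormlon(k)));    % nearest grid longitude
    overland = double(landmask(i, j) > 0);     % 1 over land, 0 over sea
    netcdf.close(ncid);                        % release the file handle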
After all of the functions run and the netCDF data is placed into separate arrays, we put that data into the overall array. Using the dummy variable defined earlier, we place the netCDF data for each point into the stormdata array, starting at the next column after where the Kossin-derived columns left off.
The last part of the code creates a header for our cell array. Using the cellstr function, we label each column to describe the data it contains. To write the data to the csv file, we use the cellwrite function, which we found on the Internet; Francis Barnhart wrote the original code, available at http://francisbarnhart.com/blog/2005/01/19/matlab_pain/.
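A sketch of the header and write-out; the labels are placeholders, and we assume cellwrite takes the file name first:

    % Build the header with cellstr, stack it on the data, write the csv.
    header = cellstr(char('dint_prev', 'dint_next', 'dlat_prev'));  % etc.
    out = [header'; num2cell(stormdata(:, 1:numel(header)))];
    cellwrite('stormdata.csv', out);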
We had originally coded the program to send data points to the netCDF functions individually, which made it far slower than the current version because the file was opened and closed for every data point. While that was the most straightforward way to complete the milestone, it was not the most efficient. Our current method of sending entire arrays of data points is faster because the netCDF file is not opened and closed thousands of times while the program runs.
A separate project that Matt completed can be found on the website (under the 'trial' tab). Using Google Maps, visitors to the website can view the data points for each storm on a map of the world. After choosing a storm from a drop-down menu, a map displays that storm's data points, color-coded by intensity as in milestone two. Matt used a MySQL database and PHP code to load the data from the original files into Google Maps and display it on the website. The MySQL database was used instead of a flat text file because it loads more quickly and gives the PHP code more flexibility.