For this milestone, Tyler Ruff was elected team leader of team one. Our work continues to be displayed on the server http://kiveo.coaster-net.com. The team is still made up of Allyson Clark, Ben Green, Tyler Ruff, and Matt Rydzik.
During this milestone, we compared several statistical models for predicting changes in hurricane intensity, latitude, longitude, radius of max winds, and radius of 34kt winds. For nearly every predicted variable, a simple linear regression had the highest correlation and the lowest root mean squared error for predicting the change in the hurricane over three hours. We also looked at neural networks and other statistical models. Finally, we compared our data set to the other data sets available from the two classes and found that ours gave us the best correlation coefficients.
The linear regression model simply fits our statistical data to a line that is weighted by the independent parameters we included from our data set. Here is the output from a linear regression model to predict intensity, latitude, longitude, radius of max winds, and radius of 34kt winds, with our comments on each term:
Intensity
ChangeNInt = 0.6567 * ChangePInt - 0.033 * ChangePMaxWind + 0.0239 * ChangeP34Wind - 0.0016 * RHPercent - 0.0106 * U925 + 0.0175 * U850 - 0.0055 * U700 + 0.0034 * U500 - 0.0045 * U300 - 0.0134 * V925 + 0.0162 * V850 - 0.0109 * V700 + 0.007 * V500 - 0.0044 * V400 + 0.0037 * V300 - 0.3232 * LandMask + 0.007 * LandSeaTemp - 0.0036 * Latitude - 0.0048 * Int - 0.0055 * Vshear - 1.5462
Explanation of the equation:
ChangeNInt = This is the variable we are predicting, which is the change in forward intensity.
0.6567 * ChangePInt + The change in forward intensity is strongly predicted by the change in intensity over the past 3 hours—this makes sense because a storm that is already intensifying is most likely going to keep intensifying over the next 3 hours.
-0.033 * ChangePMaxWind + This makes sense because the radius of the max winds will decrease with an intensifying storm (a smaller eye wall due to conservation of angular momentum indicates a strong storm).
0.0239 * ChangeP34Wind + Makes sense because an intensifying storm should grow in size, pushing the radius of gale-force winds outward.
-0.0016 * RHPercent + This is the relative humidity at 850mb in the center of the storm. It is not a great predictor of storm intensity: we would expect a positive correlation, but the slight negative correlation may arise because a more intense storm has a well-defined eye that drives RH values down.
-0.0106 * U925 + This indicates that a 925mb wind toward the east will likely drive the hurricane to decrease in intensity.
0.0175 * U850 + An 850mb wind to the east will intensify the storm.
-0.0055 * U700 + A 700mb wind toward the east will likely drive the hurricane to decrease in intensity.
0.0034 * U500 + A 500mb wind to the east will intensify the storm.
-0.0045 * U300 + A 300mb wind toward the east will weaken the storm; equivalently, easterly 300mb winds, which steer hurricanes toward the west, are associated with intensification.
-0.0134 * V925 + This indicates that a 925mb wind toward the north will likely drive the hurricane to decrease in intensity.
0.0162 * V850 + An 850mb wind to the north will increase the storm intensity.
-0.0109 * V700 + A 700mb wind toward the north will likely drive the hurricane to decrease in intensity.
0.007 * V500 + A 500mb wind to the north will intensify the storm.
-0.0044 * V400 + A 400mb wind to the north will weaken the storm.
0.0037 * V300 + A 300mb wind to the north will likely steer the storm to intensify.
-0.3232 * LandMask + This makes sense because a storm over land is going to die quite quickly.
0.007 * LandSeaTemp + Makes sense because the higher the sea surface temperature, the more moisture and energy are available to strengthen the storm.
-0.0036 * Latitude + As the storm moves north, in general it will die because it may be reaching colder waters, going towards land, and encountering greater wind shear due to the higher surface temperature gradients.
-0.0048 * Int + This is negatively correlated because a storm that is already “intense” will be less likely to further intensify.
-0.0055 * Vshear + Makes sense because, in general, the greater the wind shear (in this case in the V direction), the more easily the storm is ripped apart and loses intensity.
-1.5462 Additional factor (the constant intercept of the regression)
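To make the arithmetic of the equation concrete, here is a minimal sketch that evaluates it for one observation. The coefficients are copied from the intensity equation above; the dictionary layout, the function name, and the choice to treat a missing predictor as zero are our own illustrative assumptions and not part of the model output.

```python
# Coefficients copied from the intensity equation above. The names
# INTENSITY_COEFFS / predict_change_in_intensity are hypothetical helpers.
INTENSITY_COEFFS = {
    "ChangePInt": 0.6567,
    "ChangePMaxWind": -0.033,
    "ChangeP34Wind": 0.0239,
    "RHPercent": -0.0016,
    "U925": -0.0106,
    "U850": 0.0175,
    "U700": -0.0055,
    "U500": 0.0034,
    "U300": -0.0045,
    "V925": -0.0134,
    "V850": 0.0162,
    "V700": -0.0109,
    "V500": 0.007,
    "V400": -0.0044,
    "V300": 0.0037,
    "LandMask": -0.3232,
    "LandSeaTemp": 0.007,
    "Latitude": -0.0036,
    "Int": -0.0048,
    "Vshear": -0.0055,
}
INTENSITY_INTERCEPT = -1.5462  # the "additional factor" in the equation


def predict_change_in_intensity(features):
    """Return ChangeNInt as the intercept plus the weighted sum of predictors.

    `features` maps predictor names to values. As a convenience for this
    sketch only, a predictor that is absent from the dict counts as 0.
    """
    total = INTENSITY_INTERCEPT
    for name, weight in INTENSITY_COEFFS.items():
        total += weight * features.get(name, 0.0)
    return total
```

The same pattern would apply to the other four equations by swapping in their coefficients and intercepts.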
Time taken to build model: 1 seconds
=== Cross-validation ===
=== Summary ===
Correlation coefficient: 0.7315
Mean absolute error: 0.4542
Root mean squared error: 0.7522
Relative absolute error: 66.357 %
Root relative squared error: 68.1726 %
Total Number of Instances: 10200
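For readers unfamiliar with the summary statistics above, the following sketch shows one common way to compute them from paired actual and predicted values. It is our own illustration, not the code that produced the output. In particular, the two relative errors here are measured against a baseline that always predicts the mean of the actual values being evaluated; as far as we understand, the tool uses the training-set mean instead, so the numbers may differ slightly.

```python
import math


def regression_summary(actual, predicted):
    """Compute the five summary statistics for paired numeric sequences.

    Assumes the actual values are not all identical (otherwise the
    relative errors would divide by zero).
    """
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    pairs = list(zip(actual, predicted))

    abs_err = sum(abs(p - a) for a, p in pairs)    # total absolute error
    sq_err = sum((p - a) ** 2 for a, p in pairs)   # total squared error
    abs_dev = sum(abs(a - mean_a) for a in actual)   # baseline absolute error
    sq_dev = sum((a - mean_a) ** 2 for a in actual)  # baseline squared error

    cov = sum((a - mean_a) * (p - mean_p) for a, p in pairs)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    denom = math.sqrt(sq_dev * var_p)

    return {
        # Pearson correlation; undefined if the predictions are constant.
        "correlation": cov / denom if denom else float("nan"),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100.0 * abs_err / abs_dev,
        "rrse_percent": 100.0 * math.sqrt(sq_err / sq_dev),
    }
```

A relative error near 100 % means the model is barely better than always predicting the mean, which is worth keeping in mind when reading the wind-radius results further down.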
Latitude
ChangeNLat = 0.9482 * ChangePLat + 0.0087 * ChangePLon - 0.0346 * ChangePMaxWind + 0.0066 * RHPercent + 0.1149 * U925 - 0.0573 * U700 + 0.01 * U500 + 0.0445 * U400 - 0.0323 * U300 + 0.0369 * V925 + 0.0466 * V850 + 0.0292 * V700 + 0.0705 * V400 + 0.0116 * V300 + 0.1107 * LandSeaTemp - 0.0197 * Latitude - 0.0156 * Ushear - 32.5421
ChangeNLat = This is the variable we are predicting, which is the change in forward latitude.
0.9482 * ChangePLat + The change in forward latitude is strongly predicted by the change in latitude over the past 3 hours—this makes sense because a storm that is already moving in the meridional direction will almost surely continue to do so.
0.0087 * ChangePLon + Makes sense because longitudinal movement should not dictate the meridional movement, and indeed, storms that move northward will likely start out moving westward and then turn eastward as they die.
-0.0346 * ChangePMaxWind + It makes sense that the radius of strongest winds is negatively correlated, because a storm moving northward encounters a larger Coriolis force and must “spin up” to conserve angular momentum, thereby decreasing the radius of max winds.
0.0066 * RHPercent + Very weak correlation, which makes sense because relative humidity is not a great predictor of latitudinal storm movement.
0.1149 * U925 + Makes sense because surface winds on average will become more westerly as you move northward from the tropics.
-0.0573 * U700 + Winds at 700mb apparently become more easterly toward the north along storm tracks.
0.01 * U500 + Winds at 500mb apparently become more westerly as storms move toward the north.
0.0445 * U400 + Winds at 400mb apparently become more westerly as storms move toward the north.
-0.0323 * U300 + The negative correlation doesn’t exactly make sense because you’d expect fairly strong upper-level westerlies as you move northward, not easterlies as is predicted by the negative correlation.
0.0369 * V925 + As a storm moves northward, it will likely encounter more southerly winds near the surface, which is indicated by the positive correlation.
0.0466 * V850 + As a storm moves northward, it will likely encounter more southerly winds at 850mb, which is indicated by the positive correlation.
0.0292 * V700 + As a storm moves northward, it will likely encounter more southerly winds at 700mb, which is indicated by the positive correlation.
0.0705 * V400 + At upper levels, winds will tend to have a stronger southerly component (positive correlation).
0.0116 * V300 + At upper levels, winds will tend to have a stronger southerly component (positive correlation, albeit quite a weak one).
0.1107 * LandSeaTemp + This positive correlation is probably due to the fact that a storm moving northward is moving into relatively warmer waters near the Gulf as compared to the water to the south (and east).
-0.0197 * Latitude + This makes sense because it wouldn’t really matter what latitude the storm is already located at, hence the low correlation.
-0.0156 * Ushear + Although as a storm moves northward you would expect greater wind shear in general because the upper-level winds in the mid-latitudes are stronger than those in the low latitudes, the negative correlation could be due to the fact that hurricanes may generally be moving away from the subtropical jet (and hence lower wind shear) as they move northward.
-32.5421 Additional factor
Time taken to build model: 1.5 seconds
=== Cross-validation ===
=== Summary ===
Correlation coefficient 0.9644
Mean absolute error 2.2232
Root mean squared error 3.5247
Relative absolute error 23.2871 %
Root relative squared error 26.4312 %
Total Number of Instances 10200
Longitude
ChangeNLon = 0.1128 * ChangePLat + 0.6674 * ChangePLon + 0.0229 * RHPercent + 0.2334 * U925 - 0.1538 * U850 + 0.3081 * U700 + 0.2121 * U500 + 0.0523 * U400 + 0.2207 * U300 + 0.0532 * V925 - 0.156 * V850 - 0.2312 * V400 + 0.1058 * V300 + 0.08 * LandSeaTemp + 0.1426 * Latitude + 0.0213 * Longitude - 0.0406 * Ushear - 0.0528 * Vshear - 36.1682
ChangeNLon = This is the variable we are predicting, which is the change in forward longitude.
0.1128 * ChangePLat + When storms move east really fast, they also tend to move north.
0.6674 * ChangePLon + Persistence from the previous 3 hours is quite important.
0.0229 * RHPercent + Higher relative humidities are found away from the Gulf of Mexico, i.e. towards the east.
0.2334 * U925 + This makes sense because east-blowing winds move the storm east.
-0.1538 * U850 + This doesn’t make sense.
0.3081 * U700 + This makes sense because east-blowing winds move the storm east.
0.2121 * U500 + This makes sense because east-blowing winds move the storm east.
0.0523 * U400 + This makes sense because east-blowing winds move the storm east.
0.2207 * U300 + This makes sense because east-blowing winds move the storm east.
0.0532 * V925 + Low-level winds blowing northward move a storm eastward.
-0.156 * V850 + Mid-level winds blowing northward move a storm westward.
-0.2312 * V400 + Upper-level winds blowing northward move a storm westward.
0.1058 * V300 + The mid-latitude westerlies also have a strong northerly component.
0.08 * LandSeaTemp + Higher sea surface temperatures are west (in the Gulf of Mexico).
0.1426 * Latitude + Storms move eastward really fast at high latitudes because of the mid-latitude westerlies.
0.0213 * Longitude + If a storm gets really far east (like near the U.K.) it will move east a lot.
-0.0406 * Ushear + This doesn’t really make sense.
-0.0528 * Vshear + This doesn’t really make sense.
-36.1682 Additional factor
Time taken to build model: 0.17 seconds
=== Cross-validation ===
=== Summary ===
Correlation coefficient 0.9374
Mean absolute error 3.4582
Root mean squared error 7.871
Relative absolute error 19.9987 %
Root relative squared error 34.8312 %
Total Number of Instances 10200
Radius of Max Winds
ChangeNMaxWind = -0.7305 * ChangePInt + 0.0064 * ChangePLat + 0.0025 * ChangePLon - 0.3494 * ChangePMaxWind + 0.1531 * ChangeP34Wind + 0.0273 * U925 - 0.0237 * U850 - 0.0056 * U500 + 0.2433 * LandMask + 0.003 * Int + 0.0069 * Ushear - 0.2298
ChangeNMaxWind = This model will predict the forward change in the radius of maximum winds.
-0.7305 * ChangePInt + This makes sense because the radius of the max winds will decrease with an intensifying storm (a smaller eye wall due to conservation of angular momentum indicates a strong storm).
0.0064 * ChangePLat + This indicates that an increase in latitude will cause the radius to increase slightly.
0.0025 * ChangePLon + This indicates that an increase in longitude will cause the radius to increase slightly.
-0.3494 * ChangePMaxWind + This makes sense because the radius of the max winds will decrease with an intensifying storm, so you would want the radius to continue to shrink.
0.1531 * ChangeP34Wind + This indicates that an increase in the radius of 34kt winds will correlate with an increase in the radius of max winds, which makes sense.
0.0273 * U925 + This indicates that a 925mb wind toward the east will likely cause the radius to increase.
-0.0237 * U850 + This indicates that an 850mb wind toward the east will likely cause the radius to decrease.
-0.0056 * U500 + This indicates that a 500mb wind toward the east will likely cause the radius to decrease.
0.2433 * LandMask + This indicates that if the storm goes over land, the wind radius will increase, which makes sense because intensity is likely to decrease in that situation.
0.003 * Int + This indicates that a storm with a stronger intensity will have a larger radius of maximum wind.
0.0069 * Ushear + This indicates that wind shear in the U direction will cause the radius of max winds to increase slightly.
-0.2298 Additional factor
Time taken to build model: 2.08 seconds
=== Cross-validation ===
=== Summary ===
Correlation coefficient 0.328
Mean absolute error 1.1559
Root mean squared error 1.5361
Relative absolute error 94.0551 %
Root relative squared error 94.4637 %
Total Number of Instances 10200
Radius of 34kt Winds
ChangeN34Wind = 0.6497 * ChangePInt + 0.0303 * ChangePLat - 0.4792 * ChangePMaxWind + 0.0632 * ChangeP34Wind + 0.0234 * U925 - 0.0151 * U850 - 0.0084 * U400 + 0.0186 * V850 - 0.0235 * V700 - 0.429 * LandMask + 0.02 * LandSeaTemp - 0.0066 * Int - 5.3472
ChangeN34Wind = This model will predict the forward change in the radius of 34kt winds.
0.6516 * ChangePInt + If the storm strengthened in the past 3 hours, the size of gale-force winds increases.
0.0297 * ChangePLat + Storms tend to grow larger in size as they move northward.
-0.4801 * ChangePMaxWind + When the storm’s strongest winds narrow, the size of the gale force winds usually expands a lot.
0.0638 * ChangeP34Wind + If the storm’s gale force wind radius expanded in the past 3 hours, it will likely keep growing in the next 3 hours.
0.0341 * U925 + Low level winds blowing east increase storm size.
-0.0333 * U850 + Mid-level winds blowing east decrease storm size.
0.0209 * V850 + Low-mid level winds blowing north increase storm size.
-0.0252 * V700 + Mid level winds blowing north decrease storm size.
-0.4288 * LandMask + If a storm goes over land it loses its energy source and the wind field will shrink.
0.0226 * LandSeaTemp + Warmer temperatures promote growth and expansion.
-0.0064 * Int + Storms that get extremely strong (i.e. Cat 5) often see a slight decrease in their gale force wind radius.
-6.1348 Additional factor
Time taken to build model: 0.09 seconds
=== Cross-validation ===
=== Summary ===
Correlation coefficient 0.4392
Mean absolute error 1.9689
Root mean squared error 2.5964
Relative absolute error 89.415 %
Root relative squared error 89.8303 %
Total Number of Instances 10200
We found strong correlations for intensity, latitude, and longitude change, which indicates that these models are fairly accurate. However, we obtained low correlations for changes in the radius of max winds and the radius of 34kt winds. Linear regression gave us the strongest correlation for every variable except longitude, where a MultiLayer Perceptron gave a correlation coefficient of 0.951 and a root mean squared error of 7.0234. These findings match our initial expectations: changes in the wind radii are the most difficult to predict accurately, while changes in intensity and track are simpler to predict. For a change over three hours, a linear regression provides the best match because the storm is so large that it will not change dramatically and will instead rely primarily on persistence.
Regression by Discretization allows for the use of statistical models that require nominal (categorical) classes. Such models include 'J48', 'Random Trees', and 'Random Forest'. The random forest works by building a consensus out of M random trees ("seeing the forest out of the trees"). A random tree works by choosing k random features (variables) out of the list of input features. The predicted variable is broken into n bins, so that each tree predicts a bin rather than a continuous value.
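The binning idea can be sketched in a few lines. This is a simplified illustration under our own assumptions, not the implementation we ran: it uses equal-width bins, returns the mean training target of the predicted bin, and uses a toy nearest-neighbor classifier as a stand-in for the random forest. All function names are hypothetical.

```python
def make_binner(targets, n_bins):
    """Return a function mapping a numeric target to an equal-width bin index."""
    lo, hi = min(targets), max(targets)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant targets

    def bin_of(y):
        # Clamp so the maximum value falls in the last bin.
        return min(int((y - lo) / width), n_bins - 1)

    return bin_of


def fit_nearest_neighbor(rows, labels):
    """Toy nominal classifier standing in for the random forest."""
    def classify(x):
        best = min(
            range(len(rows)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(rows[i], x)),
        )
        return labels[best]

    return classify


def fit_regression_by_discretization(rows, targets, n_bins, fit_classifier):
    """Turn a nominal classifier into a regressor via binning."""
    bin_of = make_binner(targets, n_bins)
    labels = [bin_of(y) for y in targets]

    # Mean training target per bin; this is what a predicted bin maps back to.
    totals = {}
    for label, y in zip(labels, targets):
        s, c = totals.get(label, (0.0, 0))
        totals[label] = (s + y, c + 1)
    bin_means = {label: s / c for label, (s, c) in totals.items()}

    classify = fit_classifier(rows, labels)
    return lambda x: bin_means[classify(x)]
```

With more bins the predictions become finer grained, but each bin holds fewer training instances for the classifier to learn from.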
For the forward predictions of three hour intensity change, three hour latitude change, and three hour longitude change, the Random Forest model was used with parameters of 12 random trees, with each tree considering 4 random variables, and the predicted variable broken into 7 bins. Furthermore, a 10-fold cross-validation was used.
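As a reminder of what 10-fold cross-validation does with our 10200 instances, here is a minimal sketch of the splitting step. The function name and the fixed shuffle seed are our own assumptions; the tool we used handles this internally and may shuffle differently.

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Every instance lands in exactly one test fold; the other k - 1 folds
    form the training set for that round.
    """
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal slices
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

For 10200 instances and 10 folds, each round trains on 9180 instances and tests on the remaining 1020, and the reported statistics are accumulated over all ten test folds.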
For all of these predictands, the following variables were always included: the change in the storm intensity, latitude, longitude, maximum wind radius, and gale force wind radius over the previous 3 hours; the relative humidity; the u and v winds at the 925, 850, 700, 500, 400, and 300 mb levels; the land mask; the land/sea temperature; latitude; longitude; intensity; u shear; and v shear.
For the forward 3 hour change in storm intensity, the forward changes in latitude and longitude were not included. For the forward 3 hour change in latitude, the forward changes in intensity and longitude were not included. Likewise, for the forward 3 hour change in longitude, the forward changes in intensity and latitude were not included.
| Predictand | Method | Correlation Coefficient | Root Mean Squared Error |
| --- | --- | --- | --- |
| Intensity | IBK | 0.6442 | 0.8981 |
| Intensity | J48 | 0.6372 | 0.872 |
| Intensity | LeastMedSq | 0.7171 | 0.7728 |
| Intensity | Linear Regression | 0.7315 | 0.7522 |
| Intensity | MultiLayer | 0.641 | 0.9137 |
| Longitude | IBK | 0.9086 | 9.5605 |
| Longitude | J48 | 0.9107 | 9.3336 |
| Longitude | LeastMedSq | 0.9331 | 8.1965 |
| Longitude | Linear Regression | 0.9374 | 7.871 |
| Longitude | MultiLayer | 0.951 | 7.0234 |
| Latitude | IBK | 0.9006 | 5.8848 |
| Latitude | J48 | 0.9357 | 4.7006 |
| Latitude | LeastMedSq | 0.9634 | 3.6138 |
| Latitude | Linear Regression | 0.9645 | 3.5211 |
| Latitude | MultiLayer | 0.9508 | 4.1495 |
| Radii Max Winds | IBK | 0.0121 | 2.254 |
| Radii Max Winds | J48 | 0.114 | 1.9722 |
| Radii Max Winds | LeastMedSq | 0.3181 | 1.5418 |
| Radii Max Winds | Linear Regression | 0.382 | 1.5361 |
| Radii Max Winds | MultiLayer | 0.192 | 1.7681 |
| Radii 34kt Winds | IBK | 0.14321 | 3.6967 |
| Radii 34kt Winds | J48 | 0.2125 | 3.3189 |
| Radii 34kt Winds | LeastMedSq | 0.4349 | 2.6029 |
| Radii 34kt Winds | Linear Regression | 0.4375 | 2.5988 |
| Radii 34kt Winds | MultiLayer | 0.3189 | 2.9645 |
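The comparison above can be checked mechanically. The sketch below copies the table values into a dictionary (the layout and the function name are our own assumptions) and picks, for each predictand, the method with the lowest root mean squared error.

```python
# (correlation coefficient, root mean squared error) per method,
# copied from the comparison table above.
RESULTS = {
    "Intensity": {
        "IBK": (0.6442, 0.8981), "J48": (0.6372, 0.872),
        "LeastMedSq": (0.7171, 0.7728), "Linear Regression": (0.7315, 0.7522),
        "MultiLayer": (0.641, 0.9137),
    },
    "Longitude": {
        "IBK": (0.9086, 9.5605), "J48": (0.9107, 9.3336),
        "LeastMedSq": (0.9331, 8.1965), "Linear Regression": (0.9374, 7.871),
        "MultiLayer": (0.951, 7.0234),
    },
    "Latitude": {
        "IBK": (0.9006, 5.8848), "J48": (0.9357, 4.7006),
        "LeastMedSq": (0.9634, 3.6138), "Linear Regression": (0.9645, 3.5211),
        "MultiLayer": (0.9508, 4.1495),
    },
    "Radii Max Winds": {
        "IBK": (0.0121, 2.254), "J48": (0.114, 1.9722),
        "LeastMedSq": (0.3181, 1.5418), "Linear Regression": (0.382, 1.5361),
        "MultiLayer": (0.192, 1.7681),
    },
    "Radii 34kt Winds": {
        "IBK": (0.14321, 3.6967), "J48": (0.2125, 3.3189),
        "LeastMedSq": (0.4349, 2.6029), "Linear Regression": (0.4375, 2.5988),
        "MultiLayer": (0.3189, 2.9645),
    },
}


def best_method_by_rmse(results):
    """Return, per predictand, the method with the lowest RMSE."""
    return {
        predictand: min(methods, key=lambda name: methods[name][1])
        for predictand, methods in results.items()
    }
```

Run on the values above, this selects MultiLayer for longitude and Linear Regression for the other four predictands.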