Predictive Analysis

We plot the ROC curves of the five modeling methods on the same graph. Each curve is the mean of the results over five-fold cross-validation, so this plot summarizes the individual per-method plots shown earlier. The closer a curve lies to the upper-left corner of the plot, the better the model performs. By this comparison, the Random Forest classifier performs best among all the methods. The Nearest Neighbor and Naïve Bayes classifiers fall in the middle, while SVM and Decision Tree perform relatively poorly.
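One common way to obtain a mean ROC curve over cross-validation folds is to interpolate each fold's true positive rate onto a shared false positive rate grid and average the results. The sketch below illustrates that idea in plain Python. It is not necessarily the procedure used for the plot above; the function names (roc_points, interpolate, mean_roc, trapezoid_auc) and the toy fold data are hypothetical and chosen only for illustration.

```python
# Minimal sketch: average ROC curves from several cross-validation folds.
# All names and the toy data are hypothetical, not the authors' code.

def roc_points(y_true, scores):
    """Return (fpr, tpr) lists from sweeping the threshold over the scores.

    Assumes labels are 0/1 and that both classes are present.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    prev = None
    for i in order:
        # Emit a point only when the score changes, so ties share one threshold.
        if prev is not None and scores[i] != prev:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        prev = scores[i]
    fpr.append(fp / neg)
    tpr.append(tp / pos)
    return fpr, tpr


def interpolate(x, xs, ys):
    """Linearly interpolate y at x; xs must be non-decreasing."""
    j = 0
    for k in range(len(xs)):
        if xs[k] <= x:
            j = k  # last point at or left of x (top of any vertical step)
    if j == len(xs) - 1:
        return ys[j]
    return ys[j] + (ys[j + 1] - ys[j]) * (x - xs[j]) / (xs[j + 1] - xs[j])


def mean_roc(folds, n_grid=101):
    """Average per-fold TPR on a common FPR grid; folds is a list of (y_true, scores)."""
    grid = [k / (n_grid - 1) for k in range(n_grid)]
    mean_tpr = [0.0] * n_grid
    for y_true, scores in folds:
        fpr, tpr = roc_points(y_true, scores)
        for k, x in enumerate(grid):
            mean_tpr[k] += interpolate(x, fpr, tpr) / len(folds)
    mean_tpr[0] = 0.0   # anchor the averaged curve at the origin
    mean_tpr[-1] = 1.0  # guard against floating-point rounding at the end point
    return grid, mean_tpr


def trapezoid_auc(xs, ys):
    """Area under the curve by the trapezoidal rule."""
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2
               for k in range(len(xs) - 1))


if __name__ == "__main__":
    # Hypothetical scores for two small folds (illustration only).
    toy_folds = [
        ([0, 0, 1, 1, 0, 1], [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]),
        ([1, 0, 1, 0, 1, 0], [0.7, 0.3, 0.6, 0.65, 0.9, 0.1]),
    ]
    grid, mean_tpr = mean_roc(toy_folds)
    print("mean AUC:", round(trapezoid_auc(grid, mean_tpr), 3))
```

Averaging on a fixed grid is needed because each fold produces ROC points at different false positive rates, so the raw curves cannot be averaged point by point.
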
Listed below is a chart of the accuracy score, precision, and recall of each model, tested using the full data set. Performance varies considerably across the models. From the chart, accuracy is high for every method except the Naïve Bayes classifier. However, accuracy alone can be deceptive; precision and recall give a truer picture of each model's performance. Taking all of these measures into account, Random Forest is the best model, with higher accuracy and relatively good precision and recall.
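To illustrate why accuracy alone can mislead, the sketch below computes accuracy, precision, and recall from confusion counts on a hypothetical imbalanced data set (5 positives, 95 negatives). The class balance, the labels, and the function name classification_metrics are assumptions made for illustration; they are not taken from the data set analyzed here.

```python
# Minimal sketch: accuracy versus precision and recall on imbalanced toy labels.
# The data and names are hypothetical, not the study's data or code.

def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # By convention here, report 0.0 when a ratio is undefined.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels: 5 positives followed by 95 negatives.
y_true = [1] * 5 + [0] * 95

# Model A predicts the majority class for every sample.
always_negative = [0] * 100

# Model B finds 4 of the 5 positives at the cost of 6 false alarms.
some_positives = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89

if __name__ == "__main__":
    print("A:", classification_metrics(y_true, always_negative))
    print("B:", classification_metrics(y_true, some_positives))
```

In this toy setting, the model that never predicts the positive class scores the higher accuracy of the two, yet its recall is zero, which is the kind of gap precision and recall are meant to expose.
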
