Classification Model Evaluation

Evaluation Metrics - Classification Accuracy

Evaluation of classification models #

Generative model: models the joint distribution of the features X and target Y, then derives the posterior probability P(y|x) via Bayes' rule. e.g. Naive Bayes, Bayesian Networks and Hidden Markov Models

Discriminative model: directly models the posterior probability P(y|x) by learning the input-to-output mapping that minimises prediction error. e.g. Logistic Regression, Support Vector Machines and Conditional Random Fields

True Positive Rate (or) Sensitivity (or) Recall #

  • Measures how good the model is at predicting the positive class when the actual outcome is positive.
\[TPR = {TP \above{1pt} (TP + FN)}\]

False Positive Rate (or) Inverse Specificity (or) False alarm rate #

  • Measures how often the positive class is predicted when the actual outcome is negative.
\[FPR = {1 - Specificity} \\ FPR = {FP \above{1pt} (FP + TN)}\]

Specificity #

  • Measures how good the model is at predicting the negative class when the actual outcome is negative.
\[Specificity = {TN \above{1pt} (TN + FP)}\]

Precision #

  • Also called Positive Predictive Value
  • Precision and Recall do not use TN
\[Precision = {TP \above{1pt} (TP + FP)}\]
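
A minimal sketch computing the four ratios above from a confusion matrix with scikit-learn; the toy labels are made up purely for illustration:

```python
from sklearn.metrics import confusion_matrix

# Toy binary labels, purely for illustration
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() yields the counts in this order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

tpr = tp / (tp + fn)          # Sensitivity / Recall
fpr = fp / (fp + tn)          # False alarm rate (= 1 - specificity)
specificity = tn / (tn + fp)
precision = tp / (tp + fp)    # Positive Predictive Value
print(tpr, fpr, specificity, precision)
```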

F-measure (or) F1-Score #

  • Harmonic mean of precision and recall
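\[F_1 = {2 \cdot Precision \cdot Recall \above{1pt} (Precision + Recall)}\]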

Akaike Information Criterion (AIC) #

  • Estimates the quality of models fit to the same dataset relative to one another
  • Used as a criterion for selecting the best model among candidates
  • The lower the score, the better the model
  • Useful when test data is scarce: train on the entire dataset and use AIC to compare candidate models
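\[AIC = 2k - 2\ln(\hat{L})\]
where \(k\) is the number of estimated parameters and \(\hat{L}\) is the maximised likelihood of the model.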

Bayes Factor #
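
  • Ratio of the marginal likelihoods of two competing models given the data D; a Bayesian alternative to criteria such as AIC for comparing models
\[BF = {P(D \mid M_1) \above{1pt} P(D \mid M_2)}\]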

Precision Recall curves #

  • Used in binary classification
  • Precision on the y-axis and recall on the x-axis for different thresholds
  • Depicts the trade-off between recall (TPR) and precision (positive predictive value) at different probability thresholds; see the sketch below
  • Appropriate for imbalanced datasets
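
A minimal sketch with scikit-learn, assuming y_scores holds predicted probabilities from some fitted binary classifier (the values below are placeholders):

```python
from sklearn.metrics import auc, precision_recall_curve

# Placeholder labels and predicted probabilities
y_true = [0, 0, 1, 1, 0, 1]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

# One (precision, recall) pair per probability threshold
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
print(auc(recall, precision))  # area under the PR curve
```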

Receiver Operating Characteristic #

  • Metric used to evaluate classifier output quality.
  • Depicts the trade-off between TPR and FPR at different probability thresholds; see the sketch below
  • True +ve rate on the y-axis and False +ve rate on the x-axis. The top left corner (TPR = 1, FPR = 0) is the ideal point.
  • Appropriate for balanced datasets
  • Larger AUC (Area under the curve) is better
  • Typically used with binary classifiers; to use it with multilabel classifiers the output must be binarized.
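
A minimal sketch with scikit-learn, using the same placeholder labels and probabilities as above:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Placeholder labels and predicted probabilities
y_true = [0, 0, 1, 1, 0, 1]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

# One (FPR, TPR) pair per probability threshold
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(roc_auc_score(y_true, y_scores))  # larger AUC is better
```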

Micro-averaging #

  • An ROC alternative for multilabel classifiers: all (sample, label) decisions are pooled into a single binary problem before computing the metric
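
A minimal sketch, again with scikit-learn and made-up scores, of a micro-averaged AUC for a multilabel problem:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy multilabel problem: 3 labels as a binary indicator matrix
y_true = np.array([[1, 0, 0], [0, 1, 1], [1, 1, 0]])
y_scores = np.array([[0.9, 0.2, 0.1], [0.1, 0.8, 0.7], [0.7, 0.6, 0.3]])

# 'micro' pools every (sample, label) decision into one binary problem
print(roc_auc_score(y_true, y_scores, average="micro"))
```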
