Classification Model Evaluation

Evaluation Metrics - Classification Accuracy

Evaluation of classification models #

Generative model: models the joint distribution of the features X and target Y, then derives the posterior probability P(y|x) via Bayes' rule. e.g. Naive Bayes, Bayesian Networks and Hidden Markov Models

Discriminative model: directly models the posterior probability P(y|x) by learning the input-to-output mapping that minimises prediction error. e.g. Logistic Regression, Support Vector Machines and Conditional Random Fields

True Positive Rate (or) Sensitivity (or) Recall #

  • Measures how good the model is at predicting the positive class when the actual outcome is positive.
\[TPR = {TP \above{1pt} (TP + FN)}\]

False Positive Rate (or) Inverse Specificity (or) False alarm rate #

  • Measures how often the positive class is predicted when the actual outcome is negative.
\[FPR = {1 - Specificity} \\ FPR = {FP \above{1pt} (FP + TN)}\]

Specificity #

  • Measures how good the model is at predicting the negative class when the actual outcome is negative.
\[Specificity = {TN \above{1pt} (TN + FP)}\]

Precision #

  • Also called Positive Predictive Value
  • Precision and Recall do not use TN
\[Precision = {TP \above{1pt} (TP + FP)}\]
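
A minimal sketch computing the four ratios above from a confusion matrix with scikit-learn; the toy labels are made up purely for illustration:

```python
from sklearn.metrics import confusion_matrix

# Toy binary labels, purely for illustration
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() yields the counts in this order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

tpr = tp / (tp + fn)          # Sensitivity / Recall
fpr = fp / (fp + tn)          # False alarm rate (= 1 - specificity)
specificity = tn / (tn + fp)
precision = tp / (tp + fp)    # Positive Predictive Value
print(tpr, fpr, specificity, precision)
```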

F-measure (or) F1-Score #

  • Harmonic mean of precision and recall
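\[F_1 = {2 \cdot Precision \cdot Recall \above{1pt} (Precision + Recall)}\]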

Akaike Information Criterion (AIC) #

  • Estimates the quality of models fit to the same dataset relative to one another
  • Used as a criterion for selecting the best model among candidates
  • The lower the score, the better the model
  • Useful when test data is scarce: train on the entire dataset and use AIC to compare candidate models
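\[AIC = 2k - 2\ln(\hat{L})\]
where \(k\) is the number of estimated parameters and \(\hat{L}\) is the maximised likelihood of the model.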

Bayes Factor #
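
  • Ratio of the marginal likelihoods of two competing models given the data D; a Bayesian alternative to criteria such as AIC for comparing models
\[BF = {P(D \mid M_1) \above{1pt} P(D \mid M_2)}\]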

Precision Recall curves #

  • Used in binary classification
  • Precision on the y-axis and recall on the x-axis for different thresholds
  • Depicts the trade-off between recall (TPR) and precision (positive predictive value) at different probability thresholds; see the sketch below
  • Appropriate for imbalanced datasets
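
A minimal sketch with scikit-learn, assuming y_scores holds predicted probabilities from some fitted binary classifier (the values below are placeholders):

```python
from sklearn.metrics import auc, precision_recall_curve

# Placeholder labels and predicted probabilities
y_true = [0, 0, 1, 1, 0, 1]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

# One (precision, recall) pair per probability threshold
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
print(auc(recall, precision))  # area under the PR curve
```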

Receiver Operating Characteristic #

  • Metric used to evaluate classifier output quality.
  • Depicts the trade-off between TPR and FPR at different probability thresholds; see the sketch below
  • True +ve rate on the y-axis and False +ve rate on the x-axis. The top left corner (TPR = 1, FPR = 0) is the ideal point.
  • Appropriate for balanced datasets
  • Larger AUC (Area under the curve) is better
  • Typically used with binary classifiers; to use it with multilabel classifiers the output must be binarized.
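
A minimal sketch with scikit-learn, using the same placeholder labels and probabilities as above:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Placeholder labels and predicted probabilities
y_true = [0, 0, 1, 1, 0, 1]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

# One (FPR, TPR) pair per probability threshold
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(roc_auc_score(y_true, y_scores))  # larger AUC is better
```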

Micro-averaging #

  • An ROC alternative for multilabel classifiers: all (sample, label) decisions are pooled into a single binary problem before computing the metric
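
A minimal sketch, again with scikit-learn and made-up scores, of a micro-averaged AUC for a multilabel problem:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy multilabel problem: 3 labels as a binary indicator matrix
y_true = np.array([[1, 0, 0], [0, 1, 1], [1, 1, 0]])
y_scores = np.array([[0.9, 0.2, 0.1], [0.1, 0.8, 0.7], [0.7, 0.6, 0.3]])

# 'micro' pools every (sample, label) decision into one binary problem
print(roc_auc_score(y_true, y_scores, average="micro"))
```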
