Regression - Supervised Learning Algorithm #
Loss/cost functions measure how well the model's predictions match the actual values. Different functions are used depending on the requirement.
- Fit using least squares, a.k.a. the sum-of-squared-residuals method, i.e. the line that minimizes the squared vertical distance between itself and the data points
- Using the residuals, \(R^{2}\) is calculated to compare simple and complex models (differing in the number of attributes used for prediction)
- Calculate the p-value for the calculated \(R^{2}\) (a sketch of this workflow follows the list)
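As a minimal sketch of this workflow (assuming SciPy is available; the data here is made up), `scipy.stats.linregress` fits a least-squares line and reports both the correlation coefficient, whose square is \(R^{2}\), and the p-value:

```python
import numpy as np
from scipy import stats

# Toy data: y is roughly linear in x, with some noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])

# Least-squares fit: minimizes the sum of squared residuals
result = stats.linregress(x, y)

print(f"line: y = {result.intercept:.2f} + {result.slope:.2f}x")
print(f"R^2  = {result.rvalue ** 2:.4f}")  # coefficient of determination
print(f"p    = {result.pvalue:.4g}")       # p-value for a nonzero slope
```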
\(R^{2}\) #
- The proportion of the variation in the target explained by the model
- Ranges from 0 to 1. The higher the value, the better the model
- R is the coefficient of correlation, which describes the relationship between two variables, say x and y
- \(R^{2}\) is the coefficient of determination
- Does not indicate the direction of correlation, since squared numbers are always positive (a sketch of the computation follows the list)
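A minimal sketch of computing \(R^{2}\) directly from the residuals; the function name and data are illustrative:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])
y_pred = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
print(r_squared(y_true, y_pred))  # close to 1 for a good fit
```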
Hypothesis #
\[h(x) = \theta_{0} + \theta_{1}x\]
In vectorized form, with \(x_{0} = 1\),
\[h(x) = \theta^{T}x\]
Cost function #
\[J(\theta_{0}, \theta_{1}) = \frac{1}{2m} \sum_{i = 1}^{m}(\hat y_{i} - y_{i})^{2}\]
In terms of \(\theta\),
\[J(\theta) = \frac{1}{2m} \sum_{i = 1}^{m}(\hat y_{i} - y_{i})^{2} = \frac{1}{2m} \sum_{i = 1}^{m}(h(x_{i}) - y_{i})^{2}\]
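A minimal sketch of this cost function, assuming a bias column of ones has been prepended to `X` so that \(\theta_{0}\) acts as the intercept:

```python
import numpy as np

def cost(theta, X, y):
    """Squared-error cost: J(theta) = (1 / 2m) * sum((h(x_i) - y_i)^2)."""
    m = len(y)
    predictions = X @ theta  # h(x) = theta^T x for each row of X
    residuals = predictions - y
    return (1.0 / (2 * m)) * np.sum(residuals ** 2)

# First column is x_0 = 1 for every sample
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
print(cost(np.array([0.0, 2.0]), X, y))  # 0.0: this theta fits exactly
```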
Mean Absolute Error (or) L1 Loss #
The mean of the absolute differences between the actual and predicted values. More robust to outliers than squared-error loss, since errors are not amplified by squaring.
\[mae = \frac{1}{m} \sum_{i = 1}^{m} \vert h(x_{i}) - y_{i} \vert\]
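A one-function NumPy sketch of MAE:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error (L1 loss)."""
    return np.mean(np.abs(y_pred - y_true))

print(mae(np.array([1.0, 2.0, 3.0]), np.array([1.5, 2.0, 2.0])))  # 0.5
```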
Mean Square Error (or) L2 Loss #
The mean of the squared differences between the actual target and the prediction. Highly sensitive to outliers, since squaring amplifies large errors. Without outliers, L2 generally performs better than L1.
\[mse = \frac{1}{m} \sum_{i = 1}^{m}(h(x_{i}) - y_{i})^{2}\]
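A matching sketch for MSE; the toy errors below show how a single outlier inflates the squared loss far more than the absolute loss:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error (L2 loss)."""
    return np.mean((y_pred - y_true) ** 2)

# One large outlier among small errors
errors = np.array([1.0, 1.0, 1.0, 10.0])
print(np.mean(np.abs(errors)))  # MAE-style: 3.25
print(np.mean(errors ** 2))     # MSE-style: 25.75
```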
Huber Loss #
A combination of the L1 and L2 loss functions: when the error is small it applies the quadratic (L2) loss, and when the error is large it applies the linear (L1) loss. The hyperparameter \(\delta\) determines the error magnitude at which the loss switches from quadratic to linear, and is typically tuned iteratively.
\[L_{\delta}(y, \hat y) = \begin{dcases} \frac{1}{2}(\hat y_{i} - y_{i})^{2} &\text{for } \vert \hat y_{i} - y_{i} \vert \leq \delta \\ \delta \vert \hat y_{i} - y_{i} \vert - \frac{1}{2} \delta^{2} &\text{otherwise} \end{dcases}\]
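A sketch of the piecewise Huber loss; the default \(\delta\) here is illustrative:

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    error = y_pred - y_true
    small = np.abs(error) <= delta
    quadratic = 0.5 * error ** 2
    linear = delta * np.abs(error) - 0.5 * delta ** 2
    return np.mean(np.where(small, quadratic, linear))

# 0.5 falls in the quadratic branch, 5.0 in the linear branch
print(huber(np.array([0.0, 0.0]), np.array([0.5, 5.0])))  # 2.3125
```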
Log-Cosh Loss #
Log-Cosh is the logarithm of the hyperbolic cosine of the prediction error. It behaves like L2 for small errors and approximately like L1 for large ones, while remaining twice differentiable everywhere.
\[L(y, \hat y) = \sum_{i = 1}^{m} \log(\cosh(\hat y_{i} - y_{i}))\]
Quantile loss #
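Quantile (pinball) loss penalizes under- and over-prediction asymmetrically, which is useful when predicting an interval rather than a point estimate. In a standard formulation, for a chosen quantile \(\gamma\), under-prediction is weighted by \(\gamma\) and over-prediction by \(1 - \gamma\):
\[L_{\gamma}(y, \hat y) = \sum_{i:\, \hat y_{i} < y_{i}} \gamma \vert y_{i} - \hat y_{i} \vert + \sum_{i:\, \hat y_{i} \geq y_{i}} (1 - \gamma) \vert y_{i} - \hat y_{i} \vert\]
A NumPy sketch of both the log-cosh and quantile losses; the \(\gamma\) value is illustrative:

```python
import numpy as np

def log_cosh(y_true, y_pred):
    """Log-cosh loss: smooth, roughly L2 near zero and L1 in the tails."""
    return np.sum(np.log(np.cosh(y_pred - y_true)))

def quantile_loss(y_true, y_pred, gamma=0.9):
    """Pinball loss: under-prediction costs gamma, over-prediction 1 - gamma."""
    error = y_true - y_pred
    return np.sum(np.where(error >= 0, gamma * error, (gamma - 1) * error))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 1.5, 3.5])
print(log_cosh(y_true, y_pred))
print(quantile_loss(y_true, y_pred))  # gamma = 0.9 punishes under-prediction more
```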
Truncated loss #
This function trims the outliers, capping the loss for errors outside the central 95% of the error distribution, to provide a more robust fit.
\[L(e) = \begin{dcases} e^{2} &\text{for } Q_{0.025}(e) < e < Q_{0.975}(e) \\ \text{constant} &\text{otherwise} \end{dcases}\]
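Since the constant for the tails is left unspecified above, the sketch below assumes it is the squared loss at the band edge; the function name is illustrative:

```python
import numpy as np

def truncated_squared_loss(y_true, y_pred):
    """Squared loss with errors outside the central 95% band
    capped at a constant (assumed: squared loss at the band edge)."""
    error = y_pred - y_true
    lo, hi = np.quantile(error, [0.025, 0.975])
    inside = (error > lo) & (error < hi)
    cap = max(lo ** 2, hi ** 2)  # assumed constant for the tails
    return np.mean(np.where(inside, error ** 2, cap))

rng = np.random.default_rng(0)
y_true = rng.normal(size=200)
y_pred = y_true + rng.normal(scale=0.1, size=200)
print(truncated_squared_loss(y_true, y_pred))
```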