Logistic Regression

Logistic Regression #

  • Eager learner
  • Logistic regression is a classification algorithm that uses the sigmoid function.
  • S-shaped curve varying between 0 and 1, with asymptotes at the tails
  • Unlike linear regression, the output variable is discrete
  • The logistic function takes any real value and maps it into the range of 0 and 1
  • Can work on both continuous and discrete attributes
  • When using multiple attributes we cannot compare models as such. Hence we remove/add attributes and check whether the variable’s effect on the prediction is greater than zero (Wald’s test)
  • A non-helping attribute is referred to as “totes useless”
  • Unlike linear regression, the concept of residuals doesn’t apply here. Hence maximum likelihood is used to fit the curve
  • Special type of Generalized Linear Model (GLM)
  • Though presented as a logistic function, the coefficients are determined by a linear function after converting to the logit (log-odds) scale, as shown below
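
The conversion the last bullet refers to is the logit (log-odds) transform, which turns the sigmoid model into one that is linear in \(\theta\):

\[\text{logit}(p) = \log\left(\frac{p}{1 - p}\right) = \theta^{T}x \qquad \text{where } p = h_{\theta}(x)\]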

Representation #

\[h_{\theta}(x) = \frac{1}{1 + e^{-\theta^{T}x}} \qquad \text{e - natural logarithm base}\] \[y \in \lbrace 0, 1 \rbrace \qquad 0: \text{Negative case; 1: Positive case}\]
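
A minimal NumPy sketch of this hypothesis function (the names `sigmoid` and `hypothesis` are illustrative, not from the notes):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = sigmoid(theta^T x), read as P(y=1 | x; theta)."""
    return sigmoid(theta @ x)

# theta^T x = 0.5*1 - 0.25*2 = 0, which sits on the decision boundary.
theta = np.array([0.5, -0.25])
x = np.array([1.0, 2.0])      # first entry acts as the bias/intercept term
print(hypothesis(theta, x))   # 0.5
```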

Interpretation #

The output of the hypothesis function is interpreted as the probability of the target being positive, given features x and parameterized by \(\theta\).

\[P(y=1|x;\theta) + P(y=0|x;\theta) = 1 \\ P(y=1|x;\theta) = 1 - P(y=0|x;\theta)\]

Cost Function #

\[J({\theta}) = \frac{1}{m} \sum_{i = 1}^{m} Cost(h_{\theta}(x^i), y^i) \qquad \begin{cases} Cost(h_{\theta}(x^i), y^i) = -\log(h_{\theta}(x^i)) &\text{if } y = 1 \\ Cost(h_{\theta}(x^i), y^i) = -\log(1 - h_{\theta}(x^i)) &\text{if } y = 0 \end{cases}\]

The cost function can be simplified as below; substituting y = 0 or y = 1 recovers the piecewise function above. \[Cost(h_{\theta}(x^i), y^i) = -y^i \log(h_{\theta}(x^i)) - (1 - y^i) \log(1 - h_{\theta}(x^i))\]

\[J({\theta}) = -\frac{1}{m} \sum_{i = 1}^{m} y^i \log(h_{\theta}(x^i)) + (1 - y^i) \log(1 - h_{\theta}(x^i))\]
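
A minimal NumPy sketch of this cost; the function name and the `eps` guard against \(\log(0)\) are illustrative choices:

```python
import numpy as np

def cost(theta, X, y):
    """Cross-entropy cost J(theta); X is (m, n), y is (m,) of 0/1 labels."""
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # h_theta(x^i) for every example
    eps = 1e-12                             # guard against log(0)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
```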

Maximum likelihood estimation: the simplified cost above is exactly the negative average log-likelihood, as sketched below.
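
Concretely, under the Bernoulli model the likelihood of the training set is

\[L(\theta) = \prod_{i=1}^{m} h_{\theta}(x^i)^{y^i} \, (1 - h_{\theta}(x^i))^{1 - y^i}\]

and minimizing \(J(\theta)\) is equivalent to maximizing \(\log L(\theta)\), since \(J(\theta) = -\frac{1}{m} \log L(\theta)\).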

Interpretation #

  • Logistic regression models the probability of the positive class.

Logit #

  • \(\text{logit}^{-1}(x) = \text{logistic}(x)\)
  • Find the coefficient estimate (e.g. the y-intercept)
  • Find its standard error
  • z-value = estimate / standard error; the number of standard deviations the estimate is away from the mean (0) of the normal curve
  • Use Wald’s test to determine statistical significance. If |z| is < 2 standard deviations, the coefficient is insignificant; see the sketch after this list
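
A minimal sketch of this significance check, using hypothetical values for the estimate and its standard error:

```python
from scipy.stats import norm

estimate = 1.8     # hypothetical coefficient (e.g. the fitted y-intercept)
std_error = 0.7    # hypothetical standard error of that estimate

z = estimate / std_error          # Wald z-statistic
p_value = 2 * norm.sf(abs(z))     # two-sided p-value under the normal curve

# Rule of thumb from the notes: |z| < 2 means insignificant.
print(f"z = {z:.2f}, p = {p_value:.3f}, significant: {abs(z) >= 2}")
```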

Optimization Algorithms #

  • Gradient Descent (a sketch follows this list)
  • Conjugate Gradient
  • BFGS
  • L-BFGS
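
As a sketch of the first of these, one batch gradient-descent loop for logistic regression; the learning rate `alpha` and iteration count are illustrative defaults:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Fit theta by batch gradient descent on the cross-entropy cost.

    X : (m, n) design matrix (include a ones column for the intercept),
    y : (m,) vector of 0/1 labels.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # predictions for all examples
        grad = X.T @ (h - y) / m                # gradient of J(theta)
        theta -= alpha * grad                   # simultaneous parameter update
    return theta
```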

Cross Entropy #

Unlike linear regression, plugging the sigmoid prediction function into the squared-error cost produces a non-convex function with many local minima, so gradient descent is not guaranteed to reach the global minimum. Hence cross-entropy (also known as log loss) is used as the cost function instead: it is convex, so gradient descent converges reliably.
