DAVIAN Lab. Deep Learning Winter Study (2021)

  • Writer: Min-Jung Kim

Information

Perceptron

The perceptron is similar to logistic regression, but with one key difference:
it replaces the smooth sigmoid with a hard threshold — a "hard" version of the sigmoid function.

  • Logistic regression with the sigmoid function: $h_\theta(x) = g(\theta^T x)$, where $g(z) = \frac{1}{1 + e^{-z}}$

  • Perceptron: $h_\theta(x) = g(\theta^T x)$, where $g(z) = 1$ if $z \ge 0$, and $g(z) = 0$ otherwise

  • Geometrical interpretation of the perceptron $\theta$ update: $\theta$ is normal to the decision boundary $\theta^T x = 0$. On a misclassified example, the update $\theta_j := \theta_j + \alpha\,(y^{(i)} - h_\theta(x^{(i)}))\,x_j^{(i)}$ adds (or subtracts) a scaled copy of $x^{(i)}$ to $\theta$, rotating the boundary toward classifying that example correctly.

The perceptron is not widely used in practice; we study it mostly for historical reasons.
It is not used because it has no probabilistic interpretation of what it outputs.
Also, a single perceptron can never classify XOR, since XOR is not linearly separable.
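The update rule above can be sketched as follows. This is a minimal, illustrative implementation; the toy data, learning rate, and epoch count are assumptions, not part of the lecture:

```python
import numpy as np

def perceptron_train(X, y, lr=0.1, epochs=20):
    """Perceptron learning: theta_j += lr * (y - h(x)) * x_j on each example."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            h = 1.0 if xi @ theta >= 0 else 0.0   # hard-threshold "sigmoid"
            theta += lr * (yi - h) * xi           # updates only on mistakes
    return theta

def perceptron_predict(theta, X):
    X = np.hstack([np.ones((X.shape[0], 1)), X])
    return (X @ theta >= 0).astype(float)

# Linearly separable toy data (logical AND) -- the perceptron converges here;
# on XOR-labeled data it would never converge.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 0., 0., 1.])
theta = perceptron_train(X, y)
```

Note that the update touches `theta` only when the prediction is wrong: for a correctly classified example, `yi - h` is zero.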

Exponential Family

It is the class of probability distributions whose PDF can be written in the form

$$p(y; \eta) = b(y)\,\exp\big(\eta^T T(y) - a(\eta)\big)$$

  • $e^{-a(\eta)}$ is exactly the normalizing factor that makes the density integrate to 1

$y$ : data (output)
$\eta$ : natural parameter (parameter of the distribution)
$b(y)$ : base measure
$T(y)$ : sufficient statistic. In this lecture, $T(y) = y$
$a(\eta)$ : log partition function (log of the normalizing constant)

Ex1 : Bernoulli Distribution

Bernoulli distribution, with $\phi$ : probability of the event $y = 1$

$$p(y; \phi) = \phi^y (1-\phi)^{1-y} = \exp\Big( y \log\tfrac{\phi}{1-\phi} + \log(1-\phi) \Big)$$

Matching with E.F. : $b(y) = 1$, $T(y) = y$, $\eta = \log\tfrac{\phi}{1-\phi}$ (so $\phi = \tfrac{1}{1+e^{-\eta}}$), $a(\eta) = -\log(1-\phi) = \log(1+e^\eta)$
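A quick numeric sanity check of this matching — assuming the standard Bernoulli PMF and the exponential-family form above, the two should agree exactly:

```python
import math

def bernoulli_pmf(y, phi):
    # Standard form: phi^y * (1 - phi)^(1 - y)
    return phi**y * (1 - phi)**(1 - y)

def bernoulli_ef(y, phi):
    # Exponential-family form: b(y) * exp(eta * T(y) - a(eta)), with T(y) = y
    eta = math.log(phi / (1 - phi))    # natural parameter (log-odds)
    a = math.log(1 + math.exp(eta))    # log partition function
    b = 1.0                            # base measure
    return b * math.exp(eta * y - a)

phi = 0.3
for y in (0, 1):
    assert abs(bernoulli_pmf(y, phi) - bernoulli_ef(y, phi)) < 1e-12
```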

Ex2 : Gaussian Distribution (with fixed variance)

Assume $\sigma^2 = 1$ (the variance does not affect the GLM construction here).

Gaussian Dist. =>

$$p(y; \mu) = \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{(y-\mu)^2}{2}\Big) = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2} \cdot \exp\Big(\mu y - \frac{\mu^2}{2}\Big)$$

Matching with E.F. : $b(y) = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}$, $T(y) = y$, $\eta = \mu$, $a(\eta) = \frac{\mu^2}{2} = \frac{\eta^2}{2}$
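The same kind of check works for the Gaussian matching above (a small verification sketch; the test points are arbitrary):

```python
import math

def gaussian_pdf(y, mu):
    # Standard N(mu, 1) density
    return math.exp(-(y - mu)**2 / 2) / math.sqrt(2 * math.pi)

def gaussian_ef(y, mu):
    # Exponential-family form: b(y) * exp(eta * y - a(eta))
    b = math.exp(-y**2 / 2) / math.sqrt(2 * math.pi)  # base measure
    eta = mu                                          # natural parameter
    a = eta**2 / 2                                    # log partition function
    return b * math.exp(eta * y - a)

for y in (-1.5, 0.0, 2.0):
    assert abs(gaussian_pdf(y, 1.0) - gaussian_ef(y, 1.0)) < 1e-12
```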

Exponential Family Properties

a) MLE w.r.t. $\eta$ is concave. If we perform maximum likelihood on an E.F. distribution parameterized in the natural parameter, the optimization problem is concave; therefore the negative log likelihood is convex.
b) $E[y; \eta] = \frac{\partial}{\partial \eta} a(\eta)$ — the mean is the first derivative of the log partition function.
c) $\mathrm{Var}(y; \eta) = \frac{\partial^2}{\partial \eta^2} a(\eta)$ — the variance is the second derivative.
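Properties b) and c) can be verified numerically for the Bernoulli case, where $a(\eta) = \log(1+e^\eta)$, the mean is $\phi$, and the variance is $\phi(1-\phi)$. A sketch using finite differences (step size and test point are assumptions):

```python
import math

def a(eta):
    # Log partition function of the Bernoulli distribution
    return math.log(1 + math.exp(eta))

eta, h = 0.7, 1e-4
phi = 1 / (1 + math.exp(-eta))  # sigmoid(eta) = Bernoulli mean

# b) E[y] = a'(eta), via a central finite difference
mean = (a(eta + h) - a(eta - h)) / (2 * h)
assert abs(mean - phi) < 1e-6

# c) Var(y) = a''(eta), via a second finite difference
var = (a(eta + h) - 2 * a(eta) + a(eta - h)) / h**2
assert abs(var - phi * (1 - phi)) < 1e-4
```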

GLM (Generalized Linear Model)

We can build many powerful models by choosing an appropriate member of the E.F. and plugging a linear model into its natural parameter.

Assumptions / Design Choices

a) $y \mid x; \theta \sim$ Exponential Family$(\eta)$
Depending on the problem that you have, you can choose any member of the E.F., as parameterized by $\eta$.
b) $\eta = \theta^T x$ — the natural parameter is a linear function of the input.
c) Test time output = $h_\theta(x) = E[y \mid x; \theta]$

GLM Training

No matter what kind of GLM you are training, and no matter which choice of distribution you make,
the learning update rule is the same.

Learning Update Rule

$$\theta_j := \theta_j + \alpha \big( y^{(i)} - h_\theta(x^{(i)}) \big)\, x_j^{(i)}$$

Terminology

$g(\eta) = E[y; \eta]$ = canonical response function (maps the natural parameter to the mean)
$g^{-1}$ = canonical link function (maps the mean back to the natural parameter)

3 Parameterizations

The same model can be viewed through three parameterizations:
  • Model parameter $\theta$ — the one we actually learn
  • Natural parameter $\eta = \theta^T x$
  • Canonical parameter of the distribution — e.g. $\phi$ (Bernoulli) or $\mu$ (Gaussian) — connected to $\eta$ through $g$ and $g^{-1}$

Softmax Regression

Softmax regression is yet another member of the GLM family. Usually the hypothesis outputs a single scalar (a probability or a real value), whereas softmax outputs an entire probability distribution over the classes.
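A minimal sketch of the softmax hypothesis: one parameter vector per class, and the output is a full distribution over classes. The parameter values and input here are arbitrary illustrations:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result sums to 1.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

# theta: one parameter vector per class (3 classes, 2 features) -- toy values
theta = np.array([[1.0, -1.0],
                  [0.5,  0.5],
                  [-1.0, 1.0]])
x = np.array([0.2, 0.9])

probs = softmax(theta @ x)  # a probability distribution over the 3 classes
```

Unlike the scalar GLMs above, every entry of `probs` is strictly positive and the entries sum to 1, so the hypothesis itself is a distribution rather than a point estimate.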