CS480/680 Intro to Machine Learning
Lecture 12
Gaussian Process
- a distribution over functions; informally an infinite-dimensional Gaussian distribution, since any finite set of function values is jointly Gaussian (see the sketch below)
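A minimal numpy sketch of this view: fix a finite grid of inputs, build a covariance matrix with a kernel, and sample function values from the resulting multivariate Gaussian. The RBF kernel and unit lengthscale here are illustrative assumptions, not the only choice.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    # k(x, x') = exp(-||x - x'||^2 / (2 * lengthscale^2))
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sqdist / lengthscale ** 2)

x = np.linspace(-5, 5, 100)        # finite set of test inputs
K = rbf_kernel(x, x)               # covariance over those inputs
K += 1e-8 * np.eye(len(x))         # jitter for numerical stability

# Any finite subset of a GP is jointly Gaussian, so function values
# at the test inputs can be sampled from N(0, K).
samples = np.random.multivariate_normal(np.zeros(len(x)), K, size=3)
```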
Lecture 16
Convolutional NN
- a rule of thumb: many layers with small filters beat one layer with a big filter; going deeper captures richer features over the same receptive field while using fewer parameters (see the sketch below)
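A quick way to see the parameter saving: two stacked 3x3 convs cover the same 5x5 receptive field as a single 5x5 conv but need fewer weights. A minimal PyTorch sketch, assuming 64 channels purely for illustration:

```python
import torch.nn as nn

c = 64
stacked = nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(c, c, 3, padding=1))
single = nn.Conv2d(c, c, 5, padding=2)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(stacked), count(single))  # 2*(9*c*c + c) < 25*c*c + c
```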
Residual Networks
- even with ReLU activations, deep NNs can still suffer from vanishing gradients
- the idea is to add skip connections so that gradients have shorter paths through the network (see the residual block sketch below)
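A minimal PyTorch sketch of a residual block; keeping a single channel count and omitting batch norm are simplifications for illustration:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)  # skip connection: identity shortcut
```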
Lecture 18
LSTM vs GRU vs Attention
- LSTM: 3 gates: a forget gate for the cell state, an input gate, and an output gate
- GRU: only 2 gates: a reset gate, and an update gate that takes a weighted combination of the previous hidden state and the new candidate state
- uses fewer parameters than an LSTM (see the parameter count check after this list)
- Attention: at every step of producing the output, compute a new context vector that weights the input tokens by their importance to the current output token (see the attention sketch below)
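A quick check of the parameter counts using PyTorch's built-in recurrent layers; the hidden size of 64 is an arbitrary choice for illustration:

```python
import torch.nn as nn

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(nn.LSTM(64, 64)))  # 4 weight blocks (3 gates + candidate)
print(count(nn.GRU(64, 64)))   # 3 weight blocks -> ~3/4 of the LSTM count
```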
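A minimal sketch of dot-product attention for a single output step; the shapes are illustrative assumptions, and plain dot-product scoring stands in for whatever alignment function a particular model uses:

```python
import torch
import torch.nn.functional as F

T, d = 10, 32                        # input length, hidden size
encoder_states = torch.randn(T, d)   # one vector per input token
decoder_state = torch.randn(d)       # query for the current output step

scores = encoder_states @ decoder_state   # (T,) alignment scores
weights = F.softmax(scores, dim=0)        # attention weights over inputs
context = weights @ encoder_states        # (d,) context vector
```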
Lecture 20
Autoencoder
- takes an input and is trained to generate that same input as its output; the bottleneck in between forces a compressed representation (see the sketch below)
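A minimal PyTorch sketch of an autoencoder; the layer sizes (e.g. flattened 28x28 inputs, a 32-dimensional code) are arbitrary choices for illustration:

```python
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, in_dim=784, code_dim=32):
        super().__init__()
        # encoder compresses the input into a low-dimensional code
        self.encoder = nn.Sequential(nn.Linear(in_dim, code_dim), nn.ReLU())
        # decoder reconstructs the input from the code
        self.decoder = nn.Linear(code_dim, in_dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Training minimizes reconstruction error, e.g. MSE(model(x), x).
```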