| Paper | Presenter | Week |
|---|---|---|
| Different activation functions and optimizers: 1. Maxout Networks 2. ADADELTA: An Adaptive Learning Rate Method | | |
| Initializers: 1. Understanding the difficulty of training deep feedforward neural networks 2. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification | Ali Alsetri | 12/1 |
| Generalization errors of optimizers: 1. The Marginal Value of Adaptive Gradient Methods in Machine Learning 2. When do adaptive optimizers fail to generalize? | | |
| Different local minima and generalization: 1. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima 2. Sharp Minima Can Generalize For Deep Nets | Kotaro Kajita | 11/20 |
| Adam optimizer: Adam: A Method for Stochastic Optimization | Alex Emmons | 11/29 |
| CNN Visualization: Visualizing and Understanding Convolutional Networks | Joshua Peterson | 11/27 |