Deep Learning Papers Summarization

A Summary of DL papers

Decoupled Neural Interfaces using Synthetic Gradients

  • In NNs, the training process has three bottlenecks:
    • forward lock: you need to calculate the output of the previous layer before you can move to the next layer in the forward pass
    • backward lock: the same, but for backpropagation
    • update lock: you can't update a layer's weights until the gradients of all subsequent layers have been computed
  • the paper tries to remove these bottlenecks by decoupling the layers, so each one can operate on its own
  • it does that by introducing a Synthetic Gradient Model that predicts the gradient for the current layer, without waiting for the gradient of the next layer
  • this way we can calculate the gradient and update the weights as soon as we have the activation of the current layer (see the sketch after this list)
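
A minimal sketch of the update decoupling in PyTorch (the `SyntheticGradient` module and the single `nn.Linear` layer are my own illustration, not the paper's architecture): the layer updates its weights from a predicted gradient, without waiting for the true backward pass.

```python
import torch
import torch.nn as nn

# A tiny synthetic-gradient module: maps a layer's activation to a
# predicted gradient of the loss w.r.t. that activation.
class SyntheticGradient(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Linear(dim, dim)

    def forward(self, h):
        return self.net(h)

layer = nn.Linear(128, 128)
sg = SyntheticGradient(128)
opt = torch.optim.SGD(layer.parameters(), lr=1e-3)

x = torch.randn(32, 128)
h = layer(x)

# Update the layer immediately with the *predicted* gradient:
# no need to wait for the rest of the network to finish.
h.backward(sg(h).detach())
opt.step()
opt.zero_grad()
```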

Synthetic Gradient Model

  • can be just a simple NN that is trained to output the gradient of the layer

  • it can be trained against the true gradient, or even against the synthetic gradient of the next layer (see the sketch after this list)

  • it's important that the last layer computes the true gradient: in the end we need a ground truth to compute a true loss, so the NN actually trains

  • we can also have a synthetic model for the forward pass, which works on the same idea
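
Continuing the sketch above, the synthetic gradient model itself can be trained by regressing its prediction onto the true gradient once it arrives (or onto the next layer's synthetic gradient); the plain MSE objective here is my assumption of the simplest setup:

```python
# Train the predictor: regress its output onto the true gradient dL/dh
# (or onto the next layer's synthetic gradient, used the same way).
sg_opt = torch.optim.SGD(sg.parameters(), lr=1e-3)

true_grad = torch.randn(32, 128)  # stand-in for the real dL/dh
sg_loss = nn.functional.mse_loss(sg(h.detach()), true_grad)
sg_loss.backward()
sg_opt.step()
sg_opt.zero_grad()
```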

A Roadmap for Big Models

  • We are in the era of Big Models
  • Model generalization is hard: models trained on one data domain don't generalize to others
  • The cost of dataset creation and research made it hard for small companies to train task-specific models
  • Big models address these issues.

Big Models

  • Big-data driven
  • Multi-task Adaptive
  • can be fine-tuned with few-shot learning

Data issues

  • data bias
  • data duplication
  • data has to cover all domains
  • low quality data
  • hard to create huge datasets

Knowledge

  • a new way to represent data
  • we represent knowledge as knowledge graphs (KGs)
  • a KG consists of: instances, relations, concepts, and values
  • KGs can be created by experts, from wiki-based knowledge bases, or extracted from unstructured text (see the sketch after this list)
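
A minimal sketch of the usual (head, relation, tail) triple representation of a KG; the example facts and the `facts_about` helper are hypothetical illustrations:

```python
# A knowledge graph stored as (head, relation, tail) triples.
# Heads and tails can be instances, concepts, or literal values.
triples = [
    ("Paris", "capital_of", "France"),     # instance -> instance
    ("Paris", "instance_of", "City"),      # instance -> concept
    ("France", "population", 68_000_000),  # instance -> value
]

# Simple lookup: all facts mentioning an entity.
def facts_about(entity):
    return [t for t in triples if entity in (t[0], t[2])]

print(facts_about("Paris"))
```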

KG Completion and Integration

  • most known KGs have many empty fields, and there is ongoing research on how to deal with that and fill the gaps.
  • some methods do that with intra-graph knowledge augmentation, others with inter-graph augmentation.

Denoising Diffusion Probabilistic Models

Forward diffusion process: gradually keep adding noise to the original image till it’s destroyed

  • the main task is to reverse the noising procedure, so that we can learn the underlying data distribution and then generate images from it

  • instead of calculating the steps of the forward diffusion process sequentially, we can collapse them all into a single step, by sampling from a distribution whose mean scales the original image by the product of the per-step coefficients (a sampling sketch follows the equation below)

$q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar\alpha_t}\, x_0,\ (1 - \bar\alpha_t)\,\mathbf{I}\right), \qquad \bar\alpha_t = \prod_{s=1}^{t} \alpha_s$
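
A minimal sketch of this one-step jump in NumPy, assuming a simple linear beta schedule (the schedule values are placeholders, not necessarily the paper's exact settings):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # per-step noise schedule (placeholder)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)       # bar{alpha}_t = product of alpha_s up to t

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in one step via the closed form."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((28, 28))   # stand-in "image"
x_noisy = q_sample(x0, t=500, rng=rng)
```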