Architecture
This chapter collects some tricky designs found in various types of neural networks.
Skip Connection
It was first introduced in ResNet as an identity shortcut: a block computes $y = x + \mathcal{F}(x)$, so the stacked layers $\mathcal{F}$ only need to learn a residual correction to the input $x$.
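A minimal sketch of such a block in PyTorch (the framework, channel count, and layer choices here are illustrative, not taken from the original ResNet code):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = x + F(x): the skip connection adds the input back onto the block output."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity shortcut plus residual branch, ReLU after the addition
        return torch.relu(x + self.body(x))
```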
Connections between ResNet, DenseNet and Higher Order RNN
Residual networks essentially belong to the family of densely connected networks, except that their parameters are shared across steps.
Proof (the following proof is adapted from Dual Path Networks by Yunpeng Chen et al.):
We use $h^t$ to denote the hidden state of the recurrent neural network at the $t$-th step and use $k$ as the index of the current step. Let $x^t$ denote the input at the $t$-th step, with $h^0 = x^0$. For each step, $f_t^k(\cdot)$ refers to the feature-extracting function, which takes a hidden state as input and outputs the extracted information. $g^k(\cdot)$ denotes a transformation function that transforms the gathered information into the current hidden state:

$$h^k = g^k\left[\sum_{t=0}^{k-1} f_t^k\left(h^t\right)\right]$$
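As a sketch, this aggregation can be written directly in plain Python (the names `dense_step`, `f_funcs`, and `g` are just the symbols from the equation above, not an actual API):

```python
import torch

def dense_step(hidden_states, f_funcs, g):
    """Compute h^k = g(sum_t f_t(h^t)) from all previous hidden states.

    hidden_states: list [h^0, ..., h^{k-1}]
    f_funcs:       list of feature extractors; f_funcs[t] plays the role of f_t^k
    g:             transformation applied to the aggregated information
    """
    gathered = sum(f(h) for f, h in zip(f_funcs, hidden_states))
    return g(gathered)

# Toy usage: two previous states, identity extractors, ReLU as g
h = [torch.randn(4), torch.randn(4)]
fs = [lambda x: x, lambda x: x]
h_k = dense_step(h, fs, torch.relu)
```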
For HORNNs, weights are shared across steps, i.e. $\forall t, k,\ f_t^k(\cdot) \equiv f_t(\cdot)$ and $\forall k,\ g^k(\cdot) \equiv g(\cdot)$. For densely connected networks, each step (micro-block) has its own parameters, which means $f_t^k(\cdot)$ and $g^k(\cdot)$ are not shared.
Now suppose the feature extractors are shared across steps, i.e. $f_t^k(\cdot) \equiv f_t(\cdot)$, and define the aggregated information $r^k \triangleq \sum_{t=1}^{k-1} f_t(h^t)$, so that $h^k = g^k(r^k)$. Then

$$r^k = r^{k-1} + f_{k-1}\left(h^{k-1}\right) = r^{k-1} + f_{k-1}\left(g^{k-1}\left(r^{k-1}\right)\right) = r^{k-1} + \phi^{k-1}\left(r^{k-1}\right),$$

where $\phi^k(\cdot) \triangleq f_k(g^k(\cdot))$, which is exactly the update rule of a residual network. Specifically, when $\forall k,\ \phi^k(\cdot) \equiv \phi(\cdot)$, the above equation degenerates to an RNN; when none of the $\phi^k(\cdot)$ is shared and the input is fed only at the first step ($x^k = 0$ for $k > 1$), it produces a residual network.
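A toy sketch of the two degenerate cases (the `unroll` helper and the specific `phi` functions are hypothetical, chosen only to make the recurrence concrete):

```python
import torch

def unroll(r0, phis, xs=None):
    """Iterate r^k = r^{k-1} + phi^{k-1}(r^{k-1}) (+ optional per-step input x^k)."""
    r = r0
    for k, phi in enumerate(phis):
        r = r + phi(r)
        if xs is not None:
            r = r + xs[k]  # RNN-style fresh input at every step
    return r

r0 = torch.randn(4)

# Residual network: a distinct phi^k per step, input only at the first step (x^k = 0 for k > 1)
resnet_out = unroll(r0, [torch.tanh, torch.relu, torch.sigmoid])

# RNN: one phi shared across all steps, with a fresh input at every step
shared_phi = torch.tanh
rnn_out = unroll(r0, [shared_phi] * 3, xs=[torch.randn(4) for _ in range(3)])
```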