Architecture
This chapter collects some tricky design choices that appear in various types of neural networks.
Skip Connection
It was first introduced in ResNet as an identity shortcut that adds a block's input directly to the block's output.
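A minimal sketch of the idea in PyTorch is shown below; the layer choices and the omission of batch normalization are simplifications for illustration, not the exact ResNet block.

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added back to the input."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # skip connection: the input bypasses the convolutions
```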
Connections between ResNet, DenseNet and Higher Order RNN
Residual networks essentially belong to the family of densely connected networks except that their connections are shared across steps.
Proof (following the Dual Path Networks paper by Yunpeng Chen et al.):
We use $h^t$ to denote the hidden state of the recurrent neural network at the $t$-th step and use $k$ as the index of the current step. Let $x^t$ denote the input at the $t$-th step, with $h^0 = x^0$. For each step, $f_t^k(\cdot)$ refers to the feature-extracting function, which takes a hidden state as input and outputs the extracted information, and $g^k(\cdot)$ denotes a transformation function that transforms the gathered information into the current hidden state:

$$h^k = g^k\left[\sum_{t=0}^{k-1} f_t^k\left(h^t\right)\right]$$
For HORNNs, weights are shared across steps, i.e. $\forall t, k,\ f_t^k(\cdot) \equiv f_t(\cdot)$ and $\forall k,\ g^k(\cdot) \equiv g(\cdot)$. For densely connected networks, each step (micro-block) has its own parameters, which means $f_t^k(\cdot)$ and $g^k(\cdot)$ are not shared.
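As a concrete, purely illustrative reading of the formulation above, the sketch below evaluates $h^k = g^k\left[\sum_{t=0}^{k-1} f_t^k(h^t)\right]$ with user-supplied callables; the helper name and the linear/tanh choices are assumptions, not code from the paper. Reusing the same functions for every $k$ corresponds to the HORNN sharing, while distinct functions per step correspond to a densely connected network.

```python
import numpy as np

def hornn_dense_forward(x0, f, g, num_steps):
    """Evaluate h^k = g^k( sum_{t=0}^{k-1} f_t^k(h^t) ) for k = 1..num_steps.

    f[k][t] is the feature-extracting function f_t^k and g[k] is g^k.
    Reusing the same function objects for every k models the HORNN weight
    sharing; distinct functions per step model a densely connected network.
    """
    h = [x0]  # h^0 = x^0
    for k in range(1, num_steps + 1):
        gathered = sum(f[k][t](h[t]) for t in range(k))  # information from all earlier states
        h.append(g[k](gathered))                          # transform it into the new hidden state
    return h

# Example: shared weights (HORNN-style); swap in per-step matrices for a DenseNet-style network.
rng = np.random.default_rng(0)
W, U = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
f = {k: {t: (lambda h: W @ h) for t in range(k)} for k in range(1, 4)}
g = {k: (lambda r: np.tanh(U @ r)) for k in range(1, 4)}
states = hornn_dense_forward(rng.normal(size=4), f, g, num_steps=3)
```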
When $f_t^k(\cdot) \equiv f_t(\cdot)$ is shared across steps, define $r^k \triangleq \sum_{t=0}^{k-1} f_t(h^t)$. Since $h^{k-1} = g^{k-1}(r^{k-1})$, this gives the recurrence $r^k = r^{k-1} + f_{k-1}(h^{k-1}) = r^{k-1} + f_{k-1}\left(g^{k-1}(r^{k-1})\right)$. Writing $\phi^k(\cdot) \triangleq f_k\left(g^k(\cdot)\right)$, the update becomes $r^k = r^{k-1} + \phi^{k-1}(r^{k-1})$ with $h^k = g^k(r^k)$. Specifically, when $\forall k,\ \phi^k(\cdot) = \phi(\cdot)$, this recurrence degenerates to an RNN; when none of the $\phi^k(\cdot)$ are shared and the network receives no new input after the first step ($x^k = 0$ for $k > 1$), it produces a residual network.
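A quick numerical sanity check of this degeneration (again a sketch with made-up linear maps, not code from the paper): with $f_t^k \equiv f_t$ shared across $k$ and, for simplicity, a single $g$ reused at every step, the dense formulation and the residual recurrence $r^k = r^{k-1} + \phi^{k-1}(r^{k-1})$ produce identical hidden states.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, num_steps = 4, 5
Ws = [0.1 * rng.normal(size=(dim, dim)) for _ in range(num_steps)]  # one matrix per f_t
x0 = rng.normal(size=dim)

f = [lambda h, W=W: W @ h for W in Ws]  # f_t, shared across every later step k
g = lambda r: np.tanh(r)                # the same g reused at every step, for simplicity

# Dense formulation: h^k = g( sum_{t=0}^{k-1} f_t(h^t) )
h = [x0]
for k in range(1, num_steps + 1):
    h.append(g(sum(f[t](h[t]) for t in range(k))))

# Residual formulation: r^k = r^{k-1} + phi^{k-1}(r^{k-1}) with phi^k(.) = f_k(g(.))
r = f[0](x0)            # r^1 = f_0(h^0)
h_res = [x0, g(r)]      # h^1 = g(r^1)
for k in range(2, num_steps + 1):
    r = r + f[k - 1](g(r))  # add the residual branch phi^{k-1}(r^{k-1})
    h_res.append(g(r))

assert all(np.allclose(a, b) for a, b in zip(h, h_res))  # the two formulations coincide
```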