Architecture




This chapter collects some tricky designs used in various types of neural networks.

Skip Connection

It was first introduced in ResNet as a shortcut (identity) connection that adds a block's input directly to its output, so that the block only needs to learn a residual mapping.
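
As a minimal sketch (assuming PyTorch; the `ResidualBlock` class and its two-convolution body are illustrative choices, not taken from the original paper), a residual block computes $F(x) + x$, where the `+ x` term is the skip connection:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = F(x) + x, where '+ x' is the skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                              # keep the input for the shortcut
        out = self.relu(self.bn1(self.conv1(x)))  # residual branch F(x)
        out = self.bn2(self.conv2(out))
        out = out + identity                      # skip connection: add the input back
        return self.relu(out)
```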

Connections between ResNet, DenseNet and Higher Order RNN

Residual networks essentially belong to the family of densely connected networks except that their connections are shared across steps.

Proof (the following proof is adapted from the Dual Path Networks paper by Yunpeng Chen et al.):

We use $h^t$ to denote the hidden state of the recurrent neural network at the $t$-th step and use $k$ as the index of the current step. Let $x^t$ denote the input at the $t$-th step, with $h^0 = x^0$. For each step, $f_t^k(\cdot)$ refers to the feature-extracting function, which takes the hidden state as input and outputs the extracted information, and $g^k(\cdot)$ denotes a transformation function that transforms the gathered information into the current hidden state:

$$
h^k = g^k\left[\sum_{t=0}^{k-1} f_t^k(h^t)\right]
$$

For HORNNs, weights are shared across steps, i.e. $\forall t, k,\; f_t^k(\cdot) \equiv f_t(\cdot)$ and $\forall k,\; g^k(\cdot) \equiv g(\cdot)$. For densely connected networks, each step (micro-block) has its own parameters, which means $f_t^k(\cdot)$ and $g^k(\cdot)$ are not shared.

$$
r^k \triangleq \sum_{t=1}^{k-1} f_t(h^t) = r^{k-1} + f_{k-1}(h^{k-1}), \qquad h^k = g^k(r^k)
$$

$$
\Longrightarrow r^k = r^{k-1} + f_{k-1}(h^{k-1}) = r^{k-1} + f_{k-1}\left(g^{k-1}(r^{k-1})\right) = r^{k-1} + \phi^{k-1}(r^{k-1})
$$

Specifically, when $\forall k,\; \phi^k(\cdot) = \phi(\cdot)$, the above recurrence degenerates to a recurrent neural network; when none of the $\phi^k(\cdot)$ are shared and $x^k = 0$ for $k > 1$, it produces a residual network.
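
To make the correspondence concrete, here is a minimal sketch (assuming PyTorch; `unroll`, `phi_fns`, and the use of `nn.Linear` for $\phi^k$ are illustrative assumptions, not from the paper) of the recurrence $r^k = r^{k-1} + \phi^{k-1}(r^{k-1})$: giving each step its own $\phi^k$ yields a plain residual network, while reusing one shared $\phi$ at every step yields the recurrent view.

```python
import torch
import torch.nn as nn

def unroll(phi_fns, r0):
    """Apply r <- r + phi_k(r) once per function in phi_fns."""
    r = r0
    for phi_k in phi_fns:
        r = r + phi_k(r)  # additive (residual) update at every step
    return r

dim, steps = 8, 5
r0 = torch.randn(1, dim)

# Residual network: each step k has its own, unshared phi^k.
resnet_out = unroll([nn.Linear(dim, dim) for _ in range(steps)], r0)

# Recurrent view: the same phi is reused at every step (weights shared across steps).
shared_phi = nn.Linear(dim, dim)
rnn_out = unroll([shared_phi] * steps, r0)
```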