Architecture

This chapter collects some tricky designs used in various types of neural networks.

Skip Connection

It was popularized by ResNet as an identity shortcut that adds a block's input directly to its output.
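
A minimal sketch of the idea in NumPy (not ResNet's actual block, which uses convolutions and batch normalization): the input of a block is added element-wise to the output of the transformation it skips. The two-layer transform `f` below is purely illustrative.

```python
import numpy as np

def residual_block(x, f):
    """Skip connection: the block's input x is added to the transformed output f(x)."""
    return x + f(x)

# Hypothetical two-layer transform standing in for a ResNet block body.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 8)) * 0.1
W2 = rng.standard_normal((8, 8)) * 0.1

def f(x):
    return np.maximum(x @ W1, 0.0) @ W2  # ReLU between the two layers

x = rng.standard_normal(8)
y = residual_block(x, f)  # y = x + f(x)
```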

Connections between ResNet, DenseNet and Higher Order RNN

Residual networks essentially belong to the family of densely connected networks, except that the parameters of their connections are shared across steps.

Proof (adapted from the Dual Path Networks paper by Yunpeng Chen et al.):

We use $h^t$ to denote the hidden state of the recurrent neural network at the $t$-th step and use $k$ as the index of the current step. Let $x^t$ denote the input at the $t$-th step, with $h^0 = x^0$. For each step, $f_t^k(\cdot)$ refers to the feature extracting function, which takes the hidden state as input and outputs the extracted information. $g^k(\cdot)$ denotes a transformation function that transforms the gathered information into the current hidden state:

$$h^k = g^k\left[\sum_{t=0}^{k-1} f_t^k(h^t)\right]$$
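
The general form can be sketched numerically. The snippet below is illustrative only: fixed-width vectors, random linear maps standing in for the per-step feature extractors $f_t^k$, and a ReLU-style transform for $g^k$; none of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 8, 5                                   # hidden width, number of steps

# Per-step parameters: F[k][t] extracts features from h^t when computing h^k,
# and G[k] transforms the gathered information into the new hidden state.
F = [[rng.standard_normal((D, D)) * 0.1 for t in range(k)] for k in range(K + 1)]
G = [rng.standard_normal((D, D)) * 0.1 for k in range(K + 1)]

def f(k, t, h):
    return h @ F[k][t]

def g(k, r):
    return np.maximum(r @ G[k], 0.0)          # illustrative nonlinearity

h = [rng.standard_normal(D)]                  # h^0 = x^0
for k in range(1, K + 1):
    r = sum(f(k, t, h[t]) for t in range(k))  # gather from all previous states
    h.append(g(k, r))                         # h^k = g^k[ sum_t f_t^k(h^t) ]
```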

For HORNNs, weights are shared across steps, i.e. $\forall t,k,\ f_t^k(\cdot) \equiv f_t(\cdot)$ and $\forall k,\ g^k(\cdot) \equiv g(\cdot)$. For the densely connected networks, each step (micro-block) has its own parameters, which means $f_t^k(\cdot)$ and $g^k(\cdot)$ are not shared.

Now consider the intermediate case where only the feature extractors are shared across steps, i.e. $\forall t,k,\ f_t^k(\cdot) \equiv f_t(\cdot)$, and define:

$$r^k \triangleq \sum_{t=1}^{k-1} f_t(h^t) = r^{k-1} + f_{k-1}(h^{k-1}), \qquad h^k = g^k(r^k)$$

$$\Longrightarrow r^k = r^{k-1} + f_{k-1}(h^{k-1}) = r^{k-1} + f_{k-1}\left(g^{k-1}(r^{k-1})\right) = r^{k-1} + \phi^{k-1}(r^{k-1})$$

Specifically, when $\forall k,\ \phi^k(\cdot) = \phi(\cdot)$, the above equation degenerates to an RNN; when none of the $\phi^k(\cdot)$ are shared and $x^k = 0$ for $k > 1$, it produces a residual network.
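
This degenerate case can be checked numerically. The illustrative sketch below shares the feature extractors across steps ($f_t^k(\cdot) \equiv f_t(\cdot)$, random linear maps) and computes $h^K$ twice: once via the dense sum of the general form (summing from $t = 0$) and once via the residual recursion $r^k = r^{k-1} + \phi^{k-1}(r^{k-1})$; the two routes agree.

```python
import numpy as np

rng = np.random.default_rng(1)
D, K = 8, 5

# Shared feature extractors f_t (no dependence on k) and per-step transforms g^k.
F = [rng.standard_normal((D, D)) * 0.1 for t in range(K)]
G = [rng.standard_normal((D, D)) * 0.1 for k in range(K + 1)]

def f(t, h):
    return h @ F[t]

def g(k, r):
    return np.maximum(r @ G[k], 0.0)

h0 = rng.standard_normal(D)                    # h^0 = x^0

# 1) Densely connected form: h^k = g^k[ sum_{t=0}^{k-1} f_t(h^t) ]
h = [h0]
for k in range(1, K + 1):
    h.append(g(k, sum(f(t, h[t]) for t in range(k))))

# 2) Residual recursion: r^k = r^{k-1} + f_{k-1}(g^{k-1}(r^{k-1})), with r^1 = f_0(h^0)
r = f(0, h0)
for k in range(2, K + 1):
    r = r + f(k - 1, g(k - 1, r))              # r^{k-1} + phi^{k-1}(r^{k-1})

assert np.allclose(h[K], g(K, r))              # both routes give the same h^K
```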
