Deep learning tutorial: study notes

Date: 2022-12-14 22:43:59

Deep Learning Tutorial

cs231n lecture notes and assignments

Why is deep better?


Choosing Proper Loss

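The figure that sat here is missing. As a reminder of the point (my own sketch, not from the original notes): for classification, cross-entropy gives a much stronger gradient than square error when the network is confidently wrong, so it is usually the proper loss to choose. A minimal check with a single sigmoid output:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One sigmoid output, target = 1, but the neuron is confidently wrong (z very negative).
z, target = -8.0, 1.0
y = sigmoid(z)

grad_square_error = 2 * (y - target) * y * (1 - y)   # d/dz of (y - target)**2
grad_cross_entropy = y - target                      # d/dz of -[t*log(y) + (1-t)*log(1-y)]

print(grad_square_error)    # ~ -0.00067: almost no learning signal on the flat region
print(grad_cross_entropy)   # ~ -0.9997: a strong push in the right direction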

dropout

  • dropout is a kind of ensemble (see the sketch below)
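
A minimal sketch of inverted dropout (the cs231n formulation), not taken from the figures that were here: each unit is kept with probability p during training and the survivors are scaled by 1/p, so at test time the full network runs unchanged and behaves like an average over the thinned sub-networks, which is the ensemble view.

import numpy as np

def dropout_train(x, p=0.5):
    # inverted dropout: randomly zero units, scale survivors by 1/p
    mask = (np.random.rand(*x.shape) < p) / p
    return x * mask

def dropout_test(x):
    # no masking at test time; the 1/p scaling during training already compensates
    return x

h = np.random.randn(4, 8)
print(dropout_train(h, p=0.5))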


Why CNN for Images


Output size after convolution: the feature map size equals (input_size + 2 * padding_size - filter_size) / stride + 1
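
A quick sanity check of this formula (the AlexNet conv1 numbers below are just a standard example, not from the notes):

def conv_output_size(input_size, filter_size, stride, padding_size):
    # side length of the feature map for a square input and square filter
    return (input_size + 2 * padding_size - filter_size) // stride + 1

# e.g. a 227x227 input with 11x11 filters, stride 4, no padding gives a 55x55 feature map
print(conv_output_size(227, filter_size=11, stride=4, padding_size=0))   # 55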

cross-validation strategy

coarse -> fine cross-validation in stages

  • first stage: run only a few epochs to get a rough idea of which params work

  • second stage: longer running time, finer search; repeat as necessary

  • tip for detecting explosions in the solver: if the cost is ever > 3 * the original cost, break out early (see the sketch after this list)
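
A sketch of this coarse-to-fine random search; train_epoch(lr, reg) and eval_cost() are hypothetical hooks around whatever model is being tuned, not part of the original notes.

import math
import random

def log_uniform(low, high):
    # sample learning rate / regularization strength on a log scale
    return 10 ** random.uniform(math.log10(low), math.log10(high))

def run_trial(train_epoch, eval_cost, lr, reg, num_epochs):
    # train for a few epochs, breaking out early if the solver explodes
    original_cost = eval_cost()
    cost = original_cost
    for _ in range(num_epochs):
        train_epoch(lr, reg)
        cost = eval_cost()
        if cost > 3 * original_cost:      # cost exploded: abandon this setting
            return float("inf")
    return cost

def random_search(train_epoch, eval_cost, num_trials, num_epochs, lr_range, reg_range):
    results = []
    for _ in range(num_trials):
        lr, reg = log_uniform(*lr_range), log_uniform(*reg_range)
        results.append((run_trial(train_epoch, eval_cost, lr, reg, num_epochs), lr, reg))
    return sorted(results)   # best (lowest-cost) settings first

# coarse stage: wide ranges and few epochs; then narrow the ranges around the best
# results and rerun with more epochs (the fine stage), repeating as necessary.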

relu

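Not in the original notes, just a reminder of what ReLU computes and how its gradient behaves:

import numpy as np

def relu_forward(x):
    # keep positive activations, zero out the rest
    return np.maximum(0, x)

def relu_backward(dout, x):
    # the gradient passes through only where the input was positive
    return dout * (x > 0)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu_forward(x))                      # [0.  0.  0.  1.5 3. ]
print(relu_backward(np.ones_like(x), x))    # [0. 0. 0. 1. 1.]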

adagrad, rmsprop, adam


import numpy as np

# Adam (simplified: no bias correction yet)
m = beta1*m + (1-beta1)*dx      # update first moment
v = beta2*v + (1-beta2)*(dx**2) # update second moment
x += -learning_rate * m / (np.sqrt(v) + 1e-7)

# Adam (full version, with bias correction)
m, v = np.zeros_like(x), np.zeros_like(x)   # initialize moment caches to zeros
for t in range(1, big_number + 1):          # start at t=1 so the bias correction is well defined
    dx = compute_gradient(x)                # ... evaluate gradient (compute_gradient is a placeholder)
    m = beta1 * m + (1 - beta1) * dx        # update first moment
    v = beta2 * v + (1 - beta2) * (dx**2)   # update second moment
    mt = m / (1 - beta1**t)                 # bias-corrected first moment
    vt = v / (1 - beta2**t)                 # bias-corrected second moment
    x += -learning_rate * mt / (np.sqrt(vt) + 1e-7)

# Adagrad update: per-parameter learning rates, scaled by the accumulated squared gradients
cache += dx**2
x += -learning_rate * dx / (np.sqrt(cache) + 1e-7)

# RMSProp: like Adagrad, but the cache is a decaying (leaky) running average,
# so the effective learning rate does not shrink to zero
cache = decay_rate * cache + (1 - decay_rate) * (dx**2)
x += -learning_rate * dx / (np.sqrt(cache) + 1e-7)

 

batch normalization

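The figure is gone; a minimal sketch of the training-time batch-norm forward pass (gamma and beta are the learned scale and shift; at test time running averages of the mean and variance are used instead):

import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x: (N, D) batch of activations; normalize each feature over the batch
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance per feature
    return gamma * x_hat + beta             # learned scale and shift

x = np.random.randn(64, 10) * 5 + 3
out = batchnorm_forward(x, gamma=np.ones(10), beta=np.zeros(10))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))   # roughly 0s and 1s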

RNN

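The figures are gone; a minimal sketch of one vanilla RNN step and an unrolled loop over a sequence (the sizes and random weights are just for illustration):

import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, b):
    # new hidden state from the current input and the previous hidden state
    return np.tanh(x_t @ Wxh + h_prev @ Whh + b)

rng = np.random.default_rng(0)
D, H, T = 8, 16, 5                       # input dim, hidden dim, sequence length
Wxh = rng.normal(size=(D, H)) * 0.01
Whh = rng.normal(size=(H, H)) * 0.01
b = np.zeros(H)

h = np.zeros(H)
for t in range(T):                       # the same weights are reused at every time step
    x_t = rng.normal(size=D)
    h = rnn_step(x_t, h, Wxh, Whh, b)
print(h.shape)                           # (16,)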