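After the first CNN-based architecture (AlexNet) won the ImageNet 2012 competition, every subsequent winning architecture used more layers in a deep neural network to reduce the error rate. This works for a small number of layers, but as the number of layers grows, a common deep learning problem appears: vanishing/exploding gradients. The gradient either shrinks to 0 or becomes too large. As a result, when we keep adding layers, both the training and the test error rates increase.
In the figure above, we can observe that the 56-layer CNN has a higher error rate than the 20-layer CNN architecture on both the training and the test datasets. Further analysis of the error rate led to the conclusion that it is caused by vanishing/exploding gradients.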
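To make the vanishing/exploding behaviour concrete, here is a minimal sketch. It assumes a toy chain of identical scalar layers with a sigmoid activation; the function names `sigmoid_grad` and `gradient_scale` are hypothetical and are not part of the ResNet code in this post.

```python
import numpy as np

def sigmoid_grad(z):
    """Derivative of the logistic sigmoid at z."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def gradient_scale(num_layers, weight=1.0, z=0.0):
    """Magnitude of a gradient after backpropagating through num_layers
    identical scalar layers, each contributing a factor weight * sigmoid'(z)."""
    return (weight * sigmoid_grad(z)) ** num_layers

# sigmoid'(0) = 0.25, so each layer multiplies the gradient by 0.25 * weight.
for depth in (5, 20, 56):
    print(depth, gradient_scale(depth), gradient_scale(depth, weight=8.0))
```

With a per-layer factor below 1 the product shrinks geometrically with depth, and with a factor above 1 it grows geometrically. Real networks are not this uniform, so this only illustrates the trend.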
ResNet was proposed in 2015 by researchers at Microsoft Research and introduced a new architecture called the Residual Network.
Residual Networks (ResNet): Deep Learning
- 1. Residual network
- 2. Network architecture
- 3. Running the code
- 4. Results and summary
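1. Residual network
To address the vanishing/exploding gradient problem, this architecture introduces the concept of a residual block. In this network we use a technique called skip connections. A skip connection links the activation of one layer to a later layer by skipping some of the layers in between. This forms a residual block. ResNets are built by stacking these residual blocks together.
The idea behind this network is that, instead of having the layers learn the underlying mapping directly, we let the network fit a residual mapping. So rather than fitting the desired mapping H(x) itself, we let the network fit: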
F(x) := H(x) - x which gives H(x) := F(x) + x.
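The relation H(x) = F(x) + x can be sketched in a few lines of NumPy. This is a toy illustration, assuming the residual branch F is two linear layers with a ReLU in between; `residual_block`, `w1` and `w2` are hypothetical names, not part of the Keras code later in this post.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """Compute H(x) = F(x) + x, where F is two linear layers with a ReLU between them."""
    f = relu(x @ w1) @ w2   # residual branch F(x)
    return f + x            # the skip connection adds the input back

x = np.array([[1.0, -2.0, 3.0]])
zeros = np.zeros((3, 3))
out = residual_block(x, zeros, zeros)  # F(x) = 0, so the block reduces to the identity
print(out)
```

When the weights of F are zero, the block passes its input through unchanged, which is why an unhelpful block can be effectively skipped. Since H(x) = F(x) + x, the derivative of H with respect to x is the derivative of F plus the identity, so the gradient always has a direct path through the skip connection.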
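The advantage of adding this type of skip connection is that if any layer hurts the performance of the architecture, it will be skipped through regularization. This makes it possible to train very deep neural networks without running into the problems caused by vanishing/exploding gradients. The authors of the paper experimented with networks of 100 to 1000 layers on the CIFAR-10 dataset.
There is a similar approach called "highway networks", which also use skip connections. As in an LSTM, these skip connections use parametric gates, and the gates decide how much information passes through the skip connection. However, this architecture does not achieve better accuracy than the ResNet architecture.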
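For comparison with the ungated ResNet shortcut, a gated skip connection of the highway kind can be sketched as follows. This is a minimal NumPy illustration using the common highway formulation y = T(x) * H(x) + (1 - T(x)) * x; the function name, the tanh transform, and the weight names are assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, w_h, w_t, b_t):
    h = np.tanh(x @ w_h)           # candidate transform H(x)
    t = sigmoid(x @ w_t + b_t)     # transform gate T(x), values in (0, 1)
    return t * h + (1.0 - t) * x   # the carry gate is 1 - T(x)

x = np.array([[0.5, -1.0]])
w = np.eye(2)
no_gate_weights = np.zeros((2, 2))
mostly_carry = highway_layer(x, w, no_gate_weights, b_t=-20.0)      # gate near 0
mostly_transform = highway_layer(x, w, no_gate_weights, b_t=20.0)   # gate near 1
print(mostly_carry, mostly_transform)
```

A strongly negative gate bias makes the layer behave almost like an identity shortcut, while a strongly positive bias makes it behave like an ordinary layer. A ResNet shortcut has no such gate and always adds the input.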
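2. Network architecture
The network starts from a 34-layer plain network architecture inspired by VGG-19 and adds shortcut connections to it. These shortcut connections turn the architecture into a residual network.
3. Running the code
Using TensorFlow and the Keras API, we can design the ResNet architecture (including the residual blocks) from scratch. Below is an implementation of different ResNet architectures. For this implementation we use the CIFAR-10 dataset, which contains 60,000 32×32 color images in 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck). The dataset can be loaded through the keras.datasets API.
Step 1: First, we import the keras module and its APIs. These APIs help build the architecture of the ResNet model.
Code: importing libraries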
# Import Keras modules and its important APIs
import keras
from keras.layers import Dense, Conv2D, BatchNormalization, Activation
from keras.layers import AveragePooling2D, Input, Flatten
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras.callbacks import ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator
from keras.regularizers import l2
from keras import backend as K
from keras.models import Model
from keras.datasets import cifar10
import numpy as np
import os
Step 2: Now we set the hyperparameters required for the ResNet architecture. We also do some preprocessing on the dataset to prepare it for training.
Code: setting the training hyperparameters
# Setting Training Hyperparameters
batch_size = 32 # original ResNet paper uses batch_size = 128 for training
epochs = 200
data_augmentation = True
num_classes = 10
# Data Preprocessing
subtract_pixel_mean = True
n = 3
# Select ResNet Version
version = 1
# Compute the network depth from n
if version == 1:
    depth = n * 6 + 2
elif version == 2:
    depth = n * 9 + 2
# Model name, depth and version
model_type = 'ResNet%dv%d' % (depth, version)
# Load the CIFAR-10 data.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# Input image dimensions.
input_shape = x_train.shape[1:]
# Normalize data.
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
# If subtract pixel mean is enabled
if subtract_pixel_mean:
    x_train_mean = np.mean(x_train, axis=0)
    x_train -= x_train_mean
    x_test -= x_train_mean
# Print Training and Test Samples
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
print('y_train shape:', y_train.shape)
# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
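The preprocessing above (scaling to [0, 1], subtracting the per-pixel mean, and one-hot encoding the labels) can be checked on a small stand-in array without downloading CIFAR-10. The random images, the label values, and the helper `one_hot` are assumptions made for this sketch; `one_hot` only mimics what `to_categorical` is used for here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for CIFAR-10: 8 random 32x32 RGB images with integer labels.
images = rng.integers(0, 256, size=(8, 32, 32, 3)).astype('float32') / 255
labels = np.array([[3], [0], [9], [1], [3], [7], [2], [5]])

# Per-pixel mean over the sample axis, as in np.mean(x_train, axis=0).
pixel_mean = images.mean(axis=0)
centered = images - pixel_mean

def one_hot(y, num_classes):
    """Integer labels of shape (N, 1) or (N,) -> one-hot matrix of shape (N, num_classes)."""
    return np.eye(num_classes, dtype='float32')[np.asarray(y).reshape(-1)]

targets = one_hot(labels, 10)
print(pixel_mean.shape, centered.mean(), targets.shape)
```

Note that the mean has the shape of a single image, so each pixel position and channel is centered separately, and the training mean is what gets subtracted from the test set as well.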
Step 3: In this step we set the learning rate according to the epoch number. As the number of epochs grows, the learning rate is lowered so that training can keep improving.
Code: setting the LR for different epoch ranges
# Setting LR for different number of Epochs
def lr_schedule(epoch):
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    print('Learning rate: ', lr)
    return lr
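To see which learning rate the schedule yields at each boundary, the thresholds can be copied into a small pure-Python helper. The name `lr_at` and the `base_lr` parameter are hypothetical; the thresholds and factors are the ones used in `lr_schedule`.

```python
def lr_at(epoch, base_lr=1e-3):
    """Same thresholds as lr_schedule, without the print call."""
    if epoch > 180:
        return base_lr * 0.5e-3
    if epoch > 160:
        return base_lr * 1e-3
    if epoch > 120:
        return base_lr * 1e-2
    if epoch > 80:
        return base_lr * 1e-1
    return base_lr

# The comparisons are strict, so epoch 80 still uses the base rate.
for epoch in (0, 80, 81, 121, 161, 181):
    print(epoch, lr_at(epoch))
```

Each factor multiplies the base rate rather than the previous rate, so the steps are 1e-3, 1e-4, 1e-5, 1e-6 and finally 5e-7.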
Step 4: Define the basic ResNet building block, which can be used to define both the ResNet V1 and V2 architectures.
Code: basic ResNet building block
# Basic ResNet Building Block
def resnet_layer(inputs,
                 num_filters=16,
                 kernel_size=3,
                 strides=1,
                 activation='relu',
                 batch_normalization=True,
                 conv_first=True):
    # 2D convolution, followed or preceded by batch normalization and activation
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=l2(1e-4))
    x = inputs
    if conv_first:
        # conv -> BN -> activation
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else:
        # BN -> activation -> conv
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x)
    return x
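The only thing `conv_first` changes is the order in which the three operations are applied. A small pure-Python sketch of that ordering follows; `layer_order` is a hypothetical helper that returns operation names instead of building Keras layers.

```python
def layer_order(conv_first=True, batch_normalization=True, activation='relu'):
    """Return the sequence of operations resnet_layer would apply."""
    post = []
    if batch_normalization:
        post.append('BN')
    if activation is not None:
        post.append(activation)
    return ['conv'] + post if conv_first else post + ['conv']

print(layer_order())                  # V1 style: convolution first
print(layer_order(conv_first=False))  # V2 style: BN and activation first
```

The V1 architecture uses the convolution-first order, while the V2 bottleneck units call the block with `conv_first=False`.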
Step 5: Define the ResNet V1 architecture on top of the ResNet building block defined above:
Code: ResNet V1 architecture
def resnet_v1(input_shape, depth, num_classes=10):
    if (depth - 2) % 6 != 0:
        raise ValueError('depth should be 6n + 2 (eg 20, 32, 44 in [a])')
    # Start model definition.
    num_filters = 16
    num_res_blocks = int((depth - 2) / 6)
    inputs = Input(shape=input_shape)
    x = resnet_layer(inputs=inputs)
    # Instantiate the stack of residual units
    for stack in range(3):
        for res_block in range(num_res_blocks):
            strides = 1
            if stack > 0 and res_block == 0:  # first layer but not first stack
                strides = 2  # downsample
            y = resnet_layer(inputs=x,
                             num_filters=num_filters,
                             strides=strides)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters,
                             activation=None)
            if stack > 0 and res_block == 0:  # first layer but not first stack
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = keras.layers.add([x, y])
            x = Activation('relu')(x)
        num_filters *= 2
    # Add classifier on top.
    # v1 does not use BN after last shortcut connection-ReLU
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='he_normal')(y)
    # Instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model
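The depth rule 6n + 2 and the feature-map sizes in `resnet_v1` can be traced without building the model. The sketch below assumes 32×32 inputs as in CIFAR-10; `resnet_v1_trace` is a hypothetical helper, and it leaves the 1×1 projection convolutions on the shortcuts out of the count, as the 6n + 2 formula does.

```python
def resnet_v1_trace(n, input_size=32, base_filters=16):
    """Return (depth, [(stack, feature_map_size, filters), ...]) for ResNet v1."""
    size, filters = input_size, base_filters
    weighted_layers = 1                 # the initial 3x3 convolution
    stages = []
    for stack in range(3):
        if stack > 0:
            size //= 2                  # first block of the stack uses strides=2
            filters *= 2                # num_filters doubles after every stack
        weighted_layers += 2 * n        # n blocks with two 3x3 convolutions each
        stages.append((stack, size, filters))
    weighted_layers += 1                # the final Dense classifier
    return weighted_layers, stages

depth, stages = resnet_v1_trace(n=3)
print(depth, stages)
```

With n = 3 the last stack works on 8×8 feature maps with 64 filters, which is why `AveragePooling2D(pool_size=8)` reduces each map to a single value before the classifier.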
Step 6: Define the ResNet V2 architecture on top of the ResNet building block defined above:
Code: ResNet V2 architecture
# ResNet V2 architecture
def resnet_v2(input_shape, depth, num_classes=10):
    if (depth - 2) % 9 != 0:
        raise ValueError('depth should be 9n + 2 (eg 56 or 110 in [b])')
    # Start model definition.
    num_filters_in = 16
    num_res_blocks = int((depth - 2) / 9)
    inputs = Input(shape=input_shape)
    # v2 performs Conv2D with BN-ReLU on input before splitting into 2 paths
    x = resnet_layer(inputs=inputs,
                     num_filters=num_filters_in,
                     conv_first=True)
    # Instantiate the stack of residual units
    for stage in range(3):
        for res_block in range(num_res_blocks):
            activation = 'relu'
            batch_normalization = True
            strides = 1
            if stage == 0:
                num_filters_out = num_filters_in * 4
                if res_block == 0:  # first layer and first stage
                    activation = None
                    batch_normalization = False
            else:
                num_filters_out = num_filters_in * 2
                if res_block == 0:  # first layer but not first stage
                    strides = 2  # downsample
            # bottleneck residual unit
            y = resnet_layer(inputs=x,
                             num_filters=num_filters_in,
                             kernel_size=1,
                             strides=strides,
                             activation=activation,
                             batch_normalization=batch_normalization,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_in,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_out,
                             kernel_size=1,
                             conv_first=False)
            if res_block == 0:
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters_out,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = keras.layers.add([x, y])
        num_filters_in = num_filters_out
    # Add classifier on top.
    # v2 has BN-ReLU before Pooling
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='he_normal')(y)
    # Instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model
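The V2 bottleneck widths and the 9n + 2 depth rule can be traced the same way. `resnet_v2_trace` is a hypothetical helper that only follows how `num_filters_in` and `num_filters_out` evolve in `resnet_v2`; shortcut projections are again left out of the count.

```python
def resnet_v2_trace(n, base_filters=16):
    """Return (depth, [(stage, bottleneck_filters, output_filters), ...]) for ResNet v2."""
    filters_in = base_filters
    weighted_layers = 1                 # the initial convolution on the input
    stages = []
    for stage in range(3):
        # The first stage expands by 4, later stages by 2.
        filters_out = filters_in * 4 if stage == 0 else filters_in * 2
        weighted_layers += 3 * n        # n bottleneck units: 1x1, 3x3, 1x1
        stages.append((stage, filters_in, filters_out))
        filters_in = filters_out        # updated once per stage
    weighted_layers += 1                # the final Dense classifier
    return weighted_layers, stages

print(resnet_v2_trace(3))
```

Each unit narrows to the bottleneck width for the 1×1 and 3×3 convolutions and then expands to the output width with the last 1×1 convolution.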
Step 7: The code below trains and tests the ResNet v1 and v2 architectures defined above:
Code: main function
# Main function
if version == 2:
    model = resnet_v2(input_shape=input_shape, depth=depth)
else:
    model = resnet_v1(input_shape=input_shape, depth=depth)
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(learning_rate=lr_schedule(0)),
              metrics=['accuracy'])
model.summary()
print(model_type)
# Prepare model saving directory.
save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'cifar10_%s_model.{epoch:03d}.h5' % model_type
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
filepath = os.path.join(save_dir, model_name)
# Prepare callbacks for model saving and for learning rate adjustment.
# The metric is registered as 'accuracy', so the validation key is 'val_accuracy'.
checkpoint = ModelCheckpoint(filepath=filepath,
                             monitor='val_accuracy',
                             verbose=1,
                             save_best_only=True)
lr_scheduler = LearningRateScheduler(lr_schedule)
lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.1),
                               cooldown=0,
                               patience=5,
                               min_lr=0.5e-6)
callbacks = [checkpoint, lr_reducer, lr_scheduler]
# Run training, with or without data augmentation.
if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              validation_data=(x_test, y_test),
              shuffle=True,
              callbacks=callbacks)
else:
    print('Using real-time data augmentation.')
    # This will do preprocessing and realtime data augmentation:
    datagen = ImageDataGenerator(
        # set input mean to 0 over the dataset
        featurewise_center=False,
        # set each sample mean to 0
        samplewise_center=False,
        # divide inputs by std of dataset
        featurewise_std_normalization=False,
        # divide each input by its std
        samplewise_std_normalization=False,
        # apply ZCA whitening
        zca_whitening=False,
        # epsilon for ZCA whitening
        zca_epsilon=1e-06,
        # randomly rotate images in the range (deg 0 to 180)
        rotation_range=0,
        # randomly shift images horizontally
        width_shift_range=0.1,
        # randomly shift images vertically
        height_shift_range=0.1,
        # set range for random shear
        shear_range=0.,
        # set range for random zoom
        zoom_range=0.,
        # set range for random channel shifts
        channel_shift_range=0.,
        # set mode for filling points outside the input boundaries
        fill_mode='nearest',
        # value used for fill_mode = "constant"
        cval=0.,
        # randomly flip images horizontally
        horizontal_flip=True,
        # randomly flip images vertically
        vertical_flip=False,
        # set rescaling factor (applied before any other transformation)
        rescale=None,
        # set function that will be applied on each input
        preprocessing_function=None,
        # image data format, either "channels_first" or "channels_last"
        data_format=None,
        # fraction of images reserved for validation (strictly between 0 and 1)
        validation_split=0.0)
    # Compute quantities required for featurewise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(x_train)
    # Fit the model on the batches generated by datagen.flow().
    # model.fit accepts generators directly; fit_generator is deprecated.
    model.fit(datagen.flow(x_train, y_train, batch_size=batch_size),
              validation_data=(x_test, y_test),
              epochs=epochs, verbose=1,
              callbacks=callbacks)
# Score trained model.
scores = model.evaluate(x_test, y_test, verbose = 1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])
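The two augmentations that are actually enabled above are a random shift of up to 10% in each direction and a random horizontal flip. A simplified NumPy sketch of that idea follows. It is not how `ImageDataGenerator` is implemented: it assumes whole-pixel shifts and fills exposed pixels with the nearest edge value, and the function name `random_shift_flip` is hypothetical.

```python
import numpy as np

def random_shift_flip(image, rng, max_shift_frac=0.1):
    """Randomly shift an HxWxC image by up to max_shift_frac of its size,
    filling exposed pixels with the nearest edge value, then flip it
    left-right with probability 0.5."""
    h, w, _ = image.shape
    max_dy, max_dx = int(h * max_shift_frac), int(w * max_shift_frac)
    dy = rng.integers(-max_dy, max_dy + 1)
    dx = rng.integers(-max_dx, max_dx + 1)
    # Pad with edge values, then crop a window offset by the chosen shift.
    padded = np.pad(image, ((max_dy, max_dy), (max_dx, max_dx), (0, 0)), mode='edge')
    top, left = max_dy - dy, max_dx - dx
    shifted = padded[top:top + h, left:left + w]
    if rng.random() < 0.5:
        shifted = shifted[:, ::-1]   # horizontal flip
    return shifted

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
augmented = random_shift_flip(image, rng)
print(augmented.shape)
```

The output keeps the input shape, so augmented images can be fed to the network in place of the originals.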
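4. Results and summary
On the ImageNet dataset, the authors used a 152-layer ResNet, which is 8 times deeper than VGG19 yet still has fewer parameters. An ensemble of these ResNets achieved an error rate of only 3.57% on the ImageNet test set, the result that won the ILSVRC 2015 competition. On the COCO object detection dataset, it also produced a 28% relative improvement thanks to its very deep representations.
- The results above show that shortcut connections can resolve the problem caused by adding layers: when the depth is increased from 18 to 34 layers, the error rate on the ImageNet validation set goes down, unlike with the plain network.
- Below are the results on the ImageNet test set. ResNet's top-5 error rate of 3.57% is the lowest, so the ResNet architecture took first place in the 2015 ImageNet classification challenge.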
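The top-5 error rate quoted above counts a prediction as correct when the true class is among the five highest-scoring classes. A minimal NumPy sketch of the metric follows; `top_k_error` and the small score matrix are made up for illustration and are not ImageNet results.

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k highest scores."""
    top_k = np.argsort(scores, axis=1)[:, -k:]        # indices of the k largest scores
    hit = (top_k == labels[:, None]).any(axis=1)
    return 1.0 - hit.mean()

# Three samples, four classes, no tied scores within a row.
scores = np.array([[0.10, 0.50, 0.15, 0.25],
                   [0.60, 0.05, 0.15, 0.20],
                   [0.20, 0.30, 0.40, 0.10]])
labels = np.array([1, 3, 0])
for k in (1, 2, 3):
    print(k, top_k_error(scores, labels, k=k))
```

The error can only stay the same or drop as k grows, which is why top-5 error is always at most the top-1 error.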