LSTM excels at sequential data, but when each timestep is an image or other spatially structured input, every point carries rich spatial information and is strongly correlated with its neighbors, which an ordinary LSTM struggles to capture. ConvLSTM therefore adds convolution operations on top of the LSTM to capture these spatial features, making it far more effective for image sequences.
Figure 1: Classic LSTM
Figure 2: ConvLSTM
The difference between the two is that the classic LSTM uses fully connected state-to-state transitions, whereas ConvLSTM uses convolutions.
Figure 3: Vanilla LSTM vs. ConvLSTM
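Concretely, ConvLSTM replaces the matrix multiplications in the LSTM gate equations with convolutions. In the formulation of Shi et al. (where $*$ denotes convolution and $\circ$ the Hadamard product, and inputs $\mathcal{X}_t$, states $\mathcal{H}_t, \mathcal{C}_t$ are 3D tensors):

```latex
\begin{aligned}
i_t &= \sigma\!\left(W_{xi} * \mathcal{X}_t + W_{hi} * \mathcal{H}_{t-1} + W_{ci} \circ \mathcal{C}_{t-1} + b_i\right) \\
f_t &= \sigma\!\left(W_{xf} * \mathcal{X}_t + W_{hf} * \mathcal{H}_{t-1} + W_{cf} \circ \mathcal{C}_{t-1} + b_f\right) \\
\mathcal{C}_t &= f_t \circ \mathcal{C}_{t-1} + i_t \circ \tanh\!\left(W_{xc} * \mathcal{X}_t + W_{hc} * \mathcal{H}_{t-1} + b_c\right) \\
o_t &= \sigma\!\left(W_{xo} * \mathcal{X}_t + W_{ho} * \mathcal{H}_{t-1} + W_{co} \circ \mathcal{C}_t + b_o\right) \\
\mathcal{H}_t &= o_t \circ \tanh\!\left(\mathcal{C}_t\right)
\end{aligned}
```

The gate structure is identical to a standard LSTM; only the state-to-state and input-to-state operations change from dense products to convolutions, so each hidden cell only looks at a local neighborhood.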
Below we use Keras to build a ConvLSTM that predicts how artificially generated movies evolve. If you do not have a GPU environment, it is recommended to run the code on Google's free deep-learning platform Colab; a local Jupyter notebook is also shared at the end of this post.
The following code loads the required libraries, then builds and compiles the ConvLSTM model, and generates the artificial data.
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import ConvLSTM2D, BatchNormalization, Conv3D

# Build and compile the ConvLSTM model: four ConvLSTM2D layers
# followed by a Conv3D layer that maps back to one channel per frame.
seq = Sequential()
seq.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
                   input_shape=(None, 40, 40, 1),
                   padding='same', return_sequences=True))
seq.add(BatchNormalization())
seq.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
                   padding='same', return_sequences=True))
seq.add(BatchNormalization())
seq.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
                   padding='same', return_sequences=True))
seq.add(BatchNormalization())
seq.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
                   padding='same', return_sequences=True))
seq.add(BatchNormalization())
seq.add(Conv3D(filters=1, kernel_size=(3, 3, 3),
               activation='sigmoid',
               padding='same', data_format='channels_last'))
seq.compile(loss='binary_crossentropy', optimizer='adadelta')


# Artificial data generation:
# Generate movies with 3 to 7 moving squares inside.
# The squares are of shape 1x1 or 2x2 pixels,
# which move linearly over time.
# For convenience we first create movies with bigger width and height (80x80)
# and at the end we select a 40x40 window.
def generate_movies(n_samples=1200, n_frames=15):
    row = 80
    col = 80
    noisy_movies = np.zeros((n_samples, n_frames, row, col, 1),
                            dtype=np.float32)
    shifted_movies = np.zeros((n_samples, n_frames, row, col, 1),
                              dtype=np.float32)
    for i in range(n_samples):
        # Add 3 to 7 moving squares
        n = np.random.randint(3, 8)
        for j in range(n):
            # Initial position
            xstart = np.random.randint(20, 60)
            ystart = np.random.randint(20, 60)
            # Direction of motion
            directionx = np.random.randint(0, 3) - 1
            directiony = np.random.randint(0, 3) - 1
            # Size of the square
            w = np.random.randint(2, 4)
            for t in range(n_frames):
                x_shift = xstart + directionx * t
                y_shift = ystart + directiony * t
                noisy_movies[i, t, x_shift - w: x_shift + w,
                             y_shift - w: y_shift + w, 0] += 1
                # Make it more robust by adding noise.
                # The idea is that if during inference,
                # the value of the pixel is not exactly one,
                # we need to train the network to be robust and still
                # consider it as a pixel belonging to a square.
                if np.random.randint(0, 2):
                    noise_f = (-1) ** np.random.randint(0, 2)
                    noisy_movies[i, t,
                                 x_shift - w - 1: x_shift + w + 1,
                                 y_shift - w - 1: y_shift + w + 1,
                                 0] += noise_f * 0.1
                # Shift the ground truth by 1
                x_shift = xstart + directionx * (t + 1)
                y_shift = ystart + directiony * (t + 1)
                shifted_movies[i, t, x_shift - w: x_shift + w,
                               y_shift - w: y_shift + w, 0] += 1
    # Cut to a 40x40 window
    noisy_movies = noisy_movies[::, ::, 20:60, 20:60, ::]
    shifted_movies = shifted_movies[::, ::, 20:60, 20:60, ::]
    noisy_movies[noisy_movies >= 1] = 1
    shifted_movies[shifted_movies >= 1] = 1
    return noisy_movies, shifted_movies
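The final crop-and-clip step of `generate_movies` is easy to miss: squares can overlap and noise can push pixel values above 1, so after cutting the central 40x40 window everything is clipped back to 1. A minimal standalone NumPy sketch of just that step (the array names here are illustrative, not part of the example):

```python
import numpy as np

# A tiny 80x80 "movie" frame with two overlapping squares,
# mirroring what the generator produces before cropping.
movies = np.zeros((1, 1, 80, 80, 1), dtype=np.float32)
movies[0, 0, 25:30, 25:30, 0] += 1  # first square
movies[0, 0, 27:32, 27:32, 0] += 1  # overlap region now holds 2.0

# Cut to the central 40x40 window, as generate_movies does
cropped = movies[:, :, 20:60, 20:60, :]

# Clip overlaps so every "on" pixel is exactly 1
cropped[cropped >= 1] = 1
```

After this, `cropped` has shape `(1, 1, 40, 40, 1)` and a maximum value of 1, which is what the binary-crossentropy loss expects.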
# Train the network
noisy_movies, shifted_movies = generate_movies(n_samples=1200)
seq.fit(noisy_movies[:1000], shifted_movies[:1000], batch_size=10,
        epochs=300, validation_split=0.05)

# Testing the network on one movie:
# feed it with the first 7 positions and then
# predict the new positions
which = 1004
track = noisy_movies[which][:7, ::, ::, ::]
for j in range(16):
    new_pos = seq.predict(track[np.newaxis, ::, ::, ::, ::])
    new = new_pos[::, -1, ::, ::, ::]
    track = np.concatenate((track, new), axis=0)
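The prediction loop above is autoregressive: at each step the model sees the whole track so far, only the last predicted frame is kept, and that frame is appended to the input for the next step. A minimal sketch of the same pattern with a stand-in predictor (`predict_next` is a made-up placeholder, not the Keras model):

```python
import numpy as np

def predict_next(frames):
    # Stand-in for seq.predict: appends one new frame (the mean of
    # the input frames) to the sequence. Placeholder logic only.
    return np.concatenate(
        (frames, frames.mean(axis=1, keepdims=True)), axis=1)

# Start from 7 "seed" frames of shape 40x40x1
track = np.zeros((7, 40, 40, 1), dtype=np.float32)
for j in range(16):
    out = predict_next(track[np.newaxis])  # add the batch axis
    new = out[:, -1]                       # keep only the last frame
    track = np.concatenate((track, new), axis=0)
# track now holds 7 seed frames plus 16 generated ones: (23, 40, 40, 1)
```

The shape bookkeeping matches the example: the batch axis is added before prediction, `[:, -1]` keeps only the newest frame, and concatenation along axis 0 grows the track one frame per iteration.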
# And then compare the predictions to the ground truth
track2 = noisy_movies[which][::, ::, ::, ::]
for i in range(15):
    fig = plt.figure(figsize=(10, 5))
    ax = fig.add_subplot(121)
    if i >= 7:
        ax.text(1, 3, 'Predictions !', fontsize=20, color='w')
    else:
        ax.text(1, 3, 'Initial trajectory', fontsize=20)
    toplot = track[i, ::, ::, 0]
    plt.imshow(toplot)
    ax = fig.add_subplot(122)
    plt.text(1, 3, 'Ground truth', fontsize=20)
    toplot = track2[i, ::, ::, 0]
    if i >= 2:
        toplot = shifted_movies[which][i - 1, ::, ::, 0]
    plt.imshow(toplot)
    plt.savefig('%i_animate.png' % (i + 1))
Local notebook: https://pan.baidu.com/s/1V4_eTYV7vi2UNg_XdiP5mg (extraction code: 8v9c)
Run the code online on Google Colab: https://colab.research.google.com/drive/1XmlpMzQK1REHbjVy61WPSh-oli08JZeZ (remember to enable GPU acceleration under Edit - Notebook settings)
Also sharing an OpenCV + TensorFlow video course, useful for image processing and prediction: https://pan.baidu.com/s/1bolCSMyVYanxPMf9TpQWBg (extraction code: m8jl)
E-book "OpenCV For ML": https://pan.baidu.com/s/1ErAHu1PNibTBrzJTtsjovg (extraction code: estj)
For corrections or suggestions, please contact Tiezhu directly at deepwind@aliyun.com.