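I'm learning how to use TensorFlow/TensorBoard, and I'm working with the MNIST dataset from Kaggle. For some reason the weights are not updating (TensorBoard output below).
I've stripped the model down to something very simple (one convolutional layer and one fully connected layer), and I've verified that the images are correct (see below), but I have no idea what the problem is.
My question is: what is wrong with the code below?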
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
# Jupyter magic; remove this line when running as a plain script
%matplotlib inline

train_df = pd.read_csv('./train.csv')

def get_images(df):
    """Split a Kaggle MNIST frame into one-hot labels and 28x28x1 images."""
    df = df.copy()
    labels = df['label'].values
    enc = OneHotEncoder(sparse=False)
    enc.fit(np.array(range(10)).reshape(-1, 1))
    labels = enc.transform(labels.reshape(-1, 1))
    del df['label']
    images = df.values.reshape(len(df), 28, 28, 1)
    return labels, images

def view_images(images, labels):
    """Plot images on a square grid, titled with their decoded labels."""
    num_cols = int(np.ceil(np.sqrt(len(images))))
    fig, axs = plt.subplots(nrows=num_cols, ncols=num_cols)
    axs = axs.flatten()
    # zip stops at the shorter sequence, so a non-square count leaves spare axes empty
    for ax, image, label in zip(axs, images, labels):
        ax.imshow(image.reshape(28, 28))
        ax.axes.set_title(np.argmax(label))
    plt.show()
import tensorflow as tf  # TensorFlow 1.x API
tf.reset_default_graph()

def conv_layer(X, input_channels, output_channels, name):
    # 3x3 convolution, stride 1, 'SAME' padding, followed by ReLU
    with tf.name_scope(name):
        W = tf.Variable(tf.random_normal(
            [3, 3, input_channels, output_channels], stddev=.01), name='W')
        tf.summary.histogram('weights', W)
        l1 = tf.nn.conv2d(X, filter=W, strides=[1, 1, 1, 1], padding='SAME')
        l1 = tf.nn.relu(l1)
        return l1

def fc_layer(X, input_channels, output_channels, name):
    # Fully connected layer without a bias term; returns raw logits
    with tf.name_scope(name):
        W = tf.Variable(tf.random_normal([input_channels, output_channels]))
        l1 = tf.matmul(X, W)
        return l1
def train_model():
    labels, images = get_images(train_df)
    trX, teX, trY, teY = train_test_split(images, labels, test_size=.2, random_state=42)

    X = tf.placeholder('float', [None, 28, 28, 1])  # 28x28x1
    Y = tf.placeholder('float', [None, 10])

    h = conv_layer(X, 1, 32, 'layer1')
    h = tf.reshape(h, [-1, 28*28*32])  # 'SAME' padding keeps 28x28; 32 channels
    Yp = fc_layer(h, 28*28*32, 10, 'fc1')

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=Yp, labels=Y))
    train_op = tf.train.GradientDescentOptimizer(.01).minimize(cost)

    with tf.Session() as sess:
        init = tf.global_variables_initializer()
        sess.run(init)
        merged = tf.summary.merge_all()
        fw = tf.summary.FileWriter('./hope', sess.graph, flush_secs=5)
        for i in range(50):
            # Mini-batches of 100 training examples
            for start, end in zip(range(0, len(trX), 100), range(100, len(trX)+1, 100)):
                sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end]})
            summary = sess.run(merged, feed_dict={X: teX, Y: teY})
            fw.add_summary(summary, i)
            print('Cost {}, {}'.format(i, sess.run(cost, feed_dict={X: teX, Y: teY})))
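The inner loop pairs two ranges with zip, which yields (start, end) bounds for full batches of 100 only; any trailing len(trX) % 100 rows are never fed to train_op. A small sketch of that pairing in plain Python (the helper name `batch_bounds` is hypothetical, introduced only for illustration):

```python
def batch_bounds(n, batch_size=100):
    """Return the (start, end) pairs produced by the loop in train_model.

    Mirrors zip(range(0, n, batch_size), range(batch_size, n + 1, batch_size)):
    only full batches are generated, so the last n % batch_size rows are skipped.
    """
    return list(zip(range(0, n, batch_size), range(batch_size, n + 1, batch_size)))


print(batch_bounds(250))  # [(0, 100), (100, 200)]; rows 200..249 are never used
```

Assuming the standard 42,000-row Kaggle train.csv, test_size=.2 leaves 33,600 training rows, which divides evenly by 100, so no rows are actually dropped in this case.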
To make sure the data I'm feeding in is correct:
labels, images = get_images(train_df)
trX, teX, trY, teY = train_test_split(images, labels, test_size=.2, random_state=42)
view_images(teX[0:16],teY[0:16])
view_images(trX[0:16],trY[0:16])
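The same preprocessing can also be checked without plotting. The sketch below uses a tiny synthetic frame in place of train.csv (the frame, its values, and the name `get_images_np` are made up for illustration) and swaps in `np.eye(10)[labels]` as a dependency-free stand-in for the OneHotEncoder step; it checks the shapes and that argmax recovers the original labels.

```python
import numpy as np
import pandas as pd


def get_images_np(df):
    """NumPy-only variant of get_images: one-hot labels plus 28x28x1 images."""
    labels = df['label'].to_numpy()
    one_hot = np.eye(10)[labels]  # row k of the 10x10 identity is the one-hot vector for k
    pixels = df.drop(columns='label').to_numpy()
    images = pixels.reshape(len(df), 28, 28, 1)
    return one_hot, images


# Hypothetical stand-in for train.csv: 3 rows, a label column plus 784 pixel columns.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.integers(0, 256, size=(3, 784)),
                    columns=['pixel{}'.format(i) for i in range(784)])
demo.insert(0, 'label', [7, 0, 3])

labels, images = get_images_np(demo)
print(labels.shape, images.shape)  # (3, 10) (3, 28, 28, 1)
print(labels.argmax(axis=1))       # [7 0 3]
```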
Looking at TensorBoard, the weights are not updating at all.
In addition, the cost does not change from one iteration to the next:
>train_model()
Cost 1, 2.30258
Cost 2, 2.30258
Cost 3, 2.30258
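The value itself is informative: 2.30258 is ln(10), the softmax cross-entropy obtained when the logits are identical for all 10 classes, so each class is predicted with probability 1/10. A constant cost at exactly this value most likely means the network's output does not depend on the input. A quick NumPy check, using zero logits as an assumed example of "identical logits" (`softmax_cross_entropy` is a helper written here for illustration, not a TensorFlow function):

```python
import math

import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed stably via log-sum-exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(labels * log_probs).sum(axis=1).mean())


labels = np.eye(10)[[3, 7]]   # two one-hot targets
logits = np.zeros((2, 10))    # identical logits for every class
print(softmax_cross_entropy(logits, labels), math.log(10))  # both approx. 2.302585
```

Any constant offset added to all logits gives the same result, because softmax is invariant to shifting every logit by the same amount.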
Posted on 2018-07-20 17:18:49
Your weights may well be updating. The problem is that you keep initializing new weights with the same values:
W = tf.Variable(tf.random_normal(
    [3, 3, input_channels, output_channels], stddev=.01), name='W')
You should use the get_variable function instead, like this:
W = tf.get_variable("W", initializer=tf.random_normal([3, 3, input_channels, output_channels], stddev=0.01))
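get_variable does two things:
1) If "W" does not exist, it creates it (which is why you have to pass an initializer, and with it the variable's shape).
2) If "W" already exists, it returns the existing variable, provided the enclosing tf.variable_scope allows reuse (reuse=True or reuse=tf.AUTO_REUSE); otherwise it raises a ValueError.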
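To make the two cases concrete, here is a toy, pure-Python model of the get-or-create lookup that get_variable performs. `VariableStore` and `AUTO_REUSE` are hypothetical names for illustration, not TensorFlow's implementation; in TF 1.x the corresponding switch is the `reuse` argument of `tf.variable_scope`, where reuse=False only creates, reuse=True only fetches, and tf.AUTO_REUSE does either.

```python
AUTO_REUSE = 'auto'  # hypothetical stand-in for tf.AUTO_REUSE


class VariableStore:
    """Toy get-or-create registry mimicking the reuse semantics of tf.get_variable."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, name, initializer=None, reuse=False):
        exists = name in self._vars
        if exists and reuse is False:
            # TF 1.x raises ValueError when reuse is not enabled
            raise ValueError('Variable {} already exists'.format(name))
        if not exists and reuse is True:
            # TF 1.x also raises ValueError when reuse=True but nothing was created yet
            raise ValueError('Variable {} does not exist'.format(name))
        if not exists:
            if initializer is None:
                raise ValueError('An initializer is required to create {}'.format(name))
            self._vars[name] = initializer()  # case 1: create it once
        return self._vars[name]               # case 2: return the stored object


store = VariableStore()
w1 = store.get_variable('layer1/W', initializer=lambda: [0.01] * 3)
w2 = store.get_variable('layer1/W', reuse=True)
print(w1 is w2)  # True: the second call returns the same object
```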
https://stackoverflow.com/questions/-100005610