Getting Started with TensorFlow (3): Fitting an N-Variable Linear Equation with a Neural Network

谭正中

Last modified 2017-07-04 09:54:35

Included in the column: 谭正中的专栏

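Background

The previous article, "Getting Started with TensorFlow: Solving an N-Variable Linear Equation", obtained the value of each parameter when the form of the expression was already known. In practice, however, most problems cannot be expressed by a simple formula such as an N-variable linear equation, and neural networks offer a good way to handle such problems. This article continues with a simple example: using TensorFlow to fit an N-variable linear equation with a neural network. For a brief introduction to neural networks, see this article.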

Implementation

Before using TensorFlow, we first import the required packages:

#!/usr/bin/python
#coding=utf-8
import tensorflow as tf
import numpy as np

tf.logging.set_verbosity(tf.logging.ERROR)              #set the log level to ERROR to reduce noise
np.set_printoptions(threshold=np.inf)                   #print arrays in full, without truncation

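First, a recap of the setup: we have a function with 5 inputs and one output. param_count denotes the number of inputs, which is currently 5:

param_count = 5         #number of variables

We need a known function to generate data. The function is y=x*w, where w is the function's parameter, a matrix of size [param_count,1]. Here I picked 5 arbitrary numbers between 0 and 1000: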

#the values to solve for
t_w = np.floor([511,231,86,434,523],dtype=np.float32).reshape([param_count,1])
print(t_w)
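
To make the data-generating rule concrete, here is a small NumPy-only sketch that applies y=x*w to a single input row. The row is the prediction input that appears in the sample output later in this article; the helper name target_output is made up for illustration and is not part of the original code.

```python
import numpy as np

param_count = 5
t_w = np.array([511, 231, 86, 434, 523], dtype=np.float32).reshape([param_count, 1])

def target_output(x_rows, weights):
    """Hypothetical helper: apply y = x*w to a batch of input rows."""
    return np.asarray(x_rows, dtype=np.float32).dot(weights)

# One input row of size [1, param_count], taken from the sample output in this article
sample_x = [[637, 972, 228, 320, 840]]
sample_y = target_output(sample_x, t_w)
print(sample_y)  # a [1,1] matrix holding 1147847
```

The result matches the "actual result" printed for that input further down, which is all the training data generator does for every row.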

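The training inputs and outputs are represented with placeholders. (These are carried over from the previous article; the DNNRegressor calls below receive NumPy arrays directly and never read them.)

#x is the input and corresponds to t_x; it is supplied from outside during training, hence a placeholder
x = tf.placeholder(tf.float32,shape=[1,param_count])
y = tf.placeholder(tf.float32,shape=[1,1])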

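As for the earlier w, the function is now represented by a neural network, so w is no longer needed; we do not even need to know that the function is an N-variable linear equation. Next comes the key part: building the neural network. TensorFlow provides many high-level APIs. This is a regression problem, that is, predicting a value from given values, which differs from the classification problem in the earlier article. We use tf.contrib.learn.DNNRegressor to build the network. First we must tell it what the inputs are, which are called feature columns. We have only one input x, a matrix of size [1,param_count], so we define a single feature column. Its name is the empty string here, as in the complete listing below, because the data is passed directly as arrays; the function-based version later in this article names it "x":

feature_columns = [tf.contrib.layers.real_valued_column("")]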

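Use the feature columns above to construct a DNNRegressor instance. Here hidden_units is set to [5,5], meaning 2 hidden layers of 5 neurons each. How to choose this value is a deep topic that I cannot yet explain well; I will add more once I understand it better. The other parameter, model_dir, specifies where the learned results are stored: if the directory exists it is loaded, otherwise it is created. Training a neural network is usually time-consuming, so it is best to save the results so that training can resume after an interruption. If the saved format differs, for example in the feature columns or the number of hidden layers, TensorFlow reports an error.

regressor = tf.contrib.learn.DNNRegressor(feature_columns=feature_columns,
                                          hidden_units=[5,5],
                                          model_dir="/tmp/test")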
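
To give a feel for how small a [5,5] network is, the sketch below counts its trainable values. It assumes every layer is fully connected with one bias per neuron and that the network ends in a single output unit; that layout is an assumption for illustration and is not read from DNNRegressor itself. The function name count_dense_parameters is hypothetical.

```python
def count_dense_parameters(input_size, hidden_units, output_size=1):
    """Count weights and biases of a fully connected network (assumed layout)."""
    sizes = [input_size] + list(hidden_units) + [output_size]
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix plus one bias per neuron
    return total

# 5 inputs -> 5 neurons -> 5 neurons -> 1 output
print(count_dense_parameters(5, [5, 5]))  # 66
```

Under these assumptions the network has 66 values to learn, compared with the 5 weights of the underlying linear function.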

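Now training can begin. In each round we generate a new set of training data and call regressor.fit to train for 2000 steps, where the x argument is the input and the y argument is the output. Then we generate a new set of test data and call regressor.evaluate to assess the model, and we stop once the loss falls below a threshold: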

LOSS_MIN_VALUE = 10
while True:
        t_x = np.floor(1000 * np.random.random([1,param_count]),dtype=np.float32)
        t_y = t_x.dot(t_w)
        regressor.fit(x=t_x,y=t_y, steps=2000)

        e_x = np.floor(1000 * np.random.random([1,param_count]),dtype=np.float32)
        e_y = e_x.dot(t_w)
        evaluate_result = regressor.evaluate(x=e_x,y=e_y)
        print(evaluate_result)

        if evaluate_result['loss'] < LOSS_MIN_VALUE:
                break
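
The shape of this loop (train on fresh data, evaluate on other fresh data, stop below a threshold) can be tried without TensorFlow. The sketch below is a stand-in, not the DNNRegressor: it trains a plain linear model by gradient descent in NumPy, with inputs scaled to [0,1). The learning rate, the batch size of 20, the seed and the function name are all assumptions chosen for illustration.

```python
import numpy as np

def train_until_threshold(loss_min_value=10.0, batch_size=20, max_rounds=5000, seed=0):
    """Stand-in for the fit/evaluate loop: a linear model trained by gradient descent."""
    rng = np.random.RandomState(seed)
    t_w = np.array([511, 231, 86, 434, 523], dtype=np.float64).reshape(5, 1)
    w = np.zeros((5, 1))          # learned weights, for inputs scaled to [0,1)
    learning_rate = 0.3           # assumed value, small enough for the scaled inputs
    loss = float("inf")
    round_no = 0
    for round_no in range(1, max_rounds + 1):
        # "fit": one gradient step on a freshly generated batch
        t_x = np.floor(1000 * rng.random_sample((batch_size, 5)))
        t_y = t_x.dot(t_w)
        scaled = t_x / 1000.0
        grad = 2.0 / batch_size * scaled.T.dot(scaled.dot(w) - t_y)
        w -= learning_rate * grad
        # "evaluate": mean squared error on another fresh batch
        e_x = np.floor(1000 * rng.random_sample((batch_size, 5)))
        e_y = e_x.dot(t_w)
        loss = float(np.mean(((e_x / 1000.0).dot(w) - e_y) ** 2))
        if loss < loss_min_value:
            break
    return round_no, loss, w / 1000.0   # divide by 1000 to undo the input scaling

rounds, final_loss, learned_w = train_until_threshold()
print(rounds, final_loss)
print(learned_w.ravel())
```

Because the stand-in model has exactly the form of the target function, it recovers the weights directly; the neural network in this article has no such prior knowledge, which is part of why it needs far more steps.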

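If all goes well, training can now start. The ultimate purpose of training is to return a predicted value for a given input, so we generate a prediction input and look at how well the model predicts: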

p_x = np.floor(1000 * np.random.random([1,param_count]),dtype=np.float32)
p_y = p_x.dot(t_w)
print("Prediction input:%s" % p_x)
print("Actual result:%s" % p_y)
print("Predicted value:%s" % str(list(regressor.predict(p_x))))

The complete code is as follows:

#!/usr/bin/python
#coding=utf-8
import tensorflow as tf
import numpy as np

tf.logging.set_verbosity(tf.logging.ERROR)              #set the log level to ERROR to reduce noise
np.set_printoptions(threshold=np.inf)                   #print arrays in full, without truncation

param_count = 5         #number of variables

#the values to solve for
t_w = np.floor([511,231,86,434,523],dtype=np.float32).reshape([param_count,1])
print(t_w)

#x is the input and corresponds to t_x; it is supplied from outside during training, hence a placeholder
x = tf.placeholder(tf.float32,shape=[1,param_count])
y = tf.placeholder(tf.float32,shape=[1,1])

#w holds the weight of each parameter and corresponds to t_w (left over from the previous article; not used below)
w = tf.Variable(np.zeros(param_count,dtype=np.float32).reshape((param_count,1)), tf.float32)

feature_columns = [tf.contrib.layers.real_valued_column("")]
regressor = tf.contrib.learn.DNNRegressor(feature_columns=feature_columns,
                                          hidden_units=[5,5],
                                          model_dir="/tmp/test")

LOSS_MIN_VALUE = 10
while True:
        #train for 2000 steps on a freshly generated sample
        t_x = np.floor(1000 * np.random.random([1,param_count]),dtype=np.float32)
        t_y = t_x.dot(t_w)
        regressor.fit(x=t_x,y=t_y, steps=2000)

        #evaluate on another freshly generated sample
        e_x = np.floor(1000 * np.random.random([1,param_count]),dtype=np.float32)
        e_y = e_x.dot(t_w)
        evaluate_result = regressor.evaluate(x=e_x,y=e_y)
        print(evaluate_result)

        if evaluate_result['loss'] < LOSS_MIN_VALUE:
                break

p_x = np.floor(1000 * np.random.random([1,param_count]),dtype=np.float32)
p_y = p_x.dot(t_w)
print("Prediction input:%s" % p_x)
print("Actual result:%s" % p_y)
print("Predicted value:%s" % str(list(regressor.predict(p_x))))

After about 750,000 training steps (roughly 1 hour on a Z3740 at 1.33GHz), the loss drops below 10, and the predicted value is already very close to the actual result.

...
{'loss': 55225.0, 'global_step': 748001}
{'loss': 24404.297, 'global_step': 750001}
{'loss': 8824.2539, 'global_step': 752001}
{'loss': 3.515625, 'global_step': 754001}
Prediction input:[[ 637.  972.  228.  320.  840.]]
Actual result:[[ 1147847.]]
Predicted value:[1147869.4]

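However, the loss is clearly not very stable and may spike or plunge, because each round provides too little training data. We can speed up training and improve the result by supplying more samples per round, which is done by changing the sizes of the x and y matrices. The modified code is as follows: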

#!/usr/bin/python
#coding=utf-8
import tensorflow as tf
import numpy as np

tf.logging.set_verbosity(tf.logging.ERROR)              #set the log level to ERROR to reduce noise
np.set_printoptions(threshold=np.inf)                   #print arrays in full, without truncation

param_count = 5         #number of variables
test_count = 20         #number of samples per training round

#the values to solve for
t_w = np.floor([511,231,86,434,523],dtype=np.float32).reshape([param_count,1])
print(t_w)

#x is the input and corresponds to t_x; it is supplied from outside during training, hence a placeholder
x = tf.placeholder(tf.float32,shape=[test_count,param_count])
y = tf.placeholder(tf.float32,shape=[test_count,1])

#w holds the weight of each parameter and corresponds to t_w (left over from the previous article; not used below)
w = tf.Variable(np.zeros(param_count,dtype=np.float32).reshape((param_count,1)), tf.float32)

feature_columns = [tf.contrib.layers.real_valued_column("")]
regressor = tf.contrib.learn.DNNRegressor(feature_columns=feature_columns,
                                          hidden_units=[5,5],
                                          model_dir="/tmp/test2")

LOSS_MIN_VALUE = 200

while True:
        #train for 2000 steps on a fresh batch of test_count samples
        t_x = np.floor(1000 * np.random.random([test_count,param_count]),dtype=np.float32)
        t_y = t_x.dot(t_w)
        regressor.fit(x=t_x,y=t_y, steps=2000)

        #evaluate on another fresh batch
        e_x = np.floor(1000 * np.random.random([test_count,param_count]),dtype=np.float32)
        e_y = e_x.dot(t_w)
        evaluate_result = regressor.evaluate(x=e_x,y=e_y)
        print(evaluate_result)

        if evaluate_result['loss'] < LOSS_MIN_VALUE:
                break

p_x = np.floor(1000 * np.random.random([test_count,param_count]),dtype=np.float32)
p_y = p_x.dot(t_w)
print("Prediction input:%s" % p_x)
print("Actual result:%s" % p_y)
print("Predicted value:%s" % str(list(regressor.predict(p_x))))

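Running both training scripts side by side shows that the optimized version trains much faster and its loss drops quickly. Because the loss is now aggregated over 20 values (a mean squared error, judging by the printed results), the threshold is raised accordingly. What the network learns is not a [5,1] matrix, so in this setup the evaluation and prediction inputs are matrices of size [20,5] rather than [1,5]; when predicting, you can pad with dummy values and take only the first value of y. After 1.5 million training steps, the predictions are fairly accurate: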

...
{'loss': 294.09082, 'global_step': 1540000}
{'loss': 366.38013, 'global_step': 1542000}
{'loss': 354.60809, 'global_step': 1544000}
{'loss': 258.04257, 'global_step': 1546000}
{'loss': 225.32999, 'global_step': 1548000}
{'loss': 229.31587, 'global_step': 1550000}
{'loss': 196.7415, 'global_step': 1552000}
Prediction input:[[  88.  923.  374.  635.  711.]
 [ 659.  816.   95.  529.  240.]
 [ 411.  970.  963.   95.  378.]
 [ 739.  568.  467.  124.  469.]
 [ 461.   60.  616.  774.  500.]
 [ 876.   28.  483.  106.  532.]
 [ 283.  447.  340.  541.  165.]
 [ 838.  802.  105.  578.  171.]
 [ 224.  160.  620.  837.  173.]
 [ 815.  969.  483.  997.  576.]
 [ 986.  492.  948.  215.  609.]
 [ 676.  817.  119.  300.  709.]
 [ 377.  327.  536.  759.  331.]
 [ 601.  991.  602.  491.  750.]
 [ 656.  802.  498.  757.  939.]
 [ 269.  572.  805.  192.  328.]
 [ 114.  968.   34.  154.  772.]
 [ 687.  504.  466.  213.  143.]
 [ 467.  327.  637.  698.  791.]
 [ 993.  520.   29.  543.  233.]]
Actual result:[[  937788.]
 [  888521.]
 [  755833.]
 [  848102.]
 [  899823.]
 [  819882.]
 [  598199.]
 [  962795.]
 [  658481.]
 [ 1415788.]
 [ 1110843.]
 [ 1045404.]
 [  816799.]
 [ 1193148.]
 [ 1382941.]
 [  593693.]
 [  755378.]
 [  674788.]
 [ 1085581.]
 [  987558.]]
Predicted value:[937790.06, 888532.62, 755829.12, 848110.38, 899828.75, 819897.31, 598223.69, 962802.19, 658497.56, 1415759.2, 1110825.8, 1045408.4, 816807.94, 1193129.9, 1382917.4, 593705.81, 755399.44, 674805.31, 1085574.5, 987569.0]
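
As a rough check on what the loss number means, the sketch below computes the mean squared error between the 20 actual results and the 20 predicted values printed above. It assumes the loss reported by evaluate is a mean squared error, which the original code does not state explicitly; the function name mean_squared_error is just an illustrative helper.

```python
# Values copied from the output above
actual = [937788, 888521, 755833, 848102, 899823, 819882, 598199, 962795,
          658481, 1415788, 1110843, 1045404, 816799, 1193148, 1382941,
          593693, 755378, 674788, 1085581, 987558]
predicted = [937790.06, 888532.62, 755829.12, 848110.38, 899828.75, 819897.31,
             598223.69, 962802.19, 658497.56, 1415759.2, 1110825.8, 1045408.4,
             816807.94, 1193129.9, 1382917.4, 593705.81, 755399.44, 674805.31,
             1085574.5, 987569.0]

def mean_squared_error(targets, outputs):
    """Average of the squared differences between outputs and targets."""
    errors = [o - t for t, o in zip(targets, outputs)]
    return sum(e * e for e in errors) / len(errors)

mse = mean_squared_error(actual, predicted)
print(mse)
```

The value comes out in the low hundreds, the same order of magnitude as the last loss reported by evaluate (about 197, measured on a different random batch), which is consistent with the assumption.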

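Passing training data through a function

TensorFlow also supports supplying input data through a function. The examples above generate the training data inside the while loop and pass it in directly. If the training data is more complex, or you do not want it tightly coupled to the training code, you can wrap data loading in a function and pass that function to fit, evaluate and predict. The function must return 2 values. The first is the input, a dictionary whose keys are feature column names and whose values are the feature values. The second is the output that corresponds to the input. For the example above, the training set can be built like this: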

def get_train_inputs():
        t_x = np.floor(1000 * np.random.random([test_count,param_count]),dtype=np.float32)
        t_y = t_x.dot(t_w)

        #the first return value is a dictionary: the key is the feature name, the value is the data converted to a Tensor
        feature_cols = {'x': tf.constant(t_x)}

        #the second return value is the expected output, also converted to a Tensor
        return feature_cols,tf.constant(t_y)

It is passed to fit like this:

regressor.fit(input_fn=lambda: get_train_inputs(), steps=2000)
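
The contract described above (a zero-argument function that returns a features dictionary and the matching outputs) can be sketched without TensorFlow. The factory name make_input_fn and its parameters are hypothetical, and the sketch returns plain NumPy arrays so it can run on its own; to use it with the estimator in this article, the values would still be wrapped in tf.constant as in get_train_inputs.

```python
import numpy as np

t_w = np.array([511, 231, 86, 434, 523], dtype=np.float32).reshape([5, 1])

def make_input_fn(batch_size, param_count=5, seed=None):
    """Hypothetical factory: returns a zero-argument function yielding (features, labels)."""
    rng = np.random.RandomState(seed)

    def input_fn():
        # Fresh random integer-valued inputs in [0, 1000), as in the article
        x = np.floor(1000 * rng.random_sample([batch_size, param_count])).astype(np.float32)
        y = x.dot(t_w)
        # The dictionary key must match the feature column name
        return {'x': x}, y

    return input_fn

train_input_fn = make_input_fn(batch_size=20, seed=0)
features, labels = train_input_fn()
print(sorted(features.keys()), features['x'].shape, labels.shape)
```

A factory like this keeps the batch size and random source out of the training loop, which is the decoupling the function-based interface is meant to provide.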

The complete code is as follows:

#!/usr/bin/python
#coding=utf-8
import tensorflow as tf
import numpy as np

tf.logging.set_verbosity(tf.logging.ERROR)              #set the log level to ERROR to reduce noise
np.set_printoptions(threshold=np.inf)                   #print arrays in full, without truncation

param_count = 5         #number of variables
test_count = 20         #number of samples per training round

#the values to solve for
t_w = np.floor([511,231,86,434,523],dtype=np.float32).reshape([param_count,1])

print(t_w)

#x is the input and corresponds to t_x; it is supplied from outside during training, hence a placeholder
x = tf.placeholder(tf.float32,shape=[test_count,param_count])
y = tf.placeholder(tf.float32,shape=[test_count,1])

feature_columns = [tf.contrib.layers.real_valued_column("x")]
regressor = tf.contrib.learn.DNNRegressor(feature_columns=feature_columns,
                                          hidden_units=[5,5],
                                          model_dir="/tmp/test6")

def get_train_inputs():
        t_x = np.floor(1000 * np.random.random([test_count,param_count]),dtype=np.float32)
        t_y = t_x.dot(t_w)

        #the first return value is a dictionary: the key is the feature name, the value is the data converted to a Tensor
        feature_cols = {'x': tf.constant(t_x)}

        #the second return value is the expected output, also converted to a Tensor
        return feature_cols,tf.constant(t_y)

def get_test_inputs():
        e_x = np.floor(1000 * np.random.random([test_count,param_count]),dtype=np.float32)
        e_y = e_x.dot(t_w)

        feature_cols = {'x': tf.constant(e_x)}
        return feature_cols,tf.constant(e_y)

def get_predict_inputs():
        p_x = np.floor(1000 * np.random.random([test_count,param_count]),dtype=np.float32)
        feature_cols = {'x': tf.constant(p_x)}
        p_y = p_x.dot(t_w)
        print("Prediction input:%s" % p_x)
        print("Actual result:%s" % p_y)
        #prediction has no expected output, so only the features are returned
        return feature_cols

LOSS_MIN_VALUE = 50
while True:
        regressor.fit(input_fn=lambda: get_train_inputs(), steps=2000)
        evaluate_result = regressor.evaluate(input_fn=lambda: get_test_inputs(),steps=1)
        print(evaluate_result)

        if evaluate_result['loss'] < LOSS_MIN_VALUE:
                break

result = str(list(regressor.predict(input_fn=lambda: get_predict_inputs())))
print("Prediction result:%s" % result)

After 2.1 million training steps, the loss dropped below 50:

...
{'loss': 71.71196, 'global_step': 2078000}
{'loss': 71.513145, 'global_step': 2080000}
{'loss': 79.737793, 'global_step': 2082000}
{'loss': 69.766647, 'global_step': 2084000}
{'loss': 91.727737, 'global_step': 2086000}
{'loss': 49.118164, 'global_step': 2088000}
{'loss': 89.724609, 'global_step': 2090000}
{'loss': 48.275196, 'global_step': 2092000}
{'loss': 54.506691, 'global_step': 2094000}
{'loss': 50.299416, 'global_step': 2096001}
Prediction input:[[ 110.  252.  437.  218.  528.]
 [ 304.  281.  179.  669.  688.]
 [ 156.  785.  812.  535.  715.]
 [  32.  821.  266.  107.  416.]
 [ 132.  912.  561.  215.  688.]
 [ 786.  665.  117.  864.  779.]
 [ 561.  403.  257.  520.  787.]
 [ 233.  614.  735.  854.  271.]
 [ 516.  888.  904.  263.   64.]
 [ 851.  858.  576.  402.  957.]
 [ 391.  445.  224.  319.  161.]
 [ 586.  260.  565.  143.  293.]
 [ 440.  406.  304.  249.  552.]
 [ 912.  211.  985.  953.  326.]
 [  86.  302.  444.  385.  724.]
 [ 345.  488.  654.  903.  408.]
 [ 994.  118.  197.  266.  851.]
 [ 828.  846.  235.   10.  574.]
 [ 493.   31.  151.  592.  866.]
 [ 126.  593.  229.  904.  606.]]
Actual result:[[  522760.]
 [  885819.]
 [  937018.]
 [  492885.]
 [  779504.]
 [ 1347716.]
 [ 1039147.]
 [  836476.]
 [  694162.]
 [ 1357574.]
 [  544509.]
 [  623397.]
 [  741532.]
 [ 1183583.]
 [  697634.]
 [  950553.]
 [ 1112651.]
 [  943286.]
 [  981916.]
 [  930337.]]
Prediction result:[522770.38, 885826.75, 937008.19, 492889.97, 779497.69, 1347710.8, 1039148.3, 836474.19, 694155.62, 1357557.6, 544520.88, 623404.56, 741538.06, 1183577.8, 697640.12, 950551.81, 1112655.1, 943282.25, 981925.06, 930339.56]

Prediction here takes 20 rows of data, so what if we only need to predict one? We can start from an all-zero matrix and set only the first row:

def get_predict_inputs():
        #start from an all-zero matrix and fill in only the first row
        p_x = np.zeros([test_count,param_count],dtype=np.float32)
        p_x[0] = [2,112,2,3,4]
        feature_cols = {'x': tf.constant(p_x)}
        p_y = p_x.dot(t_w)
        print("Prediction input:%s" % p_x)
        print("Actual result:%s" % p_y)
        return feature_cols

In the prediction output, we then take only the value for the first row:

Prediction input:[[   2.  112.    2.    3.    4.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]]
Actual result:[[ 30460.]
 [     0.]
 [     0.]
 [     0.]
 [     0.]
 [     0.]
 [     0.]
 [     0.]
 [     0.]
 [     0.]
 [     0.]
 [     0.]
 [     0.]
 [     0.]
 [     0.]
 [     0.]
 [     0.]
 [     0.]
 [     0.]
 [     0.]]
Prediction result:[30488.938, 260.82239, 260.82239, 260.82239, 260.82239, 260.82239, 260.82239, 260.82239, 260.82239, 260.82239, 260.82239, 260.82239, 260.82239, 260.82239, 260.82239, 260.82239, 260.82239, 260.82239, 260.82239, 260.82239]

The actual value is 30460 and the predicted value is 30488.938, so the prediction is quite accurate.
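
The padding trick can be isolated into two small helpers. This is a NumPy-only sketch: pad_to_batch and stand_in_predict are hypothetical names, and stand_in_predict simply applies the true function y=x*w in place of regressor.predict so that the example runs without a trained model.

```python
import numpy as np

t_w = np.array([511, 231, 86, 434, 523], dtype=np.float32).reshape([5, 1])

def pad_to_batch(row, batch_size):
    """Place one input row at the top of an all-zero [batch_size, len(row)] matrix."""
    padded = np.zeros([batch_size, len(row)], dtype=np.float32)
    padded[0] = row
    return padded

def stand_in_predict(batch):
    """Placeholder for regressor.predict: applies the true function y = x*w."""
    return batch.dot(t_w).ravel()

batch = pad_to_batch([2, 112, 2, 3, 4], batch_size=20)
first = float(stand_in_predict(batch)[0])  # keep only the first row's result
print(first)  # 30460.0
```

With the trained network, the padded rows do not come out as exactly 0 (they are about 260.8 in the output above), which is why only the first value is meaningful.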