基于Tensorflow实现DeepFM前言网络结构代码部分

sladesal

发布于 2018-08-27 11:02:33

1.4K0

发布于 2018-08-27 11:02:33

文章被收录于专栏：机器学习之旅

前言

DeepFM，Ctr预估中的大杀器，哈工大与华为诺亚方舟实验室荣耀出品，算法工程师面试高频考题，有效的结合了神经网络与因子分解机在特征学习中的优点：同时提取到低阶组合特征与高阶组合特征，这样的称号我可以写几十条出来，这也说明了DeepFM确实是一个非常值得手动撸一边的算法。

当然，早就有一票人写了一车封装好的deepFM的模型，大家随便搜搜肯定也能搜到，既然这样，我就不再搞这些东西了，今天主要和大家过一遍，deepFM的代码是咋写的，手把手入门一下，说一些我觉得比较重要的地方，方便大家按需修改。（只列举了一部分，更多的解释参见GitHub代码中的注释）

本文的数据和部分代码构造参考了nzc大神的deepfm的Pytorch版本的写法，改成tensorflow的形式，需要看原版的自取。

网络结构

DeepFM包含两部分：神经网络部分与因子分解机部分，分别负责低阶特征的提取和高阶特征的提取，两部分权重共享。 DeepFM的预测结果可以写为：y = sigmoid(y(fm)+y(DNN))

FM部分

FM公式为：

我很久之前一篇文章细节讲过，这边就不多扯了，更多详见FM理论解析及应用。

DNN部分

这边其实和我上篇文章说的MLPS差距不大，也就是简单的全链接，差就差在input的构造，这边采取了embedding的思想，将每个feature转化成了embedded vector作为input，同时此处的input也是上面计算FM中的V，更多的大家看代码就完全了解了。

代码部分

我一共写了两个script，build_data.py和deepfm.py，也很好理解。build_data.py用来预处理数据，deepfm.py用来跑模型。

build_data.py

    for i in range(1, data.shape[1]):
        target = data.iloc[:, I]
        col = target.name
        l = len(set(target))
        if l > 10:
            target = (target - target.mean()) / target.std()
            co_feature = pd.concat([co_feature, target], axis=1)
            feat_dict[col] = cnt
            cnt += 1
            co_col.append(col)
        else:
            us = target.unique()
            print(us)
            feat_dict[col] = dict(zip(us, range(cnt, len(us) + cnt)))
            ca_feature = pd.concat([ca_feature, target], axis=1)
            cnt += len(us)
            ca_col.append(col)
    feat_dim = cnt
    feature_value = pd.concat([co_feature, ca_feature], axis=1)
    feature_index = feature_value.copy()
    for i in feature_index.columns:
        if i in co_col:
            feature_index[i] = feat_dict[I]
        else:
            feature_index[i] = feature_index[i].map(feat_dict[I])
            feature_value[i] = 1.

核心部分如上，重要的是做了两件事情，生成了feature_index和feature_value。

feature_index是把所有特征进行了标序，feature1，feature2......featurem，分别对应0，1，2，3，...m，但是，请注意分类变量需要拆分！就是说如果有性别：男|女|未知，三个选项。需要构造feature男，feature女，feature未知三个变量，而连续变量就不需要这样。

feature_value就是特征的值，连续变量按真实值填写，分类变量全部填写1。

更加形象的如下：

deepfm.py

特征向量化

        # 特征向量化，类似原论文中的v
        self.weight['feature_weight'] = tf.Variable(
            tf.random_normal([self.feature_sizes, self.embedding_size], 0.0, 0.01),
            name='feature_weight')

        # 一次项中的w系数，类似原论文中的w
        self.weight['feature_first'] = tf.Variable(
            tf.random_normal([self.feature_sizes, 1], 0.0, 1.0),
            name='feature_first')

可以对照下面的公式看，更有感觉。

deep网络部分的weight

        # deep网络初始input：把向量化后的特征进行拼接后带入模型，n个特征*embedding的长度
        input_size = self.field_size * self.embedding_size
        init_method = np.sqrt(2.0 / (input_size + self.deep_layers[0]))
        self.weight['layer_0'] = tf.Variable(
            np.random.normal(loc=0, scale=init_method, size=(input_size, self.deep_layers[0])), dtype=np.float32
        )
        self.weight['bias_0'] = tf.Variable(
            np.random.normal(loc=0, scale=init_method, size=(1, self.deep_layers[0])), dtype=np.float32
        )
        # 生成deep network里面每层的weight 和 bias
        if num_layer != 1:
            for i in range(1, num_layer):
                init_method = np.sqrt(2.0 / (self.deep_layers[i - 1] + self.deep_layers[I]))
                self.weight['layer_' + str(i)] = tf.Variable(
                    np.random.normal(loc=0, scale=init_method, size=(self.deep_layers[i - 1], self.deep_layers[i])),
                    dtype=np.float32)
                self.weight['bias_' + str(i)] = tf.Variable(
                    np.random.normal(loc=0, scale=init_method, size=(1, self.deep_layers[i])),
                    dtype=np.float32)

        # deep部分output_size + 一次项output_size + 二次项output_size
        last_layer_size = self.deep_layers[-1] + self.field_size + self.embedding_size
        init_method = np.sqrt(np.sqrt(2.0 / (last_layer_size + 1)))
        # 生成最后一层的结果
        self.weight['last_layer'] = tf.Variable(
            np.random.normal(loc=0, scale=init_method, size=(last_layer_size, 1)), dtype=np.float32)
        self.weight['last_bias'] = tf.Variable(tf.constant(0.01), dtype=np.float32)

input的地方需要注意一下，这边用了个技巧，直接把把向量化后的特征进行拉伸拼接后带入模型，原来的v是batch*n个特征*embedding的长度，直接改成了batch*（n个特征*embedding的长度），这样的好处就是全值共享，又快又有效。

网络传递部分

都是一些正常的操作，稍微要注意一下的是交互项的计算：

        # second_order
        self.sum_second_order = tf.reduce_sum(self.embedding_part, 1)
        self.sum_second_order_square = tf.square(self.sum_second_order)
        print('sum_square_second_order:', self.sum_second_order_square)
        # sum_square_second_order: Tensor("Square:0", shape=(?, 256), dtype=float32)

        self.square_second_order = tf.square(self.embedding_part)
        self.square_second_order_sum = tf.reduce_sum(self.square_second_order, 1)
        print('square_sum_second_order:', self.square_second_order_sum)
        # square_sum_second_order: Tensor("Sum_2:0", shape=(?, 256), dtype=float32)

        # 1/2*((a+b)^2 - a^2 - b^2)=ab
        self.second_order = 0.5 * tf.subtract(self.sum_second_order_square, self.square_second_order_sum)

        self.fm_part = tf.concat([self.first_order, self.second_order], axis=1)
        print('fm_part:', self.fm_part)

直接实现了下面的计算逻辑：

loss部分

我个人重写了一下我认为需要正则的地方，和一些loss的计算方式：

        # loss
        self.out = tf.nn.sigmoid(self.out)

        # loss = tf.losses.log_loss(label,out) 也行，看大家想不想自己了解一下loss的计算过程
        self.loss = -tf.reduce_mean(
            self.label * tf.log(self.out + 1e-24) + (1 - self.label) * tf.log(1 - self.out + 1e-24))

        # 正则：sum(w^2)/2*l2_reg_rate
        # 这边只加了weight，有需要的可以加上bias部分
        self.loss += tf.contrib.layers.l2_regularizer(self.l2_reg_rate)(self.weight["last_layer"])
        for i in range(len(self.deep_layers)):
            self.loss += tf.contrib.layers.l2_regularizer(self.l2_reg_rate)(self.weight["layer_%d" % I])

大家也可以直接按照我注释掉的部分简单操作，看个人的理解了。

梯度正则

        self.global_step = tf.Variable(0, trainable=False)
        opt = tf.train.GradientDescentOptimizer(self.learning_rate)
        trainable_params = tf.trainable_variables()
        print(trainable_params)
        gradients = tf.gradients(self.loss, trainable_params)
        clip_gradients, _ = tf.clip_by_global_norm(gradients, 5)
        self.train_op = opt.apply_gradients(
            zip(clip_gradients, trainable_params), global_step=self.global_step)

很多网上的代码跑着跑着就NAN了，建议加一下梯度的正则，反正也没多复杂。

执行结果

/Users/slade/anaconda3/bin/python /Users/slade/Documents/Personalcode/machine-learning/Python/deepfm/deepfm.py
[2 1 0 3 4 6 5 7]
[0 1 2]
[6 0 8 2 4 1 7 3 5 9]
[2 3 1 0]
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
embedding_part: Tensor("Mul:0", shape=(?, 39, 256), dtype=float32)
first_order: Tensor("Sum:0", shape=(?, 39), dtype=float32)
sum_square_second_order: Tensor("Square:0", shape=(?, 256), dtype=float32)
square_sum_second_order: Tensor("Sum_2:0", shape=(?, 256), dtype=float32)
fm_part: Tensor("concat:0", shape=(?, 295), dtype=float32)
deep_embedding: Tensor("Reshape_2:0", shape=(?, 9984), dtype=float32)
output: Tensor("Add_3:0", shape=(?, 1), dtype=float32)
[<tensorflow.python.ops.variables.Variable object at 0x10e2a9ba8>, <tensorflow.python.ops.variables.Variable object at 0x112885ef0>, <tensorflow.python.ops.variables.Variable object at 0x1129b3c18>, <tensorflow.python.ops.variables.Variable object at 0x1129b3da0>, <tensorflow.python.ops.variables.Variable object at 0x1129b3f28>, <tensorflow.python.ops.variables.Variable object at 0x1129b3c50>, <tensorflow.python.ops.variables.Variable object at 0x112a03dd8>, <tensorflow.python.ops.variables.Variable object at 0x112a03b38>, <tensorflow.python.ops.variables.Variable object at 0x16eae5c88>, <tensorflow.python.ops.variables.Variable object at 0x112b937b8>]
time all:7156
epoch 0:
the times of training is 0, and the loss is 8.54514
the times of training is 100, and the loss is 1.60875
the times of training is 200, and the loss is 0.681524
the times of training is 300, and the loss is 0.617403
the times of training is 400, and the loss is 0.431383
the times of training is 500, and the loss is 0.531491
the times of training is 600, and the loss is 0.558392
the times of training is 800, and the loss is 0.51909
...

看了下没啥大问题。

还有一些要说的

build_data.py中我为了省事，只做了标准化，没有进行其他数据预处理的步骤，这个是错误的，大家在实际使用中请按照我在公众号里面给大家进行的数据预处理步骤进行，这个非常重要！
learing_rate是我随便设置的，在实际大家跑模型的时候，请务必按照1.0，1e-3，1e-6，三个节点进行二分调优。
如果你直接搬上面代码，妥妥过拟合，请在真实使用过程中，务必根据数据量调整batch的大小，epoch的大小，建议在每次传递完成后加上tf.nn.dropout进行dropout。
如果数据量连10万量级都不到，我还是建议用机器学习的方法，xgboost+lr，mixed logistics regression等等都是不错的方法。

好了，最后附上全量代码的地址Github，希望对大家有所帮助。

本文参与腾讯云自媒体同步曝光计划，分享自作者个人站点/博客。

原始发表：2018.07.30 ，如有侵权请联系 cloudcommunity@tencent.com 删除

tensorflow

编程算法

本文分享自作者个人站点/博客前往查看

如有侵权，请联系 cloudcommunity@tencent.com 删除。

本文参与腾讯云自媒体同步曝光计划，欢迎热爱写作的你一起参与！

tensorflow

编程算法

登录后参与评论

0 条评论

热度