TensorFlow 2.0 Multi-GPU Training

lovelife110

Published 2021-01-14 16:20:18

Included in the column: 爱生活爱编程

Environment

TensorFlow 2.0, Python 3.6

Code location

https://github.com/lilihongjava/leeblog_python/tree/master/TensorFlow_GPU

Model code walkthrough

A minimal linear regression example is used to demonstrate multi-GPU training in TensorFlow.

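import tensorflow as tf


def model_train(x_data, y_data):
    # Single Dense layer: one output unit, one input per feature column
    layer0 = tf.keras.layers.Dense(1, input_shape=(x_data.shape[1],))
    model = tf.keras.Sequential([layer0])
    # Mean squared error loss, Adam optimizer
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(x_data, y_data, epochs=100, verbose=False)
    return model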

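tf.keras.layers.Dense(1, input_shape=(x_data.shape[1],)) is the only layer in the model, so it also serves as the output layer. The first argument, 1, is the number of output units; for a classification problem it would be the number of classes. input_shape gives the shape of one input sample, here the number of feature columns.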
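To make the role of the two arguments concrete, here is a minimal pure-Python sketch of what a single-unit dense layer with no activation computes for one sample. It is not how Keras implements the layer; the function name dense_one_unit and the example weights are made up for illustration.

```python
def dense_one_unit(features, weights, bias):
    """Output of a 1-unit dense layer with no activation: sum(x_i * w_i) + b."""
    if len(features) != len(weights):
        raise ValueError("input_shape mismatch: expected %d features" % len(weights))
    return sum(x * w for x, w in zip(features, weights)) + bias


# Three input features, so input_shape would be (3,) and the layer
# holds 3 weights plus 1 bias. The values below are illustrative only.
sample = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 2.0]
prediction = dense_one_unit(sample, weights, bias=0.25)
print(prediction)  # 4.75
```

With more output units, the same computation would simply be repeated once per unit, each with its own weights and bias.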

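Compiling the model: optimizer='adam' selects the Adam optimizer, a gradient-descent variant with adaptive learning rates; loss='mean_squared_error' (which can also be written 'mse') measures the error as the mean squared difference between predictions and targets.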
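The loss and the optimization loop can be sketched in pure Python. Note that this sketch uses plain gradient descent rather than Adam, purely to keep the update rule short; the helper names, the learning rate, and the toy data are assumptions for illustration and are not taken from the article's repository.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def gradient_step(xs, ys, w, b, learning_rate=0.1):
    """One plain gradient-descent update for the model y = w * x + b."""
    n = len(xs)
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2.0 * sum(errors) / n
    return w - learning_rate * grad_w, b - learning_rate * grad_b


# Toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0
loss_before = mean_squared_error(ys, [w * x + b for x in xs])
for _ in range(500):
    w, b = gradient_step(xs, ys, w, b)
loss_after = mean_squared_error(ys, [w * x + b for x in xs])
print(loss_before, loss_after, w, b)
```

After the loop, w and b should be close to 2 and 1, and the loss close to zero. Adam follows the same idea but scales each update using running averages of past gradients.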

Multi-GPU code walkthrough

Setting gpu to True enables multi-GPU support. Official guide: https://www.tensorflow.org/guide/gpu

if gpu:
    # Log which device each operation is placed on
    tf.debugging.set_log_device_placement(True)
    # Multi-GPU support: count the visible GPUs
    gpu_len = len(tf.config.experimental.list_physical_devices('GPU'))
    print("gpu_len:" + str(gpu_len))
    dataset = tf.data.Dataset.from_tensor_slices((x_data.values, y_data.values))
    strategy = tf.distribute.MirroredStrategy()
    BATCH_SIZE_PER_REPLICA = 64
    # Global batch size: each replica receives BATCH_SIZE_PER_REPLICA samples per step
    BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync
    print("x_data shape:" + str(x_data.shape))
    # In TF 1.14.0 the dimensions had to be a multiple of the GPU count:
    # if x_data.shape[1] % gpu_len == 0 and x_data.shape[0] % gpu_len == 0:
    print("running multi-GPU training")
    with strategy.scope():
        # Build and compile inside the scope so the variables are mirrored on every GPU
        layer0 = tf.keras.layers.Dense(1, input_shape=(x_data.shape[1],))
        model = tf.keras.Sequential([layer0])
        model.compile(loss='mean_squared_error', optimizer='adam')
    # No epochs argument is passed, so this trains for one epoch (the Keras default)
    model.fit(dataset.batch(BATCH_SIZE), verbose=False)
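The batch-size arithmetic in the code above can be checked without any GPU. The sketch below is plain Python; the helper names and the figures of 2 GPUs and 1000 training rows are hypothetical, while 64 samples per replica is the value used in the article.

```python
import math


def global_batch_size(per_replica_batch_size, num_replicas):
    """Batch size to pass to dataset.batch() so each replica gets per_replica_batch_size."""
    return per_replica_batch_size * num_replicas


def steps_per_epoch(num_samples, batch_size):
    """Number of batches in one pass over the data (the last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)


# Hypothetical setup: 2 GPUs and 1000 training rows.
batch = global_batch_size(64, num_replicas=2)
print(batch)                         # 128
print(steps_per_epoch(1000, batch))  # 8
```

Scaling the batch size with the replica count keeps the per-GPU workload constant, which is why the article multiplies by strategy.num_replicas_in_sync instead of hard-coding a global value.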

GPU

watch -n 0.1 -d nvidia-smi reruns nvidia-smi every 0.1 seconds; -d highlights the values that changed between refreshes.
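If the same numbers are needed inside a script rather than on screen, one option is to call nvidia-smi with its CSV query flags and parse the result. The sketch below assumes every queried field comes back as a plain integer (some GPUs may report values such as "[N/A]" instead); the function names and the sample output string are made up for illustration.

```python
import subprocess

QUERY = "index,utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, mem_used, mem_total = [field.strip() for field in line.split(",")]
        stats.append({
            "index": int(index),
            "util_percent": int(util),
            "mem_used_mib": int(mem_used),
            "mem_total_mib": int(mem_total),
        })
    return stats


def read_gpu_stats():
    """Call nvidia-smi once; requires an NVIDIA driver on the machine."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_gpu_stats(output)


# Made-up sample output for two GPUs, so the parser can be tried without a GPU.
sample_output = "0, 35, 1024, 16160\n1, 0, 11, 16160\n"
for gpu in parse_gpu_stats(sample_output):
    print(gpu)
```

During multi-GPU training, calling read_gpu_stats() in a loop should show non-zero utilization on every card if the work is actually being distributed.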

Dockerfile

FROM tensorflow/tensorflow:2.0.0-gpu-py3

WORKDIR /app

RUN pip install --upgrade setuptools \
numpy \
matplotlib \
xgboost \
pandas \
scikit-learn \
wheel \
flask -i https://pypi.douban.com/simple

docker run

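Check your Docker version with docker -v. For versions before 19.03, you need nvidia-docker2 and the --runtime=nvidia flag; for 19.03 and later, you need the nvidia-container-toolkit package and the --gpus all flag. Both options are documented in TensorFlow's Docker installation guide.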

nvidia-docker run -it -d --runtime=nvidia -v /root/lee/TF/:/app 10.1.8.XX:80/xxx/python_slim:1.4 /bin/bash
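The version rule described above can be sketched as a small helper that reads the text printed by docker -v and returns the matching flag. The function name is hypothetical and the version strings are sample inputs, not output captured from the author's machine.

```python
import re


def gpu_flag_for(version_output):
    """Pick the GPU flag from the text printed by `docker -v`."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no version number found in: %r" % version_output)
    major, minor = int(match.group(1)), int(match.group(2))
    # 19.03 and later: use --gpus, together with the nvidia-container-toolkit package.
    if (major, minor) >= (19, 3):
        return "--gpus all"
    # Older releases: use nvidia-docker2 and the nvidia runtime.
    return "--runtime=nvidia"


# Sample version strings for illustration.
print(gpu_flag_for("Docker version 19.03.5, build 633a0ea838"))  # --gpus all
print(gpu_flag_for("Docker version 18.09.7, build 2d0083d"))     # --runtime=nvidia
```

Whichever flag applies, it has to appear before the image name in docker run, because everything after the image name is passed to the container as its command.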

Originally published: 2020/06/23