
TensorFlow Serving compilation for CUDA 9 on the new AWS P3 instances

Stack Overflow user
Asked 2017-10-28 17:49:01
Answers: 1, Views: 267, Followers: 0, Votes: 0

I was able to recompile TensorFlow from Amazon's modified sources (provided with their new Deep Learning AMI).

I am now trying to compile TensorFlow Serving against this TensorFlow "fork", but I get this error:

ERROR: /root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/contrib/nccl/BUILD:68:1: undeclared inclusion(s) in rule '@org_tensorflow//tensorflow/contrib/nccl:nccl_kernels':
this rule is missing dependency declarations for the following files included by 'external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_rewrite.cc':
  '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/common_runtime/optimization_registry.h'
  '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/common_runtime/device_set.h'
  '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/common_runtime/device.h'
  '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/graph/types.h'
  '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/graph/costmodel.h'
  '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/graph/node_builder.h'
INFO: Elapsed time: 20.377s, Critical Path: 19.47s
FAILED: Build did NOT complete successfully
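
One way to read this log is to group the missing headers by directory, which shows the source packages that the `nccl_kernels` rule includes from without declaring them. The following is only a minimal sketch for inspecting the output: it copies the header lines from the error above into a temporary file (the `log` and `missing_dirs` names are illustrative) and does not say which Bazel targets would have to be added as dependencies.

```shell
#!/bin/sh
set -eu

# Temporary file holding the header lines copied from the error message above.
log=$(mktemp)
cat > "$log" <<'EOF'
  '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/common_runtime/optimization_registry.h'
  '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/common_runtime/device_set.h'
  '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/common_runtime/device.h'
  '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/graph/types.h'
  '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/graph/costmodel.h'
  '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/graph/node_builder.h'
EOF

# Keep only the directory part below external/org_tensorflow/ and de-duplicate.
missing_dirs=$(sed -n "s|^ *'.*/external/org_tensorflow/\(.*\)/[^/]*\.h'\$|\1|p" "$log" | sort -u)
echo "$missing_dirs"
```

Here all six headers fall under two directories, `tensorflow/core/common_runtime` and `tensorflow/core/graph`.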

More info: I am using the master branch of TensorFlow Serving (commit 7a349752c2cbbe741edb91c6c6be1c571e91a5fb) and Bazel release 0.7.0.

I also made a small change to tools/bazel.rc to work around another compilation error:

# git diff tools/bazel.rc 
diff --git a/tools/bazel.rc b/tools/bazel.rc
index 9397f97..28476f3 100644
--- a/tools/bazel.rc
+++ b/tools/bazel.rc
@@ -1,4 +1,4 @@
-build:cuda --crosstool_top=@org_tensorflow//third_party/gpus/crosstool
+build:cuda --crosstool_top=@local_config_cuda//crosstool:toolchain
 build:cuda --define=using_cuda=true --define=using_cuda_nvcc=true

 build --force_python=py2
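
The same one-line change can be scripted rather than edited by hand. The following is a minimal sketch that assumes the file content shown in the diff; it works on a temporary copy (the `workdir` and `rc_file` names are illustrative) instead of the real tools/bazel.rc, and keeps a `.bak` backup of the original.

```shell
#!/bin/sh
set -eu

# Temporary stand-in for tools/bazel.rc, using the pre-change lines from the diff.
workdir=$(mktemp -d)
rc_file="$workdir/bazel.rc"
cat > "$rc_file" <<'EOF'
build:cuda --crosstool_top=@org_tensorflow//third_party/gpus/crosstool
build:cuda --define=using_cuda=true --define=using_cuda_nvcc=true

build --force_python=py2
EOF

# Point the cuda config at the locally generated CUDA crosstool.
# "|" is used as the sed delimiter because both labels contain "/".
sed -i.bak \
  's|--crosstool_top=@org_tensorflow//third_party/gpus/crosstool|--crosstool_top=@local_config_cuda//crosstool:toolchain|' \
  "$rc_file"

new_line=$(grep -- '--crosstool_top' "$rc_file")
echo "$new_line"
```

To apply it to a real checkout, `rc_file` would point at tools/bazel.rc in the TensorFlow Serving tree.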

Any idea what is missing?


1 Answer

Stack Overflow user

Accepted answer

Answered 2017-10-31 00:18:59

I usually disable NCCL, since it never seems to build properly:

https://github.com/PipelineAI/pipeline/blob/6261c4f31105e40ab8b24ccc7834f9181f4e5aaf/package/tensorflow/16d39e9-d690fdd/Dockerfile.full-gpu#L160

RUN \
  cd $TENSORFLOW_SERVING_HOME \
  # Remove NCCL since it isn't building properly
  && sed -i.bak '/nccl/d' tensorflow/tensorflow/contrib/BUILD \
  && bazel build -c opt --config=cuda \
      --verbose_failures \
      --spawn_strategy=standalone --genrule_strategy=standalone \
      --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 \
      --crosstool_top=@local_config_cuda//crosstool:toolchain \
       tensorflow_serving/... \
  && chmod a+x bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
  && cp bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server /usr/local/bin/ \
  && bazel clean --expunge
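
The key step above is the `sed -i.bak '/nccl/d'` call, which deletes every line mentioning `nccl` from the BUILD file and saves the unmodified file with a `.bak` suffix. The following is a minimal sketch of that behavior on a made-up file; the dependency labels in it are hypothetical stand-ins, not the real contents of tensorflow/contrib/BUILD.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
build_file="$workdir/BUILD"

# Hypothetical stand-in for tensorflow/contrib/BUILD; the real file is much longer.
cat > "$build_file" <<'EOF'
deps = [
    "//tensorflow/contrib/layers:layers_py",
    "//tensorflow/contrib/nccl:nccl_py",
    "//tensorflow/contrib/rnn:rnn_py",
]
EOF

# Delete every line containing "nccl"; the original is kept as BUILD.bak.
sed -i.bak '/nccl/d' "$build_file"

# grep -c exits non-zero when the count is 0, hence the "|| true".
remaining=$(grep -c 'nccl' "$build_file" || true)
echo "nccl lines left: $remaining"
cat "$build_file"
```

Because the pattern matches any line containing the string, it is worth comparing the result against the `.bak` copy to confirm that only the intended entries were dropped.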
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/46992704
