Q: Tensorflow Java Multi-GPU inference

Stack Overflow user

Asked 2017-12-13 18:35:03

2 answers, 1.6K views, 0 followers, 11 votes

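I have a server with multiple GPUs and want to make full use of them during model inference inside a Java application. By default, TensorFlow grabs all available GPUs but uses only the first one.

I can think of three options to overcome this issue:

1. Restrict device visibility at the process level, namely with the CUDA_VISIBLE_DEVICES environment variable. This would require me to run several instances of the Java application and distribute traffic among them. Not a tempting idea.

2. Launch several sessions inside a single application and try to assign one device to each of them via ConfigProto:

public class DistributedPredictor {

  private Predictor[] nested;
  private int[] counters;

  // ...

  public DistributedPredictor(String modelPath, int numDevices, int numThreadsPerDevice)
      throws IOException {
    nested = new Predictor[numDevices];
    counters = new int[numDevices];

    for (int i = 0; i < nested.length; i++) {
      nested[i] = new Predictor(modelPath, i, numDevices, numThreadsPerDevice);
    }
  }

  public Prediction predict(Data data) {
    int i = acquirePredictorIndex();
    try {
      return nested[i].predict(data);
    } finally {
      // Release the slot even if predict() throws.
      releasePredictorIndex(i);
    }
  }

  private synchronized int acquirePredictorIndex() {
    int i = argmin(counters);
    counters[i] += 1;
    return i;
  }

  private synchronized void releasePredictorIndex(int i) {
    counters[i] -= 1;
  }
}

public class Predictor {

  private Session session;

  public Predictor(String modelPath, int deviceIdx, int numDevices, int numThreadsPerDevice)
      throws IOException {
    GPUOptions gpuOptions = GPUOptions.newBuilder()
        .setVisibleDeviceList("" + deviceIdx)
        .setAllowGrowth(true)
        .build();

    ConfigProto config = ConfigProto.newBuilder()
        .setGpuOptions(gpuOptions)
        .setInterOpParallelismThreads(numDevices * numThreadsPerDevice)
        .build();

    byte[] graphDef = Files.readAllBytes(Paths.get(modelPath));
    Graph graph = new Graph();
    graph.importGraphDef(graphDef);

    this.session = new Session(graph, config.toByteArray());
  }

  public Prediction predict(Data data) {
    // ...
  }
}

At a glance this approach seems to work fine. However, sessions occasionally ignore the setVisibleDeviceList option and all go for the first device, which then crashes with an out-of-memory error.

3. Build the model in Python in a multi-tower fashion using tf.device() specifications. On the Java side, give different Predictors different towers inside a shared session. This feels cumbersome and idiomatically wrong to me.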
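The `argmin(counters)` helper used in option 2 is not shown in the question. The following is a minimal sketch of how that least-loaded selection could look, using only the standard library. The class name `LeastLoadedSelector` and the `withDevice` helper are assumptions made for illustration and are not part of the original code or of the TensorFlow API.

```java
import java.util.function.IntFunction;

/** Tracks in-flight requests per device and hands out the least-loaded index. */
final class LeastLoadedSelector {
    private final int[] counters;

    LeastLoadedSelector(int numDevices) {
        if (numDevices <= 0) {
            throw new IllegalArgumentException("numDevices must be positive");
        }
        this.counters = new int[numDevices];
    }

    /** Returns the index of the smallest value; ties resolve to the lowest index. */
    static int argmin(int[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] < values[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Picks the device with the fewest in-flight requests and marks it busier. */
    synchronized int acquire() {
        int i = argmin(counters);
        counters[i] += 1;
        return i;
    }

    synchronized void release(int i) {
        counters[i] -= 1;
    }

    /** Runs work against the least-loaded device and releases the slot even if work throws. */
    <R> R withDevice(IntFunction<R> work) {
        int i = acquire();
        try {
            return work.apply(i);
        } finally {
            release(i);
        }
    }

    public static void main(String[] args) {
        LeastLoadedSelector selector = new LeastLoadedSelector(3);
        int first = selector.acquire();  // every device is idle, so the lowest index wins
        int second = selector.acquire(); // the first device is busy now
        selector.release(first);
        int third = selector.acquire();  // the first device is idle again
        System.out.println(first + " " + second + " " + third);
    }
}
```

In the question's `DistributedPredictor`, a call such as `selector.withDevice(i -> nested[i].predict(data))` would replace the manual acquire and release pair. The counters only balance in-flight requests; they do not account for differences in request cost.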

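UPDATE: As @ash proposed, there is yet another option:

4. Assign an appropriate device to each operation of the existing graph by modifying its definition (graphDef). To do that, the code from option 2 can be adapted:

public class Predictor {

  private Session session;

  public Predictor(String modelPath, int deviceIdx, int numDevices, int numThreadsPerDevice)
      throws IOException {
    byte[] graphDef = Files.readAllBytes(Paths.get(modelPath));
    graphDef = setGraphDefDevice(graphDef, deviceIdx);

    Graph graph = new Graph();
    graph.importGraphDef(graphDef);

    ConfigProto config = ConfigProto.newBuilder()
        .setAllowSoftPlacement(true)
        .build();

    this.session = new Session(graph, config.toByteArray());
  }

  private static byte[] setGraphDefDevice(byte[] graphDef, int deviceIdx)
      throws InvalidProtocolBufferException {
    String deviceString = String.format("/gpu:%d", deviceIdx);

    GraphDef.Builder builder = GraphDef.parseFrom(graphDef).toBuilder();
    for (int i = 0; i < builder.getNodeCount(); i++) {
      builder.getNodeBuilder(i).setDevice(deviceString);
    }
    return builder.build().toByteArray();
  }

  public Prediction predict(Data data) {
    // ...
  }
}

Just like the other approaches, this one does not free me from manually distributing data among devices. But at least it works stably and is comparatively easy to implement. Overall, this looks like an (almost) normal technique.

Is there an elegant way to do such a basic thing with the TensorFlow Java API? Any ideas would be appreciated.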

java

tensorflow

multi-gpu

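2 Answers

Stack Overflow user

Answered 2017-12-20 23:54:08

In short: there is a workaround, where you end up with one session per GPU.

Details:

The general flow is that the TensorFlow runtime respects the devices specified for operations in the graph. If no device is specified for an operation, the runtime "places" it based on some heuristics. Those heuristics currently result in "place the operation on GPU:0 if GPUs are available and there is a GPU kernel for the operation" (see Placer::Run if you are interested).

What you are asking for is, I think, a reasonable feature request for TensorFlow: the ability to treat the devices in the serialized graph as "virtual" ones to be mapped to a set of "physical" devices at run time, or alternatively to set a "default device". This feature does not currently exist. Adding such an option to ConfigProto is something you may want to file a feature request for.

I can suggest a workaround in the interim. First, some commentary on your proposed solutions.

1. Your first idea will surely work, but as you pointed out, it is cumbersome.
2. Setting visible_device_list in the ConfigProto does not quite work, because it is actually a per-process setting and is ignored after the first session is created in the process. This is certainly not documented as well as it should be (and it is somewhat unfortunate that it appears in the per-session configuration). However, it explains why your suggestion here does not work and why you still see a single GPU being used.
3. This could work.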
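To make the first option concrete, here is a rough sketch of a parent process preparing one worker process per GPU, each with its own CUDA_VISIBLE_DEVICES value. It only uses `java.lang.ProcessBuilder`. The class name `PerGpuLauncher`, the `predictor.jar` command, and the `--port` flag are hypothetical placeholders, and routing traffic to the workers is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Prepares one worker process per GPU, each restricted to a single visible device. */
final class PerGpuLauncher {

    /** Builds the per-GPU ProcessBuilders; nothing is started here. */
    static List<ProcessBuilder> prepareWorkers(int numGpus, int basePort, List<String> baseCommand) {
        List<ProcessBuilder> workers = new ArrayList<>();
        for (int gpu = 0; gpu < numGpus; gpu++) {
            List<String> command = new ArrayList<>(baseCommand);
            // Hypothetical flag: each worker listens on its own port.
            command.add("--port=" + (basePort + gpu));

            ProcessBuilder pb = new ProcessBuilder(command);
            // The worker sees only this GPU, which it will address as device 0.
            pb.environment().put("CUDA_VISIBLE_DEVICES", Integer.toString(gpu));
            pb.inheritIO();
            workers.add(pb);
        }
        return workers;
    }

    public static void main(String[] args) {
        // Hypothetical worker command; replace with the real application.
        List<String> command = Arrays.asList("java", "-jar", "predictor.jar");
        for (ProcessBuilder pb : prepareWorkers(2, 9000, command)) {
            System.out.println(
                "CUDA_VISIBLE_DEVICES=" + pb.environment().get("CUDA_VISIBLE_DEVICES")
                    + " " + pb.command());
            // Calling pb.start() here would launch the worker.
        }
    }
}
```

Because the variable is read when the CUDA runtime initializes, it has to be set before the worker starts, which is why it goes into the child environment rather than being changed inside an already running JVM.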

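Another option is to end up with different graphs (with operations explicitly placed on different GPUs), resulting in one session per GPU. Something like this can be used to edit the graph and explicitly assign a device to each operation: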

public static byte[] modifyGraphDef(byte[] graphDef, String device) throws Exception {
  GraphDef.Builder builder = GraphDef.parseFrom(graphDef).toBuilder();
  for (int i = 0; i < builder.getNodeCount(); ++i) {
    builder.getNodeBuilder(i).setDevice(device);
  }
  return builder.build().toByteArray();
}

After that, you can create a Graph and a Session per GPU with something like this:

final int NUM_GPUS = 8;
// setAllowSoftPlacement: Just in case our device modifications were too aggressive
// (e.g., setting a GPU device on an operation that only has CPU kernels)
// setLogDevicePlacement: So we can see what happens.
byte[] config =
    ConfigProto.newBuilder()
        .setLogDevicePlacement(true)
        .setAllowSoftPlacement(true)
        .build()
        .toByteArray();
Graph graphs[] = new Graph[NUM_GPUS];
Session sessions[] = new Session[NUM_GPUS];
for (int i = 0; i < NUM_GPUS; ++i) {
  graphs[i] = new Graph();
  graphs[i].importGraphDef(modifyGraphDef(graphDef, String.format("/gpu:%d", i)));
  sessions[i] = new Session(graphs[i], config);    
}

Then use sessions[i] to execute the graph on GPU #i.
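How requests get spread over `sessions[i]` is left open in the answer. One simple possibility is round-robin selection, sketched below with a generic holder so that it runs without TensorFlow on the classpath. The `RoundRobin` class is an illustration only and not part of the TensorFlow Java API; in practice the list would hold the per-GPU `Session` objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hands out items in rotating order; safe to call from multiple threads. */
final class RoundRobin<T> {
    private final List<T> items;
    private final AtomicInteger cursor = new AtomicInteger();

    RoundRobin(List<T> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("need at least one item");
        }
        this.items = new ArrayList<>(items); // defensive copy
    }

    /** Returns the next item in rotation. */
    T next() {
        // floorMod keeps the index non-negative after the counter overflows.
        int i = Math.floorMod(cursor.getAndIncrement(), items.size());
        return items.get(i);
    }

    public static void main(String[] args) {
        // Strings stand in for the per-GPU sessions in this sketch.
        RoundRobin<String> sessions = new RoundRobin<>(Arrays.asList("gpu0", "gpu1", "gpu2"));
        for (int request = 0; request < 4; request++) {
            System.out.println("request " + request + " -> " + sessions.next());
        }
    }
}
```

Round-robin assumes requests cost roughly the same. If they vary a lot, tracking in-flight requests per device, as the question's counters do, spreads the load more evenly.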

Hope that helps.

7 votes

Stack Overflow user

Answered 2020-08-06 18:12:39

In Python, it can be done like this:

def get_frozen_graph(graph_file):
    """Read Frozen Graph file from disk."""
    with tf.gfile.GFile(graph_file, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    return graph_def

trt_graph1 = get_frozen_graph('/home/ved/ved_1/frozen_inference_graph.pb')

with tf.device('/gpu:1'):
    [tf_input_l1, tf_scores_l1, tf_boxes_l1, tf_classes_l1, tf_num_detections_l1, tf_masks_l1] = tf.import_graph_def(trt_graph1, 
                    return_elements=['image_tensor:0', 'detection_scores:0', 
                    'detection_boxes:0', 'detection_classes:0','num_detections:0', 'detection_masks:0'])
    
tf_sess1 = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))

trt_graph2 = get_frozen_graph('/home/ved/ved_2/frozen_inference_graph.pb')

with tf.device('/gpu:0'):
    [tf_input_l2, tf_scores_l2, tf_boxes_l2, tf_classes_l2, tf_num_detections_l2] = tf.import_graph_def(trt_graph2, 
                    return_elements=['image_tensor:0', 'detection_scores:0', 
                    'detection_boxes:0', 'detection_classes:0','num_detections:0'])
    
tf_sess2 = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))

1 vote

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/47799972
