TACO Kit (Compute Acceleration Suite): PyTorch Model Inference Deployment

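After you have produced an optimized model with TACO Infer and verified that its performance and correctness meet expectations, you can deploy the model in a production environment.
Environment Preparation
Server preparation: see "TACO Infer Installation" and purchase a GPU instance type.
ABI version: for the PyTorch deep learning framework, TACO Infer supports both officially released builds of the libtorch library, CXX11 ABI and Pre-CXX11 ABI.
SDK installation: before developing and deploying a model, contact us to obtain the TACO Kit SDK package, then extract it. The package contains several shared libraries: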
[root@60abf692a8a1 (Taco Dev) /root/resnet152]
#ll 1.12.0-lib/
total 1.4G
-rwxr-xr-x 1 root root 3.3M Jan  5 16:10 libnvparsers.so*
-r-xr-xr-x 1 root root  11M Jan  5 16:10 libtacaudio_ops.so*
-r-xr-xr-x 1 root root  19M Jan  5 16:10 libtidy_runtime.so*
-rwxr-xr-x 1 root root 357M Jan  5 16:10 libnvinfer_builder_resource.so.8.5.1*
-rwxr-xr-x 1 root root 465M Jan  5 16:10 libnvinfer.so.8*
-rwxr-xr-x 1 root root 3.3M Jan  5 16:10 libnvparsers.so.8*
lrwxrwxrwx 1 root root   12 Jan  5 16:10 lib_tactorch.so -> _tactorch.so*
-rwxr-xr-x 1 root root 465M Jan  5 16:10 libnvinfer.so*
-rwxr-xr-x 1 root root  42M Jan  5 16:10 libnvinfer_plugin.so*
-rwxr-xr-x 1 root root  42M Jan  5 16:10 libnvinfer_plugin.so.8*
-r-xr-xr-x 1 root root  14M Jan  5 16:10 _tactorch.so*
-rwxr-xr-x 1 root root 2.8M Jan  5 16:10 libnvonnxparser.so.8*
-rwxr-xr-x 1 root root 2.8M Jan  5 16:10 libnvonnxparser.so*
drwxr-xr-x 2 root root 4.0K Jan  5 16:10 ./
drwxr-xr-x 5 root root 4.0K Jan  5 16:39 ../
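You can copy all TACO library files to the system library directory /usr/lib so that the linker ld can find them at link time.
Alternatively, copy the TACO library files to another path and add that path to the LD_LIBRARY_PATH environment variable.
Make sure that all TACO library files are located in the same directory.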
Inference Code Development
The following example uses the Torch C++ API to load and run the optimized model:
#include <torch/script.h>

#include <chrono>
#include <cstdlib>   // setenv
#include <iostream>
#include <memory>
#include <vector>

int main(int argc, const char* argv[]) {
  // set this env var outside of TencentCloud.
  setenv("TACO_TRIAL_RUN", "true", 1);

  torch::jit::Module module;
  try {
    // Deserialize the ScriptModule from a file using torch::jit::load().
    module = torch::jit::load("optimized_dir/optimized_recursive_script_module.pt");
  } catch (const c10::Error& e) {
    std::cerr << "error loading the model\n";
    return -1;
  }

  // Random input of shape [1, 3, 224, 224] placed on GPU 0.
  torch::Tensor input = torch::randn({1, 3, 224, 224}).to(torch::Device(torch::kCUDA, 0));

  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(input);

  // warmup
  for (int i = 0; i < 10; i++) {
    module.forward(inputs);
  }

  // Time 10 forward passes.
  auto start_time = std::chrono::high_resolution_clock::now();
  for (int i = 0; i < 10; i++) {
    module.forward(inputs);
  }
  auto end_time = std::chrono::high_resolution_clock::now();
  auto duration = std::chrono::duration_cast<std::chrono::microseconds>(
      end_time - start_time);

  // Total microseconds over 10 runs; dividing by 10000 gives the
  // average per run in milliseconds.
  std::cout << "Duration of resnet152 is: "
            << double(duration.count()) / 10000 << "ms" << std::endl;

  // Run once more and copy the result to the CPU for inspection.
  torch::Tensor output_tensor = module.forward(inputs).toTensor().to(torch::Device(torch::kCPU));

  // Print the first 10 values of the first output row.
  auto output_a = output_tensor.accessor<float, 2>();
  for (int i = 0; i < 10; i++) {
    std::cout << output_a[0][i] << std::endl;
  }
}
As the code above shows, you only need to load the optimized model through the standard Torch C++ API, exactly as you would load an ordinary TorchScript model.
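Compiling and Linking
When compiling the code above, you need to link two shared libraries provided by TACO, lib_tactorch.so and libtidy_runtime.so, as well as the relevant PyTorch shared libraries.
An example build script is shown below: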
#!/bin/bash

# Directory containing the PyTorch shared libraries.
TORCH_LIB_PATH=/root/venv/taco_dev_pt1.12/lib/python3.8/site-packages/torch/lib

gcc -std=c++14 \
    -I./1.12.0-inc/include \
    -I./1.12.0-inc/include/torch/csrc/api/include \
    -L./1.12.0-lib -l_tactorch -ltidy_runtime \
    -L"$TORCH_LIB_PATH" -ltorch -ltorch_cpu -lc10 \
    -lstdc++ \
    -o infer infer.cc
Once compilation finishes, you have a runnable binary:
[root@60abf692a8a1 /root/resnet152]
#ll

-rw-r--r--  1 root root 1.4K Jan  5 16:39 infer.cc
-rwxr-xr-x  1 root root 495K Jan  5 16:39 infer*
drwxr-xr-x 28 root root 4.0K Jan  5 16:39 ../
drwxr-xr-x  5 root root 4.0K Jan  5 16:39 ./
Running Inference
After the deployment program is compiled, run it to load the optimized model and perform inference:
[root@60abf692a8a1 (Taco Dev) /root/resnet152]
#./infer

Duration of resnet152 is: 5.330ms
0.149882
1.17847
0.698387
-0.326189
-0.0360627
-0.938206
-0.905457
-0.272352
-0.280662
-2.14997
As shown, the model loads and runs normally and prints the inference results.