TencentOS Server CLIP

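This guide describes how to run the official CLIP demo with the OpenVINO inference framework on TencentOS Server 3, using Docker.
Preparing the OpenVINO Environment
Building the OpenVINO image from the Dockerfile in the official OpenVINO GitHub repository is slow, so this guide sets up the OpenVINO development environment by pulling the prebuilt OpenVINO image from Docker Hub instead.
Pulling directly from the openvino repository on Docker Hub with docker pull can fail because of access restrictions, so the Tencent Cloud Docker registry mirror is used to speed up the image download.
Configuring the Tencent Cloud Registry Mirror
1. Open the /etc/docker/daemon.json configuration file: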
vim /etc/docker/daemon.json
2. Press i to switch to insert mode and add the following content. Make sure the result is valid JSON:
{
   "registry-mirrors": [
   "https://mirror.ccs.tencentyun.com"
  ]
}
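After the addition, the file should look similar to the following (this example already contains an nvidia runtime entry):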
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    },
    "registry-mirrors": [
    "https://mirror.ccs.tencentyun.com"
  ]
}
Then press ESC to leave insert mode and type :wq to save the file and exit vim.
3. Restart Docker so that the daemon loads the new configuration:
sudo systemctl restart docker
Note:
If the following error appears:
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
the registry-mirrors entry was most likely not added as valid JSON. Check the format of /etc/docker/daemon.json, correct it, and restart Docker again.
4. Check the Docker status:
sudo docker info
If both the Client and Server sections are reported without errors and the Registry Mirrors entry lists the Tencent Cloud mirror, the configuration succeeded.
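Because a malformed daemon.json prevents Docker from restarting, the edit in step 2 could also be made programmatically. The following is a minimal sketch and not part of the original procedure: the helper name add_registry_mirror is hypothetical, it uses only Python's standard json module, and treating an empty file as an empty configuration is an assumption. It adds the mirror without discarding existing keys such as runtimes, and it raises an error on invalid JSON instead of writing a broken file.

```python
import json

MIRROR = "https://mirror.ccs.tencentyun.com"


def add_registry_mirror(config_text: str, mirror: str = MIRROR) -> str:
    """Return daemon.json text with `mirror` listed under "registry-mirrors".

    Raises json.JSONDecodeError when config_text is not valid JSON, the same
    kind of mistake that makes `systemctl restart docker` fail.
    """
    # Assumption: an empty file is treated as an empty configuration.
    config = json.loads(config_text) if config_text.strip() else {}
    if not isinstance(config, dict):
        raise ValueError("daemon.json must contain a JSON object")
    mirrors = config.setdefault("registry-mirrors", [])
    if not isinstance(mirrors, list):
        raise ValueError('"registry-mirrors" must be a JSON array')
    # Avoid duplicate entries when the helper is run more than once.
    if mirror not in mirrors:
        mirrors.append(mirror)
    return json.dumps(config, indent=4)


if __name__ == "__main__":
    existing = '{"runtimes": {"nvidia": {"args": [], "path": "nvidia-container-runtime"}}}'
    print(add_registry_mirror(existing))
```

The printed text can be reviewed and then written back to /etc/docker/daemon.json before restarting Docker.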
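Pulling the OpenVINO Image from Docker Hub
This guide uses the ubuntu22_dev image with the latest tag. If the image is not yet present locally, docker run pulls it automatically before starting the container.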
docker run -it -e HF_ENDPOINT="https://hf-mirror.com" --name openvino openvino/ubuntu22_dev:latest /bin/bash
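Running the Model
Preparing the Model Environment
1. The container shell should start in /opt/intel/openvino_2024.2.0.15519/ (the version in the path may differ depending on the image). Create a demo folder there to hold the model code: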
mkdir -p demo/CLIP
cd demo/CLIP
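2. Switch pip to the Tsinghua mirror to speed up downloads:
# Switch pip to the Tsinghua mirror
# Stored in the pip configuration, so the setting persists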
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
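3. Install the packages required to run the CLIP model. Most of them are already present in the image; this step checks the important ones so that later steps do not fail: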
pip install --extra-index-url https://download.pytorch.org/whl/cpu "gradio>=4.19" "openvino>=2023.1.0" "transformers[torch]>=4.30" "datasets" "nncf>=2.6.0" "torch>=2.1" Pillow "matplotlib>=3.4"
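Since step 3 mainly serves as a check, the installed versions can also be listed from Python. This is a small sketch using the standard importlib.metadata module; the distribution names are taken from the pip command above, and the helper name installed_versions is hypothetical.

```python
from importlib import metadata

# Distribution names taken from the pip install command above.
PACKAGES = [
    "gradio", "openvino", "transformers", "datasets",
    "nncf", "torch", "Pillow", "matplotlib",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be installed again with the pip command from step 3.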
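This guide uses clip-vit-base-patch16 as the CLIP backbone. Given an image and N prompts, the model performs zero-shot image classification: it scores the image against every prompt and assigns the label with the highest probability.
Switching the Model Weight Download Source
Models cannot be downloaded directly from the Hugging Face website in mainland China, so the download source must first be switched to the HF-Mirror mirror site.
Note:
If the container was started with -e HF_ENDPOINT="https://hf-mirror.com" on the docker run command line, skip this step.
# Valid for the current shell session only; after exiting and stopping the container, run this command again when you re-enter it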
export HF_ENDPOINT="https://hf-mirror.com"
Note:
Appending the variable to ~/.bashrc with echo 'export HF_ENDPOINT="https://hf-mirror.com"' >> ~/.bashrc still results in download failures. Do not use it.
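As a possible alternative that is not part of the original guide, the endpoint could be set inside the Python script itself. The sketch below assumes that the Hugging Face libraries read HF_ENDPOINT when they are first imported, so the call has to run before transformers is imported. The helper name ensure_hf_endpoint is hypothetical.

```python
import os

HF_MIRROR = "https://hf-mirror.com"


def ensure_hf_endpoint(env=os.environ, endpoint=HF_MIRROR):
    """Set HF_ENDPOINT in `env` unless a value is already present."""
    # setdefault keeps a value passed via `docker run -e` or `export`.
    return env.setdefault("HF_ENDPOINT", endpoint)


if __name__ == "__main__":
    print("HF_ENDPOINT =", ensure_hf_endpoint())
    # Assumption: import the Hugging Face libraries only after this point, e.g.
    # from transformers import CLIPModel, CLIPProcessor
```

Using setdefault means an endpoint supplied on the docker run command line still takes precedence.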
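Creating the Model Code
1. In the demo/CLIP folder, create a file named clip.py with the following code. It runs image classification inference twice, first with the original PyTorch model and then with OpenVINO: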
from transformers import CLIPProcessor, CLIPModel
import requests
from pathlib import Path
from PIL import Image
import os
import openvino as ov
from scipy.special import softmax
import numpy as np


# load pre-trained model
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
# load preprocessor for model input
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")


# download the sample image unless it is already cached locally
sample_path = Path("data/coco.jpg")
if os.path.exists(sample_path):
    print("sample exists.")
else:
    print("download sample.")
    sample_path.parent.mkdir(parents=True, exist_ok=True)
    r = requests.get(
        "https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/image/coco.jpg",
        timeout=60,
    )
    # stop here on an HTTP error instead of saving an error page as coco.jpg
    r.raise_for_status()
    with sample_path.open("wb") as f:
        f.write(r.content)


image = Image.open(sample_path)


input_labels = [
    "cat",
    "dog",
    "wolf",
    "tiger",
    "man",
    "horse",
    "frog",
    "tree",
    "house",
    "computer",
]
text_descriptions = [f"This is a photo of a {label}" for label in input_labels]
print(text_descriptions)
inputs = processor(text=text_descriptions, images=[image], return_tensors="pt", padding=True)


results = model(**inputs)
logits_per_image = results["logits_per_image"]  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1).detach().numpy()  # we can take the softmax to get the label probabilities
print("Label probs using Pytorch:", probs)
print("Pytorch prediction result:", input_labels[np.argmax(probs)])


# use OpenVINO to run the CLIP model
fp16_model_path = Path("clip-vit-base-patch16.xml")
model.config.torchscript = True


# convert model to openvino format
if not fp16_model_path.exists():
    ov_model = ov.convert_model(model, example_input=dict(inputs))
    ov.save_model(ov_model, fp16_model_path)


# create OpenVINO core object instance
core = ov.Core()
# select the inference device; this guide runs on the CPU
device = "CPU"


# compile model for loading on device
compiled_model = core.compile_model(fp16_model_path, device)
# run inference on preprocessed data and get image-text similarity score
ov_logits_per_image = compiled_model(dict(inputs))[0]
# perform softmax on score
probs = softmax(ov_logits_per_image, axis=1)
print("Label probs using OpenVINO:", probs)
print("OpenVINO prediction result:", input_labels[np.argmax(probs)])
Run the model:
python3 clip.py
2. The script first downloads the model and then the test image, which is saved as data/coco.jpg.
Note:
The first attempt to download the image may fail because of network issues. Retrying a few times usually succeeds.
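The retry advice in this note could be automated instead of rerunning the script by hand. The following is a generic sketch and not part of the original demo: the helper name retry and its parameters are hypothetical, and the simulated download only stands in for the requests.get call in clip.py.

```python
import time


def retry(action, attempts=3, delay_seconds=2.0, sleep=time.sleep):
    """Call `action()` until it succeeds or `attempts` calls have failed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # in clip.py this would be a requests exception
            last_error = error
            # Wait before the next try, but not after the final failure.
            if attempt < attempts:
                sleep(delay_seconds)
    raise last_error


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_download():
        # Stand-in for the image download: fails twice, then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("simulated network failure")
        return b"image bytes"

    print(retry(flaky_download, delay_seconds=0.0))
```

In clip.py, the action would wrap the download and the status check so that an HTTP error also triggers another attempt.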
3. Next, the script defines 10 labels:
"cat",
"dog",
"wolf",
"tiger",
"man",
"horse",
"frog",
"tree",
"house",
"computer",
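Each of the 10 labels is inserted into the prompt template This is a photo of a {label}. The resulting prompts are fed into the CLIP text encoder to produce embeddings, which are then dot-multiplied with the image embedding to obtain the logits.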
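The scoring step described above can be illustrated with a toy example in plain Python. This is a sketch under stated assumptions: the 3-dimensional vectors are made up and are not real CLIP embeddings, the scale factor of 100.0 is an illustrative value, and the helper names are hypothetical. It shows the general pattern of normalizing the embeddings, taking dot products as logits, and applying softmax.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so the dot product is a cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def softmax(values):
    """Numerically stable softmax over a list of logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_embedding, text_embeddings, scale=100.0):
    """Dot the image embedding with every text embedding, then apply softmax."""
    image = normalize(image_embedding)
    logits = []
    for text in text_embeddings:
        unit_text = normalize(text)
        logits.append(scale * sum(i * t for i, t in zip(image, unit_text)))
    return softmax(logits)


if __name__ == "__main__":
    labels = ["cat", "dog", "wolf"]
    prompts = [f"This is a photo of a {label}" for label in labels]
    # Made-up vectors for illustration only.
    image_embedding = [0.1, 0.9, 0.2]
    text_embeddings = [[0.8, 0.2, 0.1], [0.1, 1.0, 0.1], [0.3, 0.6, 0.7]]
    probs = zero_shot_probs(image_embedding, text_embeddings)
    best = max(range(len(labels)), key=probs.__getitem__)
    print(prompts[best], probs)
```

In clip.py the same computation happens inside the model, which returns the result as logits_per_image.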
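After the PyTorch result is obtained, the example converts the model to OpenVINO format and runs inference again. Note that OpenVINO does not support Nvidia GPUs; GPU inference is available only on Intel GPUs, so this example runs the model on the CPU.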
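The guide fixes the device to CPU. On a host where an Intel GPU is visible inside the container, a fallback selection could look like the sketch below. It is not part of the original demo: the helper name choose_device is hypothetical, and whether a GPU appears at all depends on drivers and container options that this guide does not cover. The function is kept free of OpenVINO imports so that it can be tried anywhere.

```python
def choose_device(available_devices, preferred=("GPU", "CPU")):
    """Return the first preferred device found in the list of available devices."""
    for wanted in preferred:
        for name in available_devices:
            # Several GPUs may be reported with an index suffix such as "GPU.0".
            if name == wanted or name.startswith(wanted + "."):
                return name
    raise RuntimeError(f"none of {preferred} found in {list(available_devices)}")


if __name__ == "__main__":
    # With OpenVINO installed, the list would come from the runtime:
    #   import openvino as ov
    #   device = choose_device(ov.Core().available_devices)
    print(choose_device(["CPU"]))
    print(choose_device(["CPU", "GPU"]))
```

The returned name can be passed as the device argument of core.compile_model in clip.py.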
Model Output
The complete list of prompts passed to the CLIP text encoder is:
['This is a photo of a cat',
 'This is a photo of a dog',
 'This is a photo of a wolf',
 'This is a photo of a tiger',
 'This is a photo of a man',
 'This is a photo of a horse',
 'This is a photo of a frog',
 'This is a photo of a tree',
 'This is a photo of a house',
 'This is a photo of a computer']
The PyTorch and OpenVINO inference results are as follows (for reference):
# PyTorch inference result
Label probs using Pytorch: [[6.9000979e-04 9.8858720e-01 3.0764646e-04 1.2136426e-04 7.0234956e-03
  6.7098060e-04 2.5553041e-04 1.6874878e-04 1.8124736e-03 3.6242907e-04]]
Pytorch prediction result: dog
# OpenVINO inference result
Label probs using OpenVINO: [[7.1112876e-04 9.8869681e-01 3.1731726e-04 1.3581685e-04 6.7636110e-03
  6.6217897e-04 2.3762971e-04 1.6567315e-04 1.9223638e-03 3.8736628e-04]]
OpenVINO prediction result: dog
Both PyTorch and OpenVINO classify the image as dog.
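The agreement between the two backends can be checked numerically. The sketch below copies the reference probabilities printed above into plain Python lists and compares them; the helper names are hypothetical. For these reference values, the largest per-label difference stays below 1e-3, and both backends select the same label.

```python
LABELS = ["cat", "dog", "wolf", "tiger", "man", "horse", "frog", "tree", "house", "computer"]

# Reference probabilities copied from the output above.
PYTORCH_PROBS = [
    6.9000979e-04, 9.8858720e-01, 3.0764646e-04, 1.2136426e-04, 7.0234956e-03,
    6.7098060e-04, 2.5553041e-04, 1.6874878e-04, 1.8124736e-03, 3.6242907e-04,
]
OPENVINO_PROBS = [
    7.1112876e-04, 9.8869681e-01, 3.1731726e-04, 1.3581685e-04, 6.7636110e-03,
    6.6217897e-04, 2.3762971e-04, 1.6567315e-04, 1.9223638e-03, 3.8736628e-04,
]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def max_abs_diff(first, second):
    """Largest element-wise absolute difference between two lists."""
    return max(abs(a - b) for a, b in zip(first, second))


if __name__ == "__main__":
    print("PyTorch label:", LABELS[argmax(PYTORCH_PROBS)])
    print("OpenVINO label:", LABELS[argmax(OPENVINO_PROBS)])
    print("Largest difference:", max_abs_diff(PYTORCH_PROBS, OPENVINO_PROBS))
```

Small differences of this kind are expected when the same model runs on different runtimes; the exact numbers may vary between runs and versions.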
Notes
Note:
OpenCloudOS is the open-source edition of TencentOS Server, so in principle every operation in this document also applies to OpenCloudOS.
References
Tencent Cloud Docker registry mirror configuration
Docker Hub openvino/ubuntu22_dev image
OpenVINO clip-zero-shot-classification-with-output demo
OpenVINO Interactive Tutorials
Hugging Face openai/clip-vit-base-patch16 model
Hugging Face mirror site