
Deployment timeout error in AKS and endpoint stuck in "Transitioning" state

Stack Overflow user

Asked 2021-09-06 07:41:13

Answers: 1, Views: 164, Followers: 0, Votes: 1

We are deploying 170 ML models using ML Studio and Azure Kubernetes Service (AKS), as described in the documentation at https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/machine-learning/how-to-deploy-azure-kubernetes-service.md.

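We train the models in a custom environment with Python scripts and register them in the Azure ML service. Once a model is registered, we deploy it to AKS using a container image.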

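When deploying the ML models, we can deploy at most 10 to 11 models per AKS node. When we try to deploy more models on the same node, we get a deployment timeout error along with an error message.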

Here is the sample code we use to deploy a model to Azure Kubernetes Service with Python:

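from azureml.core import Environment, Model
from azureml.core.compute import ComputeTarget
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AksWebservice

# ws (the Workspace), Deployment_name, pip_packages and model_1 are defined earlier in our script.

# Create an environment and add conda dependencies to it; this builds our custom container image.
myenv = Environment(name=Deployment_name)
# Pass pip_packages by keyword: the first positional parameter of create() is pip_indexurl.
myenv.python.conda_dependencies = CondaDependencies.create(pip_packages=pip_packages)

# Inference configuration
inf_config = InferenceConfig(environment=myenv, entry_script='./Script_file.py')

# Deployment configuration
deployment_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=1, cpu_cores_limit=2, memory_gb_limit=2, traffic_percentile=10)

# AKS cluster compute target
aks_target = ComputeTarget(ws, 'pipeline')

# Deploy the model to the AKS cluster (Model.deploy expects a list of models)
service = Model.deploy(ws, Deployment_name, [model_1], inf_config,
                       deployment_config, aks_target, overwrite=True)

service.wait_for_deployment(show_output=True)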
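The deployment configuration above gives each model service `cpu_cores=1` and `memory_gb=1`. Assuming these become the pod's resource requests (and `cpu_cores_limit`/`memory_gb_limit` its limits), the Kubernetes scheduler reserves 1 core and 1 GB on a node for every replica regardless of the limits, so per-node capacity is bounded by the node's allocatable CPU, memory and pod count. A rough sketch of that arithmetic follows; `NodeCapacity` and `models_per_node` are hypothetical helpers, and the node figures passed in are assumptions (real values are listed under "Allocatable" in `kubectl describe node`).

```python
import math
from dataclasses import dataclass


@dataclass
class NodeCapacity:
    """Allocatable resources on one node (hypothetical values supplied by the caller)."""
    cpu_cores: float
    memory_gb: float
    max_pods: int
    system_pods: int = 0            # pods already on the node (kube-system etc.)
    reserved_cpu: float = 0.0       # CPU already requested by those pods
    reserved_memory_gb: float = 0.0 # memory already requested by those pods


def models_per_node(node: NodeCapacity, cpu_request: float,
                    memory_request_gb: float, replicas_per_model: int = 1) -> int:
    """Estimate how many model services fit on one node.

    Uses only resource *requests*, which is what the scheduler compares
    against allocatable capacity; limits are ignored on purpose.
    """
    free_cpu = node.cpu_cores - node.reserved_cpu
    free_mem = node.memory_gb - node.reserved_memory_gb
    free_pods = node.max_pods - node.system_pods
    by_cpu = math.floor(free_cpu / cpu_request)
    by_mem = math.floor(free_mem / memory_request_gb)
    # The tightest of the three constraints decides how many pods fit.
    pods = max(0, min(by_cpu, by_mem, free_pods))
    return pods // replicas_per_model
```

With a hypothetical 16-core, 64 GB node that allows 30 pods, where 8 system pods already request 2 cores and 4 GB, `models_per_node(node, 1, 1)` gives 14: CPU runs out before memory or the pod limit. Lowering the CPU request to 0.5 would make the pod limit (22) the binding constraint instead. This is only an estimate and does not model every constraint that can apply on an AKS node.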

We also went through the Azure documentation but could not find any configuration or deployment setting for AKS nodes.

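Could you give us more details about the "limit of 1000 models per deployment (per container)"? And could you share any insight or feedback on how to increase the number of ML models that can be deployed on each node in Azure Kubernetes Service? Thanks!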
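One reading of the "1000 models per deployment (per container)" note is that a single service can package several registered models, since `Model.deploy` takes a list in its `models` argument; the CPU/memory requests of each replica are then shared by all of those models instead of being reserved once per model. Azure ML entry scripts define `init()` (run once when the container starts) and `run(raw_data)` (run per request). The sketch below is a hypothetical multi-model entry script that routes each request by an assumed `model` field in the JSON payload; the two lambda "models" are placeholders for real model loading (for example from files under the `AZUREML_MODEL_DIR` directory).

```python
import json

# Registry of loaded models, filled in by init().
_models = {}


def init():
    """Run once at container start: load every model this service hosts.

    The lambdas are placeholders standing in for real, deserialized models.
    """
    global _models
    _models = {
        "model_a": lambda xs: [x * 2 for x in xs],  # placeholder model
        "model_b": lambda xs: [x + 1 for x in xs],  # placeholder model
    }


def run(raw_data):
    """Run per request: route the payload to the model named in it."""
    payload = json.loads(raw_data)
    name = payload.get("model")  # hypothetical routing field
    if name not in _models:
        return {"error": f"unknown model: {name}"}
    return {"model": name, "result": _models[name](payload["data"])}
```

A request body such as `{"model": "model_a", "data": [1, 2]}` is routed to the first placeholder. The trade-off of this design is that all models share one container's memory and are redeployed together.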

Tags: azure-machine-learning-service, azure-container-registry, azureml, azure-aks, azure-machine-learning-studio

1 Answer

Stack Overflow user

Answered 2021-09-07 10:52:45

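Based on the error, there seems to be a problem with your PVC (PersistentVolumeClaim).

The storage for a given Pod must either be provisioned by a PersistentVolume provisioner based on the requested storage class, or pre-provisioned by an administrator.

So either there must be a StorageClass that can dynamically provision PVs, referenced by storageClassName in volumeClaimTemplates, or there must be a PV that can satisfy the PVC's request.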

volumeClaimTemplates:
  - metadata:
      name: elasticsearch-data-persistent-storage
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "standard"
      resources:
        requests:
          storage: 10Gi
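The two options above (dynamic provisioning through a StorageClass, or a pre-provisioned PV that satisfies the claim) can be illustrated with a simplified model of how a claim gets bound. This is a sketch, not the real Kubernetes PV controller: it checks only storage class, access modes and capacity, prefers the smallest matching PV, understands only integer quantities with a few suffixes, and the class names and sizes used with it are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_quantity(q: str) -> int:
    """Convert an integer storage quantity such as '10Gi' or '500M' to bytes."""
    for table in (_BINARY, _DECIMAL):  # binary suffixes first ("Gi" before "G")
        for suffix, factor in table.items():
            if q.endswith(suffix):
                return int(q[: -len(suffix)]) * factor
    return int(q)  # plain number of bytes


@dataclass
class PersistentVolume:
    name: str
    capacity: str
    access_modes: Set[str]
    storage_class: str = ""
    bound: bool = False


@dataclass
class Claim:
    request: str
    access_modes: Set[str]
    storage_class: str = ""


def find_matching_pv(claim: Claim, pvs: Iterable[PersistentVolume],
                     dynamic_classes: Set[str]) -> Optional[str]:
    """Return a PV name that can satisfy the claim, 'dynamic' if a provisioner
    exists for its class, or None (the claim would stay Pending/unbound)."""
    need = parse_quantity(claim.request)
    candidates = [
        pv for pv in pvs
        if not pv.bound
        and pv.storage_class == claim.storage_class
        and claim.access_modes <= pv.access_modes  # PV must offer every requested mode
        and parse_quantity(pv.capacity) >= need
    ]
    if candidates:
        # Prefer the smallest pre-provisioned PV that is large enough.
        return min(candidates, key=lambda pv: parse_quantity(pv.capacity)).name
    if claim.storage_class in dynamic_classes:
        return "dynamic"
    return None
```

For a claim like the one in the YAML above (10Gi, ReadWriteOnce, class "standard"), a `None` result corresponds to the "pod has unbound immediate PersistentVolumeClaims" situation: no matching PV exists and no StorageClass can create one.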

Reference: pod has unbound immediate PersistentVolumeClaims (repeated 3 times)

See also the GitHub discussion: https://github.com/hashicorp/consul-helm/issues/237

Votes: 0

Original page content provided by Stack Overflow; translation support by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/69070888
