
Kubernetes autoscaler - NotTriggerScaleUp "pod didn't trigger scale-up (it wouldn't fit if a new node is added)"

Stack Overflow user
Asked 2019-09-20 19:31:45 · 4 answers · 24.7K views · 13 votes

I would like to run one "job" per node, with at most one pod running on a given node at a time.

  • I have scheduled a bunch of jobs
  • I now have a bunch of pending pods
  • I want those pending pods to trigger a node scale-up event (which is NOT happening)

Very similar to this question (also asked by me): Kubernetes reports "pod didn't trigger scale-up (it wouldn't fit if a new node is added)" even though it would?

In this case, however, the pod really should fit on a new node.

How do I diagnose why Kubernetes has decided that a node scale-up event is not possible?
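
The usual first step is to read the pending pod's events and the autoscaler's own log output. A minimal sketch, assuming the autoscaler runs as a Deployment named cluster-autoscaler in the default namespace (as it does in the accepted answer below):

# events on one pending pod show the NotTriggerScaleUp reason
kubectl describe pod example-job-21-npv6p

# follow the autoscaler's decision log; raise --v for more detail
kubectl logs -f deployment/cluster-autoscaler --namespace default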

My job yaml:

apiVersion: batch/v1
kind: Job
metadata:
  name: example-job-${job_id}
  labels:
    job-in-progress: job-in-progress-yes
spec:
  template:
    metadata:
      name: example-job-${job_id}
    spec:
      # this bit ensures a job/container does not get scheduled alongside another - 'anti' affinity
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname 
            labelSelector:
              matchExpressions:
              - key: job-in-progress
                operator: NotIn
                values:
                - job-in-progress-yes
      containers:
      - name: buster-slim
        image: debian:buster-slim
        command: ["bash"]
        args: ["-c", "sleep 60; echo ${echo_param}"]
      restartPolicy: Never
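
The ${job_id} and ${echo_param} placeholders suggest the manifest is templated before being applied. A minimal sketch of one way to do that with envsubst, where the filename job.yaml, the loop bound of 30, and the echo text are stand-ins for illustration:

for job_id in $(seq 1 30); do
  # export both variables so envsubst can substitute them into the manifest
  export job_id echo_param="job ${job_id} finished"
  envsubst < job.yaml | kubectl create -f -
done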

Autoscaler logs:

I0920 19:27:58.190751       1 static_autoscaler.go:128] Starting main loop
I0920 19:27:58.261972       1 auto_scaling_groups.go:320] Regenerating instance to ASG map for ASGs: []
I0920 19:27:58.262003       1 aws_manager.go:152] Refreshed ASG list, next refresh after 2019-09-20 19:28:08.261998185 +0000 UTC m=+302.102284346
I0920 19:27:58.262092       1 static_autoscaler.go:261] Filtering out schedulables
I0920 19:27:58.264212       1 static_autoscaler.go:271] No schedulable pods
I0920 19:27:58.264246       1 scale_up.go:262] Pod default/example-job-21-npv6p is unschedulable
I0920 19:27:58.264252       1 scale_up.go:262] Pod default/example-job-28-zg4f8 is unschedulable
I0920 19:27:58.264258       1 scale_up.go:262] Pod default/example-job-24-fx9rd is unschedulable
I0920 19:27:58.264263       1 scale_up.go:262] Pod default/example-job-6-7mvqs is unschedulable
I0920 19:27:58.264268       1 scale_up.go:262] Pod default/example-job-20-splpq is unschedulable
I0920 19:27:58.264273       1 scale_up.go:262] Pod default/example-job-25-g5mdg is unschedulable
I0920 19:27:58.264279       1 scale_up.go:262] Pod default/example-job-16-wtnw4 is unschedulable
I0920 19:27:58.264284       1 scale_up.go:262] Pod default/example-job-7-g89cp is unschedulable
I0920 19:27:58.264289       1 scale_up.go:262] Pod default/example-job-8-mglhh is unschedulable
I0920 19:27:58.264323       1 scale_up.go:304] Upcoming 0 nodes
I0920 19:27:58.264370       1 scale_up.go:420] No expansion options
I0920 19:27:58.264511       1 static_autoscaler.go:333] Calculating unneeded nodes
I0920 19:27:58.264533       1 utils.go:474] Skipping ip-10-0-1-118.us-west-2.compute.internal - no node group config
I0920 19:27:58.264542       1 utils.go:474] Skipping ip-10-0-0-65.us-west-2.compute.internal - no node group config
I0920 19:27:58.265063       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"example-job-25-g5mdg", UID:"d2e0e48c-dbd9-11e9-a9e2-024e7db9d360", APIVersion:"v1", ResourceVersion:"7256", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 
I0920 19:27:58.265090       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"example-job-8-mglhh", UID:"c7d3ce78-dbd9-11e9-a9e2-024e7db9d360", APIVersion:"v1", ResourceVersion:"7267", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 
I0920 19:27:58.265101       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"example-job-6-7mvqs", UID:"c6a5b0e4-dbd9-11e9-a9e2-024e7db9d360", APIVersion:"v1", ResourceVersion:"7273", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 
I0920 19:27:58.265110       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"example-job-20-splpq", UID:"cfeb9521-dbd9-11e9-a9e2-024e7db9d360", APIVersion:"v1", ResourceVersion:"7259", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 
I0920 19:27:58.265363       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"example-job-21-npv6p", UID:"d084c067-dbd9-11e9-a9e2-024e7db9d360", APIVersion:"v1", ResourceVersion:"7275", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 
I0920 19:27:58.265384       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"example-job-16-wtnw4", UID:"ccbe48e0-dbd9-11e9-a9e2-024e7db9d360", APIVersion:"v1", ResourceVersion:"7265", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 
I0920 19:27:58.265490       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"example-job-28-zg4f8", UID:"d4afc868-dbd9-11e9-a9e2-024e7db9d360", APIVersion:"v1", ResourceVersion:"7269", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 
I0920 19:27:58.265515       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"example-job-24-fx9rd", UID:"d24975e5-dbd9-11e9-a9e2-024e7db9d360", APIVersion:"v1", ResourceVersion:"7271", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 
I0920 19:27:58.265685       1 static_autoscaler.go:360] Scale down status: unneededOnly=true lastScaleUpTime=2019-09-20 19:23:23.822104264 +0000 UTC m=+17.662390361 lastScaleDownDeleteTime=2019-09-20 19:23:23.822105556 +0000 UTC m=+17.662391653 lastScaleDownFailTime=2019-09-20 19:23:23.822106849 +0000 UTC m=+17.662392943 scaleDownForbidden=false isDeleteInProgress=false
I0920 19:27:58.265910       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"example-job-7-g89cp", UID:"c73cfaea-dbd9-11e9-a9e2-024e7db9d360", APIVersion:"v1", ResourceVersion:"7263", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 

4 Answers

Stack Overflow user

Answered 2019-09-22 17:46:55

I had defined the wrong arguments on the autoscaler. In hindsight, the "Skipping ip-10-0-... - no node group config" lines in the log above were the clue: the autoscaler did not recognize any node group it was allowed to scale.

I had to fix the node-group-auto-discovery and nodes arguments:

        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --namespace=default
        - --scan-interval=25s
        - --scale-down-unneeded-time=30s
        - --nodes=1:20:terraform-eks-demo20190922161659090500000007--terraform-eks-demo20190922161700651000000008
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/example-job-runner
        - --logtostderr=true
        - --stderrthreshold=info
        - --v=4

When installing the cluster autoscaler, it is not enough to simply apply the example configuration, e.g.:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml

As documented in the user guide, that manifest contains a placeholder for your EKS cluster name in the node-group-auto-discovery value; you must substitute your cluster name before applying it, or update the deployment afterwards.
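
One way to do the substitution in a single step, sketched here under the assumption that the placeholder in the file is <YOUR CLUSTER NAME>, with my-cluster standing in for a real cluster name:

# fetch the example manifest, fill in the cluster name, and apply it
curl -s https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml \
  | sed 's/<YOUR CLUSTER NAME>/my-cluster/' \
  | kubectl apply -f -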

Votes: 7

Stack Overflow user

Answered 2020-11-18 06:34:01

I ran into this as well. It is not documented well where you would expect it to be. Here is the relevant passage from the main README.md:

AWS - Using auto-discovery of tagged instance groups. Auto-discovery finds ASGs tagged as below and automatically manages them based on the minimum and maximum size specified in the ASG. cloudProvider=aws only.

  • Tag the ASGs with keys to match .Values.autoDiscovery.tags, which by default is: k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME>
  • Verify the IAM permissions
  • Set autoDiscovery.clusterName=<YOUR CLUSTER NAME>
  • Set awsRegion=<YOUR AWS REGION>
  • Set awsAccessKeyID=<YOUR AWS KEY ID> and awsSecretAccessKey=<YOUR AWS SECRET KEY> if you want to use AWS credentials directly instead of an instance role

$ helm install my-release autoscaler/cluster-autoscaler-chart --set autoDiscovery.clusterName=<CLUSTER NAME>

My problem was that I had not specified both tags, only the k8s.io/cluster-autoscaler/enabled one. That makes sense in retrospect: if you have multiple k8s clusters in the same account, the cluster autoscaler needs to know which ASGs it should actually scale.
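
If the second tag is missing, it can be added to an existing ASG directly; a sketch with the AWS CLI, where my-asg and my-cluster are hypothetical stand-ins:

# add the cluster-specific tag the autoscaler's auto-discovery looks for
aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/my-cluster,Value=owned,PropagateAtLaunch=true"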

Votes: 3

Stack Overflow user

Answered 2021-07-09 02:46:51

I had mistakenly added k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME> as node labels.

But they actually need to be tags on the nodes in the worker group.

Specifically, if you are using the AWS EKS module in Terraform:

  workers_group_defaults = {
    tags = [{
        # opts this worker group's ASG in to autoscaler auto-discovery
        key                 = "k8s.io/cluster-autoscaler/enabled"
        value               = "TRUE"
        propagate_at_launch = true
      },{
        # ties the ASG to this specific cluster
        key                 = "k8s.io/cluster-autoscaler/${var.cluster_name}"
        value               = "owned"
        propagate_at_launch = true
      }]
  }
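
After applying, it may be worth verifying that both tags actually landed on the ASG; a sketch, with my-asg standing in for the worker group's ASG name:

# list the tags the ASG currently carries
aws autoscaling describe-tags \
  --filters "Name=auto-scaling-group,Values=my-asg"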
Votes: 2
Original question: https://stackoverflow.com/questions/58034203