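The k8s architecture is shown in the figure:
As we all know, k8s is divided into master and node, where:
The master mainly has the following components:
A node mainly contains the following components:
This process looks fairly simple, but in a real production environment there are many issues to consider during scheduling:
The scheduling process is divided into 2 phases:
Code location (1.10):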
https://github.com/kubernetes/kubernetes/tree/release-1.10/pkg/scheduler/algorithm
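Priorities (scoring)
After the predicates have filtered the nodes and produced a list of candidates, each node in that list is scored, and the Pod is finally scheduled onto the node with the highest score.
The final score of a host is computed with the following formula: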
finalScoreNode = (weight1 * priorityFunc1) + (weight2 * priorityFunc2) + … + (weightn * priorityFuncn)
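The weighted sum above can be sketched in a few lines of Python. This is only an illustration of the formula, not the scheduler's code: the two priority functions, their weights, and the node fields (`cpu_used`, `cpu_total`, `pods`, `max_pods`) are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Tuple

Node = Dict[str, float]
PriorityFunc = Callable[[Node], float]


def least_requested(node: Node) -> float:
    """Hypothetical priority function: more free CPU gives a higher score (0-10)."""
    return 10 * (1 - node["cpu_used"] / node["cpu_total"])


def fewer_pods(node: Node) -> float:
    """Hypothetical priority function: fewer running pods gives a higher score (0-10)."""
    return 10 * (1 - node["pods"] / node["max_pods"])


# (weight, priority function) pairs; the weights are assumed values.
PRIORITIES: List[Tuple[float, PriorityFunc]] = [(1, least_requested), (2, fewer_pods)]


def final_score(node: Node, priorities: List[Tuple[float, PriorityFunc]]) -> float:
    """finalScoreNode = sum of weight_i * priorityFunc_i(node)."""
    return sum(weight * func(node) for weight, func in priorities)


def pick_node(nodes: Dict[str, Node]) -> str:
    """Return the name of the node with the highest final score."""
    return max(nodes, key=lambda name: final_score(nodes[name], PRIORITIES))


# These nodes are assumed to have already passed the predicates phase.
nodes = {
    "node-n1": {"cpu_used": 4, "cpu_total": 8, "pods": 55, "max_pods": 110},
    "node-n2": {"cpu_used": 2, "cpu_total": 8, "pods": 22, "max_pods": 110},
}
best = pick_node(nodes)
```

With these made-up numbers the less loaded node, `node-n2`, gets the higher score and is selected.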
View the resource information of a node:
apiVersion: v1
kind: Node
metadata:
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/hostname: node-n1
  name: node-n1
spec:
  externalID: node-n1
status:
  addresses:
  - address: 10.162.197.135
    type: InternalIP
  allocatable:
    cpu: "8"
    memory: 16309412Ki
    pods: "110"
  capacity:
    cpu: "8"
    memory: 16411812Ki
    pods: "110"
  conditions: {...}
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images: {...}
  nodeInfo: {...}
View the resource information of a pod:
kubectl explain pod.spec
Let's look at this pod:
Notes:
nodeSelector (the documentation of that period says it will eventually be deprecated in favor of node affinity): schedules a Pod onto specific Nodes:
apiVersion: v1
kind: Pod
metadata:
  labels:
    pod-template-hash: "4173307778"
    run: my-pod
  name: my-pod
  namespace: default
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: my-pod
    ports:
    - containerPort: 80
      protocol: TCP
    resources: {}
  nodeSelector:
    disktype: ssd
    node-flavor: s3.large.2
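The matching rule behind nodeSelector is simple: a node qualifies only if its labels contain every key/value pair in the selector. A minimal sketch of that rule follows; the node names and their label sets are made up for illustration, and the function names are not scheduler APIs.

```python
from typing import Dict, List

Labels = Dict[str, str]


def matches_node_selector(node_labels: Labels, node_selector: Labels) -> bool:
    """A node matches only if every selector key is present with the same value."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def feasible_nodes(nodes: Dict[str, Labels], node_selector: Labels) -> List[str]:
    """Return the names of nodes whose labels satisfy the selector."""
    return [name for name, labels in nodes.items()
            if matches_node_selector(labels, node_selector)]


# The selector from the Pod above.
selector = {"disktype": "ssd", "node-flavor": "s3.large.2"}

# Hypothetical nodes and labels.
nodes = {
    "node-a": {"disktype": "ssd", "node-flavor": "s3.large.2", "zone": "z1"},
    "node-b": {"disktype": "ssd"},                             # missing node-flavor
    "node-c": {"disktype": "hdd", "node-flavor": "s3.large.2"},  # wrong disktype
}
candidates = feasible_nodes(nodes, selector)
```

Only `node-a` carries both labels, so it is the only candidate; extra labels on a node do not matter.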
1. podAffinity: place certain Pods on the same group of Nodes:
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: kubernetes.io/zone
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: kubernetes.io/hostname
  containers:
  - name: with-pod-affinity
    image: k8s.gcr.io/pause:2.0
Key differences from nodeAffinity:
Hard filter: node groups that do not host the specified pods are excluded
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
    matchExpressions:
    - key: security
      operator: In
      values:
      - S1
  topologyKey: kubernetes.io/zone
Soft preference: node groups that do not host the specified pods get a lower score, which reduces the chance that a node in such a group is selected
- weight: 100
  podAffinityTerm:
    labelSelector:
      matchExpressions:
      - key: security
        operator: In
        values:
        - S2
    topologyKey: kubernetes.io/hostname
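The difference between the hard filter and the soft preference can be sketched as follows. This is a simplified model, not the scheduler's implementation: a "node group" is taken to be all nodes sharing the same value for the topologyKey label, and the cluster layout (three nodes, two zones, two running pods) is invented for the example.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]


def domain_has_matching_pod(node: str, topology_key: str, label_key: str,
                            values: List[str], nodes: Dict[str, Labels],
                            pods: List[Tuple[str, Labels]]) -> bool:
    """True if some pod with label_key in values runs in the same topology domain as node."""
    domain = nodes[node].get(topology_key)
    if domain is None:
        return False
    return any(
        nodes[pod_node].get(topology_key) == domain
        and pod_labels.get(label_key) in values
        for pod_node, pod_labels in pods
    )


def evaluate(nodes: Dict[str, Labels], pods: List[Tuple[str, Labels]]) -> Dict[str, int]:
    """Apply the required term as a filter and the preferred term as a score bonus."""
    scores = {}
    for name in nodes:
        # Hard: the zone must already host a pod labeled security=S1.
        if not domain_has_matching_pod(name, "kubernetes.io/zone", "security",
                                       ["S1"], nodes, pods):
            continue  # node is excluded entirely
        # Soft: add the term's weight if the host already runs a pod labeled security=S2.
        preferred = domain_has_matching_pod(name, "kubernetes.io/hostname", "security",
                                            ["S2"], nodes, pods)
        scores[name] = 100 if preferred else 0
    return scores


# Hypothetical cluster: n1 and n2 share zone z1, n3 is alone in z2.
nodes = {
    "n1": {"kubernetes.io/zone": "z1", "kubernetes.io/hostname": "n1"},
    "n2": {"kubernetes.io/zone": "z1", "kubernetes.io/hostname": "n2"},
    "n3": {"kubernetes.io/zone": "z2", "kubernetes.io/hostname": "n3"},
}
# Running pods as (node name, pod labels).
pods = [("n1", {"security": "S1"}), ("n2", {"security": "S2"})]
result = evaluate(nodes, pods)
```

Here `n3` is filtered out because its zone hosts no `security=S1` pod, while `n1` and `n2` both remain and `n2` receives the extra weight.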
2. podAntiAffinity: keep certain Pods from being placed on the same group of Nodes:
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: kubernetes.io/zone
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: kubernetes.io/hostname
  containers:
  - name: with-pod-affinity
    image: k8s.gcr.io/pause:2.0
Differences from podAffinity:
3. Taints: keep Pods from being scheduled onto specific Nodes:
apiVersion: v1
kind: Node
metadata:
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/hostname: node-n1
  name: node-n1
spec:
  externalID: node-n1
  taints:
  - effect: NoSchedule
    key: accelerator
    timeAdded: null
    value: gpu
kubectl taint node node-n1 foo=bar:NoSchedule
kubectl taint node node-n1 foo:NoSchedule-
4. Tolerations: allow Pods to be scheduled onto Nodes that have specific taints:
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: my-pod
  name: my-pod
  namespace: default
spec:
  containers:
  - name: my-pod
    image: nginx
  tolerations:
  - key: accelerator
    operator: Equal
    value: gpu
    effect: NoSchedule
With this toleration, the Pod can ignore the taint on the following Node:
apiVersion: v1
kind: Node
metadata:
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/hostname: node-n1
  name: node-n1
spec:
  externalID: node-n1
  taints:
  - effect: NoSchedule
    key: accelerator
    timeAdded: null
    value: gpu
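How a toleration cancels a taint can be sketched as below. This is a simplified model of the matching rules, under two assumptions: only `NoSchedule` and `NoExecute` taints are treated as blocking, and the helper names `tolerates` and `can_schedule` are illustrative, not scheduler functions.

```python
from typing import Dict, List

Taint = Dict[str, str]
Toleration = Dict[str, str]


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Check whether a single toleration matches a single taint."""
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False  # an empty effect would match every effect
    if toleration.get("operator", "Equal") == "Exists":
        key = toleration.get("key")
        return not key or key == taint["key"]  # an empty key matches every key
    # Operator Equal: both key and value must be identical.
    return (toleration.get("key") == taint["key"]
            and toleration.get("value") == taint.get("value"))


def can_schedule(taints: List[Taint], tolerations: List[Toleration]) -> bool:
    """A node is usable only if every blocking taint is tolerated."""
    blocking = [t for t in taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in blocking)


# The taint on node-n1 and the toleration on my-pod from the manifests above.
node_taints = [{"key": "accelerator", "value": "gpu", "effect": "NoSchedule"}]
pod_tolerations = [{"key": "accelerator", "operator": "Equal",
                    "value": "gpu", "effect": "NoSchedule"}]

with_toleration = can_schedule(node_taints, pod_tolerations)
without_toleration = can_schedule(node_taints, [])
```

The Pod with the toleration may land on `node-n1`; a Pod without it is kept off the node. Note that a toleration only permits scheduling there, it does not attract the Pod to the node.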
1. nodeName: manually schedule a Pod onto a specific Node:
2. DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-daemonset
spec:
  selector:
    matchLabels:
      name: my-daemonset
  template:
    metadata:
      labels:
        name: my-daemonset
    spec:
      containers:
      - name: container
        image: k8s.gcr.io/pause:2.0
Conceptually, this is equivalent to:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deploy
spec:
  replicas: <# of nodes>
  selector:
    matchLabels:
      podlabel: daemonset
  template:
    metadata:
      labels:
        podlabel: daemonset
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: podlabel
                operator: In
                values:
                - daemonset
            topologyKey: kubernetes.io/hostname
      containers:
      - name: container
        image: k8s.gcr.io/pause:2.0
View the scheduling result:
kubectl get po podname -o wide
View the reason a scheduling attempt failed:
kubectl describe po podname
List of scheduling errors:
Example:
https://kubernetes.io/blog/2017/03/advanced-scheduling-in-kubernetes/
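The scheduling discussed so far is about choosing nodes, and the priorities there are node priorities. Pods can have priorities too: a higher-priority Pod is scheduled first, and when resources are insufficient, lower-priority Pods may be sacrificed (evicted) so that important Pods get the resources they need to be deployed.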
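The two effects of Pod priority described above, ordering and eviction, can be illustrated with a toy model. It is a deliberately simplified sketch and not the real preemption algorithm: a single node, CPU as the only resource, and the `Pod` class, `schedule_order`, and `pick_victims` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pod:
    name: str
    priority: int
    cpu: int  # CPU cores requested


def schedule_order(pending: List[Pod]) -> List[str]:
    """Higher-priority pods are taken from the queue first."""
    return [p.name for p in sorted(pending, key=lambda p: p.priority, reverse=True)]


def pick_victims(running: List[Pod], free_cpu: int, incoming: Pod) -> Optional[List[str]]:
    """Evict the lowest-priority pods until the incoming pod fits; None if it cannot fit."""
    victims = []
    lower = sorted((p for p in running if p.priority < incoming.priority),
                   key=lambda p: p.priority)
    for pod in lower:
        if free_cpu >= incoming.cpu:
            break
        victims.append(pod.name)
        free_cpu += pod.cpu
    return victims if free_cpu >= incoming.cpu else None


# A full node running three pods of different priority (made-up values).
running = [Pod("batch", 10, 2), Pod("web", 1000, 2), Pod("cache", 100, 2)]

order = schedule_order([Pod("low", 1, 1), Pod("high", 1000, 1)])
victims = pick_victims(running, free_cpu=0, incoming=Pod("important", 500, 3))
no_room = pick_victims(running, free_cpu=0, incoming=Pod("trivial", 5, 3))
```

In this toy run the pod with priority 500 displaces `batch` and `cache` but never `web`, whose priority is higher, while the pod with priority 5 cannot displace anything.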
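To define Pod priority, you first need to define a PriorityClass object, which is not namespaced. The example from the official site:
Then reference the name of the defined PriorityClass in the Pod's spec.priorityClassName: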