1. Introduction to LXCFS
A common practice in the community is to use lxcfs to provide resource visibility inside containers. lxcfs is an open-source FUSE (Filesystem in Userspace) implementation built to support LXC containers; it also works with Docker containers.
Through a userspace filesystem, LXCFS provides the following procfs files inside the container:
/proc/cpuinfo
/proc/diskstats
/proc/meminfo
/proc/stat
/proc/swaps
/proc/uptime
For example, after the host file /var/lib/lxcfs/proc/meminfo is mounted to /proc/meminfo inside a Docker container, whenever a process in the container reads that file, LXCFS's FUSE implementation reads the correct memory limit from the container's cgroup. The application therefore sees the resource limits that were actually configured.
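The core of this mapping can be illustrated with a small sketch. This is not LXCFS's actual code, just a hypothetical illustration assuming cgroup v1 `memory.limit_in_bytes` semantics:

```python
# Illustrative sketch (not LXCFS source code): derive the MemTotal value
# a container should see in /proc/meminfo from its cgroup memory limit.

def meminfo_total_kb(limit_in_bytes: int, host_total_kb: int) -> int:
    """MemTotal (in kB) for the container.

    limit_in_bytes mirrors the cgroup v1 file memory.limit_in_bytes;
    when no limit is set, the kernel reports an enormous value there,
    so we fall back to the host's total in that case.
    """
    limit_kb = limit_in_bytes // 1024
    return min(limit_kb, host_total_kb)

# A container started with `docker run -m 256m` has a 256 MiB limit;
# on a host with 32 GiB of RAM it should see 262144 kB (256 MB):
print(meminfo_total_kb(256 * 1024 * 1024, 32 * 1024 * 1024))  # 262144
```

The real LXCFS does this translation on every read, so tools like `free` and `top` inside the container report the cgroup limit instead of the host total.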
Install the lxcfs RPM package:
wget https://copr-be.cloud.fedoraproject.org/results/ganto/lxd/epel-7-x86_64/00486278-lxcfs/lxcfs-2.0.5-3.el7.centos.x86_64.rpm
yum install lxcfs-2.0.5-3.el7.centos.x86_64.rpm
Start lxcfs:
lxcfs /var/lib/lxcfs &
Test:
docker run -it -m 256m \
-v /var/lib/lxcfs/proc/cpuinfo:/proc/cpuinfo:rw \
-v /var/lib/lxcfs/proc/diskstats:/proc/diskstats:rw \
-v /var/lib/lxcfs/proc/meminfo:/proc/meminfo:rw \
-v /var/lib/lxcfs/proc/stat:/proc/stat:rw \
-v /var/lib/lxcfs/proc/swaps:/proc/swaps:rw \
-v /var/lib/lxcfs/proc/uptime:/proc/uptime:rw \
ubuntu:16.04 /bin/bash
Result:
[root@node1 ~]# docker run -it -m 256m \
-v /var/lib/lxcfs/proc/cpuinfo:/proc/cpuinfo:rw \
-v /var/lib/lxcfs/proc/diskstats:/proc/diskstats:rw \
-v /var/lib/lxcfs/proc/meminfo:/proc/meminfo:rw \
-v /var/lib/lxcfs/proc/stat:/proc/stat:rw \
-v /var/lib/lxcfs/proc/swaps:/proc/swaps:rw \
-v /var/lib/lxcfs/proc/uptime:/proc/uptime:rw \
ubuntu:16.04 /bin/bash
root@6bcd804eef79:/# free -m
total used free shared buff/cache available
Mem: 256 0 254 189 0 254
Swap: 256 0 256
root@6bcd804eef79:/#
The total memory shown is 256 MB, so the memory limit has taken effect.
Using lxcfs in Kubernetes requires solving two problems:
The first is running lxcfs on every node. This is easy: deploy a DaemonSet.
The second is mounting the /proc files maintained by lxcfs into every container.
To install and start lxcfs on the cluster nodes, we will do it the Kubernetes way: run the lxcfs FUSE filesystem in a container, managed by a DaemonSet.
git clone https://github.com/denverdino/lxcfs-initializer
cd lxcfs-initializer
apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
  name: lxcfs
  labels:
    app: lxcfs
spec:
  selector:
    matchLabels:
      app: lxcfs
  template:
    metadata:
      labels:
        app: lxcfs
    spec:
      hostPID: true
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: lxcfs
        image: registry.cn-hangzhou.aliyuncs.com/denverdino/lxcfs:3.0.4
        imagePullPolicy: Always
        securityContext:
          privileged: true
        volumeMounts:
        - name: cgroup
          mountPath: /sys/fs/cgroup
        - name: lxcfs
          mountPath: /var/lib/lxcfs
          mountPropagation: Bidirectional
        - name: usr-local
          mountPath: /usr/local
      volumes:
      - name: cgroup
        hostPath:
          path: /sys/fs/cgroup
      - name: usr-local
        hostPath:
          path: /usr/local
      - name: lxcfs
        hostPath:
          path: /var/lib/lxcfs
          type: DirectoryOrCreate
kubectl apply -f lxcfs-daemonset.yaml
Kubernetes provides the Initializer extension mechanism, which can intercept resource creation and inject changes; with it we can elegantly automate the mounting of the lxcfs files.
In Kubernetes 1.13, initializers are still an alpha feature and must be enabled with kube-apiserver flags. Here we use Kubernetes 1.12; the setup is the same:
--enable-admission-plugins="Initializers,NamespaceLifecycle,NamespaceExists,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota"
--runtime-config=admissionregistration.k8s.io/v1alpha1
--enable-admission-plugins and --admission-control are mutually exclusive; if both are set, kube-apiserver fails to start with:
error: [admission-control and enable-admission-plugins/disable-admission-plugins flags are mutually exclusive,
enable-admission-plugins plugin "--runtime-config=admissionregistration.k8s.io/v1alpha1" is unknown]
An InitializerConfiguration resource defines a set of initializers.
Each initializer has a name and a set of rules; the rules list the resources it applies to. For example, the configuration below contains a single initializer, named podimage.example.com, which applies to v1 pods.
apiVersion: admissionregistration.k8s.io/v1alpha1
kind: InitializerConfiguration
metadata:
  name: example-config
initializers:
  # the name needs to be fully qualified, i.e., containing at least two "."
  - name: podimage.example.com
    rules:
      # apiGroups, apiVersion, resources all support wildcard "*".
      # "*" cannot be mixed with non-wildcard.
      - apiGroups:
          - ""
        apiVersions:
          - v1
        resources:
          - pods
After this InitializerConfiguration is created in Kubernetes, every newly created pod gets a list of initializers added to its metadata while it is pending:
metadata:
  creationTimestamp: 2019-01-09T08:56:36Z
  generateName: echo-7cfbbd7d49-
  initializers:
    pending:
    - name: podimage.example.com
Note that kubectl needs the flag --include-uninitialized=true to show pods in this state.
A pod whose metadata contains a non-empty initializers list is waiting for initialization. An initializer controller must be deployed to initialize such pods; only after it finishes can a pod leave the pending state.
You implement the initializer controller yourself, according to your needs.
The initializer controller watches the specified resource type. When it sees a newly created resource, it checks the initializer list in the resource's metadata to decide whether the resource needs its initialization applied. After applying its changes, it must remove its own entry from that list; otherwise the resource stays waiting for initialization forever.
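The loop just described amounts to plain data manipulation on the pod object. The sketch below is a hedged illustration, not the actual lxcfs-initializer code (which talks to the API server through a client library), and the injected mount is deliberately reduced to a single file; the initializer name and annotation key match the manifests used later in this article:

```python
# Sketch of an initializer controller's per-pod decision logic.
# The pod is represented as plain dicts, as it would be after JSON decoding.

INITIALIZER_NAME = "lxcfs.initializer.kubernetes.io"
ANNOTATION = "initializer.kubernetes.io/lxcfs"

def initialize_pod(pod: dict) -> bool:
    """Mutate pod in place; return True if this controller handled it."""
    pending = pod.get("metadata", {}).get("initializers", {}).get("pending", [])
    # Only act when our name is FIRST in the pending list: that is our turn.
    if not pending or pending[0].get("name") != INITIALIZER_NAME:
        return False
    # Always remove our own entry, or the pod stays pending forever.
    pending.pop(0)
    if not pending:
        pod["metadata"].pop("initializers")
    # Only mutate pods that opted in via the annotation.
    if pod["metadata"].get("annotations", {}).get(ANNOTATION) == "true":
        for c in pod["spec"]["containers"]:
            c.setdefault("volumeMounts", []).append(
                {"name": "lxcfs-proc-meminfo", "mountPath": "/proc/meminfo"})
        pod["spec"].setdefault("volumes", []).append(
            {"name": "lxcfs-proc-meminfo",
             "hostPath": {"path": "/var/lib/lxcfs/proc/meminfo"}})
    return True
```

A real controller would watch pods with includeUninitialized set, run this logic, and write the mutated pod back to the API server.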
Example:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: lxcfs-initializer-default
  namespace: default
rules:
- apiGroups: ["*"]
  resources: ["pods"]
  verbs: ["initialize", "update", "patch", "watch", "list"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: lxcfs-initializer-service-account
  namespace: default
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: lxcfs-initializer-role-binding
subjects:
- kind: ServiceAccount
  name: lxcfs-initializer-service-account
  namespace: default
roleRef:
  kind: ClusterRole
  name: lxcfs-initializer-default
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  initializers:
    pending: []
  labels:
    app: lxcfs-initializer
  name: lxcfs-initializer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: lxcfs-initializer
  template:
    metadata:
      labels:
        app: lxcfs-initializer
    spec:
      serviceAccountName: lxcfs-initializer-service-account
      containers:
        - name: lxcfs-initializer
          image: registry.cn-hangzhou.aliyuncs.com/denverdino/lxcfs-initializer:0.0.4
          imagePullPolicy: Always
          args:
            - "-annotation=initializer.kubernetes.io/lxcfs"
            - "-require-annotation=true"
---
apiVersion: admissionregistration.k8s.io/v1alpha1
kind: InitializerConfiguration
metadata:
  name: lxcfs.initializer
initializers:
  - name: lxcfs.initializer.kubernetes.io
    rules:
      - apiGroups:
          - "*"
        apiVersions:
          - "*"
        resources:
          - pods
First we create the service account lxcfs-initializer-service-account and grant it permissions such as listing and modifying "pod" resources. Then we deploy an Initializer named "lxcfs-initializer", which uses that service account to run a container that handles "pod" creation: if a deployment carries the annotation initializer.kubernetes.io/lxcfs with the value true, the lxcfs /proc files are mounted into the application's containers.
kubectl apply -f lxcfs-initializer.yaml
The deployment below allocates 256 MB of memory for the application and declares the annotation "initializer.kubernetes.io/lxcfs": "true".
web.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: web
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      annotations:
        "initializer.kubernetes.io/lxcfs": "true"
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: httpd:2.4.32
          imagePullPolicy: Always
          resources:
            requests:
              memory: "256Mi"
              cpu: "500m"
            limits:
              memory: "256Mi"
              cpu: "500m"
kubectl apply -f web.yaml
kubectl exec web-7f6bc6797c-rb9sk free
total used free shared buffers cached
Mem: 262144 2876 259268 2292 0 304
-/+ buffers/cache: 2572 259572
Swap: 0 0 0
The total shown is 262144 kB (256 MB), so the limit is correctly reflected inside the pod.