Some Kubernetes Development/Implementation/Usage Tips, Part 2

Viewing a given kind of resource

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
or
kubectl proxy, then browse the API in a web page
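
The same call can also be made from code. Below is a minimal client-go sketch (the kubeconfig path is illustrative; note that in client-go versions before 0.18, DoRaw takes no context argument):

package main

import (
	"context"
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from a kubeconfig file (path is illustrative).
	config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	// Equivalent of `kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"`:
	// a raw GET against an absolute API path.
	raw, err := clientset.CoreV1().RESTClient().Get().
		AbsPath("/apis/metrics.k8s.io/v1beta1/nodes").
		DoRaw(context.TODO())
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
}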

Controller logic (JobController as an example)

The JobController's implementation logic is relatively simple, which makes it a good example of how a controller is implemented; a sketch of the common pattern follows.
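
Most controllers, the JobController included, follow the same informer + workqueue pattern. A stripped-down sketch of that pattern (illustrative, not the actual JobController code; minor client-go version differences such as AddEventHandler's return values are elided):

import (
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// runController wires an informer to a rate-limited workqueue and runs one
// worker. syncHandler is where the reconcile logic would live (the
// JobController's analogue is syncJob).
func runController(informer cache.SharedIndexInformer, syncHandler func(key string) error, stopCh <-chan struct{}) {
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
	defer queue.ShutDown()

	// Every object event is reduced to a namespace/name key on the queue.
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(newObj); err == nil {
				queue.Add(key)
			}
		},
		DeleteFunc: func(obj interface{}) {
			if key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
	})

	go informer.Run(stopCh)
	if !cache.WaitForCacheSync(stopCh, informer.HasSynced) {
		return
	}

	for { // a single worker; real controllers run several in parallel
		key, quit := queue.Get()
		if quit {
			return
		}
		if err := syncHandler(key.(string)); err != nil {
			queue.AddRateLimited(key) // retry with backoff on error
		} else {
			queue.Forget(key) // clear rate-limit history on success
		}
		queue.Done(key)
	}
}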

serviceaccount_controller and tokens_controller

  • serviceaccount_controller: ensures every namespace has the default ServiceAccounts (e.g. the configured "default"); see the sketch below
  • tokens_controller: ensures each ServiceAccount has a corresponding token Secret
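
A hedged paraphrase of what serviceaccount_controller guarantees per namespace (not the real controller code, which runs off namespace/ServiceAccount informers, and whose account list is configurable; context arguments assume client-go ≥ 0.18):

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// ensureDefaultServiceAccounts creates any missing default ServiceAccounts in
// the given namespace; the tokens_controller then notices the new accounts
// and provisions their token Secrets.
func ensureDefaultServiceAccounts(client kubernetes.Interface, namespace string) error {
	for _, name := range []string{"default"} { // configurable in the real controller
		_, err := client.CoreV1().ServiceAccounts(namespace).Get(context.TODO(), name, metav1.GetOptions{})
		if err == nil {
			continue // already present
		}
		if !apierrors.IsNotFound(err) {
			return err
		}
		sa := &corev1.ServiceAccount{ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace}}
		_, err = client.CoreV1().ServiceAccounts(namespace).Create(context.TODO(), sa, metav1.CreateOptions{})
		if err != nil && !apierrors.IsAlreadyExists(err) {
			return err
		}
	}
	return nil
}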

Kubernetes configurable features (feature gates)

Whether each feature is on by default, and how mature it currently is, are listed in:

pkg/features/kube_features.go
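
A schematic of how a gate is declared and consulted (MyFeature is a made-up name; the featuregate package moved between releases, e.g. to k8s.io/component-base/featuregate in newer trees):

package main

import (
	"fmt"

	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/component-base/featuregate"
)

// MyFeature is a made-up gate; real gates are declared in
// pkg/features/kube_features.go together with their defaults and maturity.
const MyFeature featuregate.Feature = "MyFeature"

func init() {
	// Register the gate: alpha features default to off.
	utilfeature.DefaultMutableFeatureGate.Add(map[featuregate.Feature]featuregate.FeatureSpec{
		MyFeature: {Default: false, PreRelease: featuregate.Alpha},
	})
}

func main() {
	// Call sites guard feature-specific code paths on the gate, which can be
	// flipped with --feature-gates=MyFeature=true.
	if utilfeature.DefaultFeatureGate.Enabled(MyFeature) {
		fmt.Println("MyFeature is enabled")
	}
}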

Where the kubectl code lives

Within kubectl, auth/convert/cp/get live under k8s.io/kubernetes/pkg, while the rest of the code lives under k8s.io/kubectl. This is because everything used to live in k8s.io/kubernetes/pkg and has been gradually moving to the staging directory; the migration is not finished yet.

How kubectl is implemented

The core of kubectl lives in vendor/k8s.io/cli-runtime; the most important piece is vendor/k8s.io/cli-runtime/pkg/resource/builder.go.

Build the builder -> set builder parameters -> Do() sets up the visitors -> Infos() retrieves and decorates the results

type RESTClientGetter interface {
	// ToRESTConfig returns a rest.Config
	ToRESTConfig() (*rest.Config, error)
	// ToDiscoveryClient returns a discovery client.
	// DiscoveryInterface holds the methods that discover server-supported API groups,
	// versions and resources.
	ToDiscoveryClient() (discovery.CachedDiscoveryInterface, error)
	// ToRESTMapper returns a RESTMapper.
	// RESTMapper allows clients to map resources to kind, and map kind and version
	// to interfaces for manipulating those objects. It is primarily intended for
	// consumers of Kubernetes compatible REST APIs as defined in docs/devel/api-conventions.md.
	ToRESTMapper() (meta.RESTMapper, error)
	// ToRawKubeConfigLoader returns the kubeconfig loader as-is
	ToRawKubeConfigLoader() clientcmd.ClientConfig
}

// Result contains helper methods for dealing with the outcome of a Builder.
type Result struct {
	err     error
	visitor Visitor

	sources            []Visitor
	singleItemImplied  bool
	targetsSingleItems bool

	mapper       *mapper
	ignoreErrors []utilerrors.Matcher

	// populated by a call to Infos
	info []*Info
}
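
Putting the pieces together, here is a sketch of how a kubectl-like command drives the Builder, mirroring the build -> configure -> Do() -> Infos() flow above (assumes current cli-runtime import paths; genericclioptions.NewConfigFlags(true) provides a stock RESTClientGetter implementation):

import (
	"fmt"

	"k8s.io/cli-runtime/pkg/genericclioptions"
	"k8s.io/cli-runtime/pkg/resource"
)

// listPods lists pods the way `kubectl get pods` would drive the Builder.
func listPods(getter genericclioptions.RESTClientGetter) error {
	r := resource.NewBuilder(getter).
		Unstructured(). // decode into unstructured objects
		NamespaceParam("default").DefaultNamespace().
		ResourceTypeOrNameArgs(true, "pods"). // the same args `kubectl get` receives
		Latest().
		Flatten().
		Do() // builds the visitor chain
	infos, err := r.Infos() // runs the visitors and collects the results
	if err != nil {
		return err
	}
	for _, info := range infos {
		fmt.Printf("%s/%s\n", info.Namespace, info.Name)
	}
	return nil
}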

Kubelet core components

[Figure: kubelet core components]

Image from https://feisky.gitbooks.io/kubernetes/components/kubelet.html. (After kubelet in the figure there is also a ContainerManager, whose name is easy to confuse, which sets up cgroup and device resource information before genericRuntimeManager is invoked.)

  • PodWorkers: podWorkers handle syncing Pods in response to events.
  • kubepod.Manager: podManager is a facade that abstracts away the various sources of pods this Kubelet services.
  • eviction.Manager: Needed to observe and respond to situations that could impact node stability
  • kubecontainer.ContainerCommandRunner: runs a command in a container, i.e. exec in container
  • cadvisor: monitoring (node/container resource stats)
  • dnsConfigurer: setting up DNS resolver configuration when launching pods
  • VolumePluginMgr: Volume plugins.
  • probeManager/livenessManager: Handles container probing/ Manages container health check results.
  • kubecontainer.ContainerGC: Policy for handling garbage collection of dead containers.
  • images.ImageGCManager: Manager for image garbage collection.
  • logs.ContainerLogManager: Manager for container logs.
  • secret.Manager: Secret manager
  • configmap.Manager: ConfigMap manager.
  • certificate.Manager: Handles certificate rotations.
  • status.Manager: Syncs pods statuses with apiserver; also used as a cache of statuses.
  • volumemanager.VolumeManager: attach/mount/unmount/detach volumes for pods
  • cloudprovider.Interface
  • cloudresource.SyncManager
  • kubecontainer.Runtime: Container runtime, GetPods/SyncPod/KillPod/GetPodStatus/ImageService....
  • kubecontainer.StreamingRuntime: GetExec/GetAttach/GetPortForward
  • RuntimeService:
    • ContainerManager(Create/Start/Stop/List/Exec...Container)
    • PodSandboxManager(Run/Stop/Remove..PodSandbox)
    • ContainerStatsManager
  • PodLifecycleEventGenerator: Generates pod events.
  • oomwatcher.Watcher
  • cm.ContainerManager: Start/SystemCgroupsLimit/GetNodeConfig/GetMountedSubsystems/GetQOSContainersInfo...
  • pluginmanager.PluginManager

Kubelet entry-point goroutines

kubelet.go

  • ListenAndServe/ListenAndServeReadOnly: serve on ports 10250/10255
  • ListenAndServePodResources: a gRPC server to serve the PodResources service
  • For serviceIndexer/nodeIndexer: local caches for the service and node objects
  • containerGC/imageManager.GarbageCollection: periodic garbage collection; calls kubeGenericRuntimeManager.containerGC (evictContainers/evictSandboxes/evictPodLogsDirectories) and realImageGCManager.GarbageCollect
  • pluginManager.Run: CSIPlugin/DevicePlugin
  • cloudResourceSyncManager: syncs node addresses
  • volumeManager: runs a set of asynchronous loops that figure out which volumes need to be attached/mounted/unmounted/detached based on the pods scheduled on this node, and makes it so
  • syncNodeStatus/fastStatusUpdateOnce/nodeLeaseController: the two reporting mechanisms behind updateNodeStatus; the lease is lightweight and less likely to fail when the cluster's data volume grows large
  • updateRuntimeUp: every 5s; initializes the runtime-dependent modules when the container runtime first comes up
  • podKiller: every 1s, starts a goroutine responsible for killing pods (that are not properly handled by pod workers)
syncLoopIteration

// Arguments:
// 1.  configCh:       a channel to read config events from (the file/http/apiserver pod sources)
// 2.  handler:        the SyncHandler to dispatch pods to for state sync
// 3.  syncCh:         a channel to read periodic sync events from
// 4.  housekeepingCh: a channel to read housekeeping events from
// 5.  plegCh:         a channel to read PLEG updates from: container state changes (ContainerStarted/Died/Removed/...)
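
The body of syncLoopIteration is essentially one select over these channels. A heavily simplified paraphrase (the real method also handles liveness-probe results, the RECONCILE/DELETE ops, and more):

// Heavily simplified paraphrase of Kubelet.syncLoopIteration: one select over
// the channels described above; returns false to stop the sync loop.
func (kl *Kubelet) syncLoopIteration(configCh <-chan kubetypes.PodUpdate, handler SyncHandler,
	syncCh <-chan time.Time, housekeepingCh <-chan time.Time, plegCh <-chan *pleg.PodLifecycleEvent) bool {
	select {
	case u, open := <-configCh:
		if !open {
			return false // the config source closed; stop the loop
		}
		switch u.Op { // pod updates from the file/http/apiserver sources
		case kubetypes.ADD:
			handler.HandlePodAdditions(u.Pods)
		case kubetypes.UPDATE:
			handler.HandlePodUpdates(u.Pods)
		case kubetypes.REMOVE:
			handler.HandlePodRemoves(u.Pods)
			// ... DELETE (graceful deletion, treated as an update), RECONCILE, etc.
		}
	case e := <-plegCh:
		// a container changed state (started/died/removed/...): resync its pod
		if pod, ok := kl.podManager.GetPodByUID(e.ID); ok {
			handler.HandlePodSyncs([]*v1.Pod{pod})
		}
	case <-syncCh:
		// periodic sync: resync the pods that are due for it
		handler.HandlePodSyncs(kl.getPodsToSync())
	case <-housekeepingCh:
		handler.HandlePodCleanups()
	}
	return true
}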

cgroup structure

https://zhuanlan.zhihu.com/p/38359775

# ubuntu 16.04; kubernetes v1.10.5
ubuntu@VM-0-12-ubuntu:~$ systemd-cgls
Control group /:
-.slice
├─init.scope
│ └─1 /sbin/init
├─system.slice
│ ├─avahi-daemon.service
│ │ ├─1268 avahi-daemon: running [VM-0-12-ubuntu.local
│ │ └─1283 avahi-daemon: chroot helpe
│ │ … (omitted)
│ ├─dockerd.service
│ │ ├─ 5134 /usr/bin/dockerd --config-file=/etc/docker/daemon.json
│ │ ├─ 5143 docker-containerd --config /var/run/docker/containerd/containerd.toml
│ │ └─29537 docker-containerd-shim -namespace moby -workdir /data/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/303a0718c84995350d835f6e2d17036
│ │ │ … (omitted)
│ ├─accounts-daemon.service
│ │ └─1262 /usr/lib/accountsservice/accounts-daemon
│ │ … (omitted)
│ ├─NetworkManager.service
│ │ └─1287 /usr/sbin/NetworkManager --no-daemon
│ ├─kubelet.service
│ │ └─5239 /usr/bin/kubelet --cluster-dns=10.15.255.254 --network-plugin=cni --kube-reserved=cpu=80m,memory=1319Mi --cloud-config=/etc/kubernetes/qcloud.conf 
│ ├─rsyslog.service
│ │ └─1251 /usr/sbin/rsyslogd -n
│ │ … (omitted)
│ └─acpid.service
│   └─1293 /usr/sbin/acpid
├─user.slice
│ └─user-500.slice
│   ├─session-129315.scope
│   │ ├─27862 sshd: ubuntu [priv] 
│   └─user@500.service
│     └─init.scope
│       ├─27870 /lib/systemd/systemd --user
│       └─27871 (sd-pam)  
└─kubepods
  ├─burstable
  │ ├─pod5645ed58-e98f-11e9-8443-52540087514c
  │ │ ├─1f8f76dacb8334bd8d8ab2a7432d2cc250286ca6b5b73ab6dca9a845b77a3a09
  │ │ │ └─8958 /configmap-reload --webhook-url=http://localhost:9090/-/reload --volume-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
  └─besteffort
    ├─pod3cf3ae0d-b7f4-11e9-8443-52540087514c
    │ ├─fde2178c5fa634206c2c86756c107c3de2828d2f90e2ea4c6a3b57f50c25267c
    │ │ └─5435 /pause
    │ └─5b4082efeb73ad102cc3fea33ff4c931c042a7120f0cd5277d46660aedffffde
    │   ├─ 5663 sh /install-cni.sh
    │   └─20347 sleep 3600

APIServer structure

A good reference: https://note.youdao.com/ynoteshare1/index.html?id=63f58c5e98634c8b3df9da2b024aacd5&type=note


Key flows

  • CreateKubeAPIServer
    • completedConfig.InstallLegacyAPI: api/all and api/legacy control all APIs and legacy APIs respectively
    • completedConfig.InstallAPIs
      • apiGroupInfo = restStorageBuilder.NewRESTStorage: the key element is VersionedResourcesStorageMap map[string]map[string]rest.Storage, e.g. {"v1beta1": {"deployments": deploymentStorage.Deployment}}
        • Taking "apps" as an example: if v1 is enabled, storageMap = RESTStorageProvider(storage_app).v1Storage
          • deploymentStorage = deploymentstore.NewStorage; storage["deployments"] = deploymentStorage.Deployment. deploymentStorage consists of XXXREST elements, which are explained below
      • GenericAPIServer.InstallAPIGroups
        • s.installAPIResources: the core method for installing an API; it wires the API to its storage
          • apiGroupVersion.InstallREST
            • installer.Install()
              • registerResourceHandlers: associates every path in the storage map with its storage
              • e.g. actions = appendIf(actions, action{"GET", itemPath, nameParams, namer, false}, isGetter)
              • handler = restfulGetResource(getter, exporter, reqScope)
              • route := ws.GET(action.Path).To(handler).Doc(doc)....
        • s.DiscoveryGroupManager.AddGroup
        • s.Handler.GoRestfulContainer.Add(discovery.NewAPIGroupHandler(s.Serializer, apiGroup).WebService())
// NewREST returns a RESTStorage object that will work against deployments.
func NewREST(optsGetter generic.RESTOptionsGetter) (*REST, *StatusREST, *RollbackREST, error) {
	store := &genericregistry.Store{
		NewFunc:                  func() runtime.Object { return &apps.Deployment{} },
		NewListFunc:              func() runtime.Object { return &apps.DeploymentList{} },
		DefaultQualifiedResource: apps.Resource("deployments"),

		CreateStrategy: deployment.Strategy,
		UpdateStrategy: deployment.Strategy,
		DeleteStrategy: deployment.Strategy,

		TableConvertor: printerstorage.TableConvertor{TableGenerator: printers.NewTableGenerator().With(printersinternal.AddHandlers)},
	}
	options := &generic.StoreOptions{RESTOptions: optsGetter}
	if err := store.CompleteWithOptions(options); err != nil {
		return nil, nil, nil, err
	}

	statusStore := *store
	statusStore.UpdateStrategy = deployment.StatusStrategy
	return &REST{store, []string{"all"}}, &StatusREST{store: &statusStore}, &RollbackREST{store: store}, nil
}

type REST struct {
	*genericregistry.Store
	categories []string
}

genericregistry.Store defines NewFunc, NewListFunc, CreateStrategy, UpdateStrategy, and so on.
Its core is the DryRunnableStorage: the storage.Interface inside DryRunnableStorage is the actual CRUD entry point to the backing store.

type DryRunnableStorage struct {
	Storage storage.Interface
	Codec   runtime.Codec
}
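
For example, Create on this wrapper only round-trips the object through the codec when dryRun is set, and only touches the real storage.Interface otherwise. A paraphrase of the logic in vendor/k8s.io/apiserver/pkg/registry/generic/registry/dryrun.go (exact signatures vary by release):

// Paraphrase of DryRunnableStorage.Create: on a dry run the object is only
// encoded and decoded back into `out`, simulating the codec round-trip a real
// write would perform, and the storage backend is never written to.
func (s *DryRunnableStorage) Create(ctx context.Context, key string, obj, out runtime.Object, ttl uint64, dryRun bool) error {
	if dryRun {
		data, err := runtime.Encode(s.Codec, obj)
		if err != nil {
			return err
		}
		_, _, err = s.Codec.Decode(data, nil, out)
		return err
	}
	return s.Storage.Create(ctx, key, obj, out, ttl)
}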

The Storage here is the Cacher struct, wrapping the real storage (etcd3/store).

generic.StoreOptions.RESTOptions determines the backing store. It is part of completedConfig (genericapiserver.CompletedConfig) and is passed down layer by layer from the very top: buildGenericConfig <- createAggregatorConfig; master.Config -> completedConfig.

In the end you find that generic.RESTOptions.Decorator = genericregistry.StorageWithCacher(cacheSize), i.e. an etcd backend with a cache (when EnableWatchCache is enabled, which it is by default).

The cache is implemented in vendor/k8s.io/apiserver/pkg/storage/cacher/cacher.go; the next section looks at this implementation in detail.

The cache implementation inside the apiserver

Take watch as an example; the consumer is vendor/k8s.io/apiserver/pkg/registry/generic/registry/store.go


  • Create: etcd3/store Create
  • Delete: etcd3/store Delete
  • Watch: a watcher is registered with etcd3 to receive events, which are then served from the cache
  • Get: when resourceVersion is "" the request goes straight to the store; otherwise it is served from the cache (which must first wait until it has caught up to resourceVersion)
  • List: similar to Get

Debug Etcd

# download etcd
ETCD_VER=v3.4.0
DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download
mkdir -p /tmp/etcd-download-test
curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz && tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /tmp/etcd-download-test --strip-components=1 && rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz

# set up the environment (client certs and endpoint)
export ETCDCTL_CERT=/etc/kubernetes/certs/kube-apiserver-etcd-client.crt
export ETCDCTL_KEY=/etc/kubernetes/certs/kube-apiserver-etcd-client.key
export ETCDCTL_CACERT=/etc/kubernetes/certs/kube-apiserver-etcd-ca.crt
export ETCDCTL_ENDPOINTS=https://etcd.cls-4lr4c4wx.ccs.tencent-cloud.com:2379


etcdctl get  "" --prefix=true  --limit=1 # get key and value
etcdctl get  "" --prefix=true --keys-only --limit=100 # get only keys
etcdctl get "/cls-4lr4c4wx/pods" --prefix=true --keys-only  --limit=10 # get pod keys; here cls-4lr4c4wx is the etcd prefix
etcdctl get "/cls-4lr4c4wx/configmaps" --prefix=true --limit 1 --write-out="json" # output as JSON

What the watch BOOKMARK event in 1.16 means

For example, a client watching pods:

GET /api/v1/namespaces/test/pods?watch=1&resourceVersion=10245&allowWatchBookmarks=true
---
200 OK
Transfer-Encoding: chunked
Content-Type: application/json
{
  "type": "ADDED",
  "object": {"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "10596", ...}, ...}
}
{
  "type": "BOOKMARK",
  "object": {"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "12746"} }
}

Then suppose the watcher restarts. A watcher that received the BOOKMARK can resume watching from resourceVersion=12746, while one that did not can only resume from resourceVersion=10596, even though the range 10596-12746 contained no events it cared about anyway.
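
From client-go, opting in to bookmarks is just a field on ListOptions; bookmark events arrive with type watch.Bookmark and carry only a resourceVersion. A sketch (assumes a client-go version whose Watch takes a context):

import (
	"context"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
)

// watchWithBookmarks watches pods with bookmarks enabled and remembers the
// latest resourceVersion so a restart can resume from it.
func watchWithBookmarks(client kubernetes.Interface, ns, lastRV string) (string, error) {
	w, err := client.CoreV1().Pods(ns).Watch(context.TODO(), metav1.ListOptions{
		ResourceVersion:     lastRV,
		AllowWatchBookmarks: true, // ask the server for BOOKMARK events
	})
	if err != nil {
		return lastRV, err
	}
	defer w.Stop()
	for event := range w.ResultChan() {
		if event.Type == watch.Bookmark {
			// Only metadata.resourceVersion is meaningful on a bookmark.
			if obj, err := meta.Accessor(event.Object); err == nil {
				lastRV = obj.GetResourceVersion()
			}
			continue
		}
		// handle ADDED/MODIFIED/DELETED events here...
	}
	return lastRV, nil
}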

How the apiserver implements the Aggregator


The aggregator itself is also a controller.

