Da Wei: Starting today, Da Wei Shares will repost the OpenShift source-code analysis series. The articles were written by a Red Hat partner, and Da Wei has obtained authorization to repost them.
At the bottom layer, OpenShift manages pods through the kubelet, and the kubelet configures pod networking through CNI plugins. When an OpenShift node starts, it launches the kubelet in a goroutine, and the kubelet takes charge of pod management.

This article works from the source code and briefly analyzes how, in an OpenShift environment, the kubelet calls the OpenShift SDN plugin to configure pod networking.

Let's first look at a flow chart of pod network configuration:

Next we analyze each part of the code following the flow chart.

When the kubelet receives a pod creation request, it calls the underlying docker to create the pod. The entry point is at pkg/kubelet/kuberuntime/kuberuntime_manager.go#L643:
```go
podSandboxID, msg, err = m.createPodSandbox(pod, podContainerChanges.Attempt)
```
As shown above, the kubelet creates the pod by calling the createPodSandbox method, defined at pkg/kubelet/kuberuntime/kuberuntime_sandbox.go#L35:
```go
// createPodSandbox creates a pod sandbox and returns (podSandBoxID, message, error).
func (m *kubeGenericRuntimeManager) createPodSandbox(pod *v1.Pod, attempt uint32) (string, string, error) {
	podSandboxConfig, err := m.generatePodSandboxConfig(pod, attempt)
	if err != nil {
		message := fmt.Sprintf("GeneratePodSandboxConfig for pod %q failed: %v", format.Pod(pod), err)
		glog.Error(message)
		return "", message, err
	}

	// Create pod logs directory
	err = m.osInterface.MkdirAll(podSandboxConfig.LogDirectory, 0755)
	if err != nil {
		message := fmt.Sprintf("Create pod log directory for pod %q failed: %v", format.Pod(pod), err)
		glog.Errorf(message)
		return "", message, err
	}

	podSandBoxID, err := m.runtimeService.RunPodSandbox(podSandboxConfig)
	if err != nil {
		message := fmt.Sprintf("CreatePodSandbox for pod %q failed: %v", format.Pod(pod), err)
		glog.Error(message)
		return "", message, err
	}

	return podSandBoxID, "", nil
}
```
This method first calls generatePodSandboxConfig to generate the pod sandbox configuration, then calls MkdirAll to create the pod's log directory, and finally calls RunPodSandbox to do the actual pod creation. RunPodSandbox is defined at pkg/kubelet/dockershim/docker_sandbox.go#L79:
```go
 1  // RunPodSandbox creates and starts a pod-level sandbox. Runtimes should ensure
 2  // the sandbox is in ready state.
 3  // For docker, PodSandbox is implemented by a container holding the network
 4  // namespace for the pod.
 5  // Note: docker doesn't use LogDirectory (yet).
 6  func (ds *dockerService) RunPodSandbox(ctx context.Context, r *runtimeapi.RunPodSandboxRequest) (*runtimeapi.RunPodSandboxResponse, error) {
 7      config := r.GetConfig()
 8
 9      // Step 1: Pull the image for the sandbox.
10      image := defaultSandboxImage
11      podSandboxImage := ds.podSandboxImage
12      if len(podSandboxImage) != 0 {
13          image = podSandboxImage
14      }
15
16      // NOTE: To use a custom sandbox image in a private repository, users need to configure the nodes with credentials properly.
17      // see: http://kubernetes.io/docs/user-guide/images/#configuring-nodes-to-authenticate-to-a-private-repository
18      // Only pull sandbox image when it's not present - v1.PullIfNotPresent.
19      if err := ensureSandboxImageExists(ds.client, image); err != nil {
20          return nil, err
21      }
22
23      // Step 2: Create the sandbox container.
24      createConfig, err := ds.makeSandboxDockerConfig(config, image)
25      if err != nil {
26          return nil, fmt.Errorf("failed to make sandbox docker config for pod %q: %v", config.Metadata.Name, err)
27      }
28      createResp, err := ds.client.CreateContainer(*createConfig)
29      if err != nil {
30          createResp, err = recoverFromCreationConflictIfNeeded(ds.client, *createConfig, err)
31      }
32
33      if err != nil || createResp == nil {
34          return nil, fmt.Errorf("failed to create a sandbox for pod %q: %v", config.Metadata.Name, err)
35      }
36      resp := &runtimeapi.RunPodSandboxResponse{PodSandboxId: createResp.ID}
37
38      ds.setNetworkReady(createResp.ID, false)
39      defer func(e *error) {
40          // Set networking ready depending on the error return of
41          // the parent function
42          if *e == nil {
43              ds.setNetworkReady(createResp.ID, true)
44          }
45      }(&err)
46
47      // Step 3: Create Sandbox Checkpoint.
48      if err = ds.checkpointHandler.CreateCheckpoint(createResp.ID, constructPodSandboxCheckpoint(config)); err != nil {
49          return nil, err
50      }
51
52      // Step 4: Start the sandbox container.
53      // Assume kubelet's garbage collector would remove the sandbox later, if
54      // startContainer failed.
55      err = ds.client.StartContainer(createResp.ID)
56      if err != nil {
57          return nil, fmt.Errorf("failed to start sandbox container for pod %q: %v", config.Metadata.Name, err)
58      }
59
60      // Rewrite resolv.conf file generated by docker.
61      // NOTE: cluster dns settings aren't passed anymore to docker api in all cases,
62      // not only for pods with host network: the resolver conf will be overwritten
63      // after sandbox creation to override docker's behaviour. This resolv.conf
64      // file is shared by all containers of the same pod, and needs to be modified
65      // only once per pod.
66      if dnsConfig := config.GetDnsConfig(); dnsConfig != nil {
67          containerInfo, err := ds.client.InspectContainer(createResp.ID)
68          if err != nil {
69              return nil, fmt.Errorf("failed to inspect sandbox container for pod %q: %v", config.Metadata.Name, err)
70          }
71
72          if err := rewriteResolvFile(containerInfo.ResolvConfPath, dnsConfig.Servers, dnsConfig.Searches, dnsConfig.Options); err != nil {
73              return nil, fmt.Errorf("rewrite resolv.conf failed for pod %q: %v", config.Metadata.Name, err)
74          }
75      }
76
77      // Do not invoke network plugins if in hostNetwork mode.
78      if config.GetLinux().GetSecurityContext().GetNamespaceOptions().GetNetwork() == runtimeapi.NamespaceMode_NODE {
79          return resp, nil
80      }
81
82      // Step 5: Setup networking for the sandbox.
83      // All pod networking is setup by a CNI plugin discovered at startup time.
84      // This plugin assigns the pod ip, sets up routes inside the sandbox,
85      // creates interfaces etc. In theory, its jurisdiction ends with pod
86      // sandbox networking, but it might insert iptables rules or open ports
87      // on the host as well, to satisfy parts of the pod spec that aren't
88      // recognized by the CNI standard yet.
89      cID := kubecontainer.BuildContainerID(runtimeName, createResp.ID)
90      err = ds.network.SetUpPod(config.GetMetadata().Namespace, config.GetMetadata().Name, cID, config.Annotations)
91      if err != nil {
92          // TODO(random-liu): Do we need to teardown network here?
93          if err := ds.client.StopContainer(createResp.ID, defaultSandboxGracePeriod); err != nil {
94              glog.Warningf("Failed to stop sandbox container %q for pod %q: %v", createResp.ID, config.Metadata.Name, err)
95          }
96      }
97      return resp, err
98  }
```
At line 19 of the code above, ensureSandboxImageExists is called to pull the pod infra container's image, ensuring the image is already local by the time the infra container is created. The method is defined at pkg/kubelet/dockershim/helpers.go#L316:
```go
func ensureSandboxImageExists(client libdocker.Interface, image string) error {
	_, err := client.InspectImageByRef(image)
	if err == nil {
		return nil
	}
	if !libdocker.IsImageNotFoundError(err) {
		return fmt.Errorf("failed to inspect sandbox image %q: %v", image, err)
	}

	repoToPull, _, _, err := parsers.ParseImageName(image)
	if err != nil {
		return err
	}

	keyring := credentialprovider.NewDockerKeyring()
	creds, withCredentials := keyring.Lookup(repoToPull)
	if !withCredentials {
		glog.V(3).Infof("Pulling image %q without credentials", image)

		err := client.PullImage(image, dockertypes.AuthConfig{}, dockertypes.ImagePullOptions{})
		if err != nil {
			return fmt.Errorf("failed pulling image %q: %v", image, err)
		}

		return nil
	}

	var pullErrs []error
	for _, currentCreds := range creds {
		authConfig := credentialprovider.LazyProvide(currentCreds)
		err := client.PullImage(image, authConfig, dockertypes.ImagePullOptions{})
		// If there was no error, return success
		if err == nil {
			return nil
		}

		pullErrs = append(pullErrs, err)
	}

	return utilerrors.NewAggregate(pullErrs)
}
```
This method first checks whether the image already exists locally; if it does, it returns immediately. If not, it pulls the image through the docker client, handling registry authentication along the way.
After the image is pulled successfully, CreateContainer is called at line 28 of the RunPodSandbox code to create the infra container. The method is defined at pkg/kubelet/dockershim/libdocker/kube_docker_client.go#L141:
```go
 1  func (d *kubeDockerClient) CreateContainer(opts dockertypes.ContainerCreateConfig) (*dockercontainer.ContainerCreateCreatedBody, error) {
 2      ctx, cancel := d.getTimeoutContext()
 3      defer cancel()
 4      // we provide an explicit default shm size as to not depend on docker daemon.
 5      // TODO: evaluate exposing this as a knob in the API
 6      if opts.HostConfig != nil && opts.HostConfig.ShmSize <= 0 {
 7          opts.HostConfig.ShmSize = defaultShmSize
 8      }
 9      createResp, err := d.client.ContainerCreate(ctx, opts.Config, opts.HostConfig, opts.NetworkingConfig, opts.Name)
10      if ctxErr := contextError(ctx); ctxErr != nil {
11          return nil, ctxErr
12      }
13      if err != nil {
14          return nil, err
15      }
16      return &createResp, nil
17  }
```
At line 9 this method calls the docker client to create the container, which ultimately means creating it via docker's remote API.
After the infra container has been created, StartContainer is called at line 55 of the RunPodSandbox code to start it. StartContainer is defined at pkg/kubelet/dockershim/libdocker/kube_docker_client.go#L159:
```go
1  func (d *kubeDockerClient) StartContainer(id string) error {
2      ctx, cancel := d.getTimeoutContext()
3      defer cancel()
4      err := d.client.ContainerStart(ctx, id, dockertypes.ContainerStartOptions{})
5      if ctxErr := contextError(ctx); ctxErr != nil {
6          return ctxErr
7      }
8      return err
9  }
```
As line 4 above shows, like CreateContainer, this step is also completed through docker's API.
At this point the pod creation work is done. From the analysis above, we can see that the kubelet ultimately creates pods by calling docker's interfaces.
One point worth noting: when creating a pod, the kubelet first creates an infra container and configures its network, then creates the actual business containers, and finally joins the business containers into the infra container's network namespace. In effect, the business containers share the infra container's network namespace, and the business containers together with the infra container make up a pod.
This concludes this section's analysis. In the next section, we will analyze how the kubelet configures pod networking through CNI plugins.