
Understanding Kubernetes Kube-Proxy

By heidsoft, published 2019-09-10

Kubernetes is a complicated system with multiple components interacting with each other in complex ways. As you may already know, Kubernetes is made of master and node components.

Master components such as kube-scheduler, kube-controller-manager, etcd, and kube-apiserver are part of the Kubernetes Control Plane, which runs on the K8s master node(s). The Control Plane is responsible for managing the cluster lifecycle, K8s API access, data persistence (etcd), and the maintenance of the desired cluster state.

In turn, node components such as the kubelet, the container runtime (e.g., Docker), and kube-proxy run on the worker nodes and are responsible for managing containerized workloads (kubelet) and for implementing Services and Pod-to-Service communication (kube-proxy).

Kube-proxy is one of the most important node components that participates in managing Pod-to-Service and External-to-Service networking. Kubernetes has great documentation about Services that mentions kube-proxy and its modes. However, we would like to discuss this component in depth using practical examples. This will help you understand how Kubernetes Services work under the hood and how kube-proxy manages them by interacting with the networking frameworks inside the Linux kernel. Let’s get started!

So What Are Proxy and Kube-proxy?

A proxy server is any server/host that works as an intermediary between clients requesting resources and the servers that provide those resources. There are three basic types of proxy servers: (a) tunneling proxies; (b) forward proxies; and (c) reverse proxies.

A tunneling proxy passes unmodified requests from clients to servers on some network. It works as a gateway that enables packets from one network to access servers on another network.

A forward proxy is an Internet-facing proxy that mediates client connections to web resources/servers on the Internet. It manages outgoing connections and can service a wide range of resource types.

Finally, a reverse proxy is an internal-facing proxy. It may be thought of as a frontend that controls access to servers on a private network. A reverse proxy takes incoming requests and redirects them to some internal server without the client knowing which one it is actually accessing. This is often done to protect a private network against direct exposure to external users. Reverse proxies can also perform load balancing, authentication, caching, and/or decryption.

Kube-proxy

Kube-proxy is the closest to the reverse proxy model in its concept and design (at least in the userspace mode as we’ll see later). As a reverse proxy, kube-proxy is responsible for watching client requests to some IP:port and forwarding/proxying them to the corresponding service/application on the private network. However, the difference between the kube-proxy and a normal reverse proxy is that the kube-proxy proxies requests to Kubernetes Services and their backend Pods, not hosts. There are some other important differences that we will discuss.

So, as we just noted, the kube-proxy proxies client requests to backend Pods managed by a Service. Its main task is to translate Virtual IPs of Services into IPs of backend Pods controlled by Services. This way, the clients accessing the Service do not need to know which Pods are available for that Service.

Kube-proxy can also work as a load balancer for the Service’s Pods. It can do simple TCP, UDP, and SCTP stream forwarding or round-robin TCP, UDP, and SCTP forwarding across a set of backends.

How Does Kube-proxy Handle NAT?

Network Address Translation (NAT) helps forward packets between different networks. More specifically, it allows packets originating from one network to find destinations on another network. In Kubernetes, we need some sort of NAT to translate the Virtual IPs/ClusterIPs of Services into the IPs of backend Pods.

However, kube-proxy does not implement this kind of packet forwarding by itself. Moreover, it needs to account for the fact that Service endpoints, i.e., Pods, are constantly changing. Thus, kube-proxy needs to know the state of the Service network at every point in time to ensure that packets arrive at the right Pods. We will discuss how kube-proxy solves these two challenges in what follows.

Translating Service VIPs into Real IPs

When a new Service of the type “ClusterIP” is created, the system assigns a virtual IP to it. This IP is virtual because there is no network interface or MAC address associated with it. Thus, the network as a whole does not know how to route packets going to this VIP.
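You can see this for yourself on a cluster node. Assuming a Service with the ClusterIP 10.104.141.67 exists (the address used in the examples later in this article) and kube-proxy runs in the userspace or iptables mode, the VIP does not show up on any network interface:

ip addr | grep 10.104.141.67
# no output: the ClusterIP exists only in netfilter/iptables rules, not as an interface address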

How then does kube-proxy know how to route traffic from this virtual IP to the correct Pod? On the Linux systems where Kubernetes runs, kube-proxy closely interacts with the Linux kernel network configuration tools called netfilter and iptables to configure packet routing rules for this VIP. Let’s see how these kernel tools work and how kube-proxy interacts with them.

Netfilter and iptables

Netfilter is a set of Linux kernel hooks that allow various kernel modules to register callback functions that intercept network packets and change their destination/routing. A registered callback function can be thought of as a set of rules tested against every packet passing through the network stack. So netfilter's role is to provide an interface through which software that works with network rules can match packets against these rules. When a packet matches a rule, netfilter takes the specified action (e.g., redirects the packet). In general, netfilter and other components of the Linux networking framework enable packet filtering, network address and port translation (NAPT), and other packet mangling.

To set network routing rules in netfilter, kube-proxy uses the userspace program called iptables. This program can inspect, forward, modify, redirect, and/or drop IP packets. Iptables consists of five tables: raw, filter, nat, mangle, and security, which process packets at various stages of their travel through the network stack. In its turn, each table has a set of chains: lists of rules applied in order. For example, the filter table consists of the INPUT, OUTPUT, and FORWARD chains; a packet destined for a local process is matched against the rules of the INPUT chain.

Each chain includes individual rules that consist of condition(s) and corresponding action(s) to take when the condition is met. Here is an example of an iptables rule added to the INPUT chain of the filter table that blocks connections from a specific IP address (15.15.15.51):

sudo iptables -A INPUT -s 15.15.15.51 -j DROP

Here, -A INPUT appends the rule to the INPUT chain of the filter table (the default table), -s matches the source IP address, and -j DROP specifies the action to take when the rule matches (drop the packet).
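To verify that the rule was installed (and to remove it again), you can, for example, list the chain with rule numbers and then delete by number:

sudo iptables -L INPUT -n --line-numbers   # list the INPUT chain rules with their numbers
sudo iptables -D INPUT 1                   # delete rule number 1 from the INPUT chain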

Note: This is a very simplified picture of how iptables work though. If you want to learn more about iptables, check this excellent article from the Arch Linux wiki.

So, we have established that kube-proxy configures the netfilter Linux kernel feature via its user interface – iptables.

However, configuring routing rules is not enough.

IP addresses churn frequently in a containerized environment like Kubernetes. Therefore, kube-proxy has to watch the Kubernetes API for changes such as Service creation or updates and the addition or removal of backend Pod IPs, and change the iptables rules accordingly, so that traffic sent to the virtual IP always reaches a correct Pod. The details of translating VIPs into real Pod IPs differ depending on the selected kube-proxy mode. Let's discuss these modes now.

Kube-proxy modes

Kube-proxy can work in three different modes:

  • userspace
  • iptables
  • and IPVS.

Why do we need all these modes? Well, they differ in how kube-proxy interacts with the Linux userspace and kernelspace and in what roles these spaces play in packet routing and load balancing of traffic to a Service's backends. To make the discussion clear, you should first understand the difference between userspace and kernelspace.

Userspace vs. Kernelspace

In Linux, system memory can be divided into two distinct areas: kernel space and user space.

The core of the operating system, known as the kernel, executes its code and provides OS services in the kernelspace. All user software and user-installed processes run in the userspace. When they need CPU time for computations, disk access for I/O operations, or want to fork a process, they issue system calls to the kernel asking for its services.

In general, kernelspace modules and processes are much faster than userspace processes because they interact with the system's hardware directly. Userspace programs, in contrast, have to go through system calls to access kernel services, which makes them slower.

Now that you understand the implications of userspace vs. kernelspace, we can discuss all kube-proxy modes.
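Before diving into the modes, note how the mode is selected: kube-proxy takes a --proxy-mode flag (or a mode field in its configuration file), and an empty value falls back to the default. On kubeadm-provisioned clusters the configuration typically lives in the kube-proxy ConfigMap, so a check like the one below usually works; the ConfigMap name and layout may differ on other setups:

kubectl -n kube-system describe configmap kube-proxy | grep "mode:"
# mode: ""    (an empty value means the default mode, i.e., iptables on current versions)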

Userspace Proxy Mode

In the userspace mode, most networking tasks, including setting packet rules and load balancing, are directly performed by the kube-proxy operating in the userspace. In this mode, kube-proxy comes the closest to the role of a reverse proxy that involves listening to traffic, routing traffic, and load balancing between traffic destinations. Also, in the userspace mode, kube-proxy must frequently switch context between userspace and kernelspace when it interacts with iptables and does load balancing.

Proxying traffic between the VIPs and backend Pods in the userspace mode is done in four steps:

  • kube-proxy watches for the creation/deletion of Services and their Endpoints (backend Pods).
  • When a new Service of type ClusterIP is created, kube-proxy opens a random port on the node. The aim is to proxy any connection arriving at this port to one of the Service’s backend Endpoints. The choice of the backend Pod is based on the SessionAffinity of the Service.
  • kube-proxy installs iptables rules that intercept traffic to the Service’s VIP and Service Port and redirect that traffic to the host port opened in the step above.
  • When the redirected traffic gets to the node’s port, kube-proxy works as a load balancer distributing traffic across the backend Pods. The choice of the backend Pod is round robin by default.

As you see, in this mode kube-proxy works as a userspace proxy that opens a proxy port, listens on it, and redirects packets from the port to the backend Pods.

This approach involves a lot of context switching, however. Kube-proxy has to switch to the kernelspace when VIPs are redirected to the proxy port and then back to the userspace to load balance across the set of backend Pods. This is because it does not install iptables rules for load balancing between Service endpoints/backends; load balancing is done directly by kube-proxy in the userspace. As a result of this frequent context switching, the userspace mode is not as fast and scalable as the other two modes we are about to describe.

Example #1: Userspace Mode

Let’s illustrate how the userspace mode works with a concrete example. Here, kube-proxy opens a random port (10400) on the node’s eth0 interface after the Service with the ClusterIP 10.104.141.67 is created.

Then, kube-proxy creates netfilter rules that reroute packets sent to the Service VIP to the proxy port. After the packets get to this port, kube-proxy selects one of the backend Pods (e.g., Pod 1) and forwards traffic to it. As you can imagine, a number of intermediary steps are involved in this process.
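For illustration only, the NAT rule involved could look roughly like the following; the chain name and exact match syntax here are a sketch rather than output copied from a real cluster. Traffic to the VIP and Service port is simply redirected to the locally opened proxy port, and the actual choice of a backend Pod then happens inside the kube-proxy process:

# illustrative sketch: send traffic for the Service VIP (10.104.141.67:80) to kube-proxy's proxy port 10400
iptables -t nat -A KUBE-PORTALS-HOST -d 10.104.141.67/32 -p tcp --dport 80 -j REDIRECT --to-ports 10400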

Iptables Mode

Iptables has been the default kube-proxy mode since Kubernetes v1.2 and allows for faster packet forwarding between Services and backend Pods than the userspace mode.

In the iptables mode, kube-proxy no longer works as a reverse proxy load balancing the traffic between backend Pods. This task is delegated to iptables/netfilter. Iptables is tightly integrated with netfilter, so there is no need to frequently switch context between the userspace and the kernelspace. Also, load balancing between backend Pods is done directly via the iptables rules.

This is how the entire process works:

  • As in the userspace mode, kube-proxy watches for the creation/deletion of Services and their Endpoints objects.
  • However, instead of opening a random port on the host when a new Service is created/updated, kube-proxy immediately installs iptables rules that capture traffic to the Service’s ClusterIP and Port and redirect it to one of the Service’s backend Pods.
  • Also, kube-proxy installs iptables rules for each Endpoint object. These rules are used by iptables to select a backend Pod. By default, the choice of the backend Pod is random.

Thus, in the iptables mode, kube-proxy fully delegates the task of redirecting traffic and load balancing between the backend Pods to netfilter/iptables. All these tasks happen in the kernelspace, which makes the process much faster than in the userspace mode.

However, kube-proxy retains its role of keeping netfilter rules in sync. It constantly watches for Service and Endpoints updates and changes iptables rules accordingly.
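You can watch this synchronization input yourself. Assuming the httpd-deployment Service created in the example below, watching its Endpoints object shows the set of backend Pod IPs that kube-proxy must keep its iptables rules in sync with; scale the Deployment up or down and the list (and the rules) change accordingly:

kubectl get endpoints httpd-deployment --watch
# NAME                ENDPOINTS                       AGE
# httpd-deployment    172.17.0.5:80,172.17.0.6:80     2m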

Iptables mode is great, but it has one tangible limitation. Remember that in the userspace mode kube-proxy directly load balances between Pods? It can select another Pod if the one it’s trying to access does not respond. Iptables rules, however, don’t have the mechanism to automatically retry another Pod if the one it initially selects does not respond. Therefore, this mode depends on having working readiness probes.
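In practice, this means the backend Pods should define readiness probes: a Pod that fails its probe is removed from the Service's Endpoints, and kube-proxy then removes the corresponding iptables rules. Below is a minimal sketch of such a probe for an httpd container; the path and timing values are illustrative assumptions, not part of the article's example:

readinessProbe:            # goes into the container spec of the Deployment
  httpGet:
    path: /                # assumed probe path; httpd serves its index page here
    port: 80
  initialDelaySeconds: 5   # illustrative timing values
  periodSeconds: 10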

Example #2: Check iptables rules created by kube-proxy for a Service

In this example, we demonstrate how to access iptables rules created by kube-proxy for the HTTPD Service. This example was tested on Kubernetes 1.13.0 running on Minikube 0.33.1.

First, let’s create a HTTPD Deployment:


kubectl run httpd-deployment --image=httpd --replicas=2

Next, expose it via Service:

kubectl expose deployment httpd-deployment --port=80

We need to know the ClusterIP of the Service to identify it later. It is 10.104.141.67 as the output below suggests:

kubectl describe svc httpd-deployment

Name:              httpd-deployment
Namespace:         default
Labels:            run=httpd-deployment
Annotations:       <none>
Selector:          run=httpd-deployment
Type:              ClusterIP
IP:                10.104.141.67
Port:              <unset>  80/TCP
TargetPort:        80/TCP
Endpoints:         172.17.0.5:80,172.17.0.6:80
Session Affinity:  None
Events:            <none>

Iptables rules are installed by the kube-proxy Pod so we’ll need to get its name first.

kubectl get pods --namespace kube-system

NAME READY STATUS RESTARTS AGE

kube-proxy-pz9l9 1/1 Running 0 4m12s

Finally, get a shell to the running kube-proxy Pod:

kubectl exec -ti kube-proxy-pz9l9 --namespace kube-system -- /bin/sh

We can now access iptables from inside the kube-proxy Pod. For example, you can list all the rules in the nat table like this:


iptables --table nat --list

Among the output you will find the KUBE-SERVICES chain, which includes a list of rules for your K8s Services:


Chain KUBE-SERVICES (2 references)
target                     prot opt source    destination
KUBE-SVC-ERIFXISQEP7F7OF4  tcp  --  anywhere  10.96.0.10      /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:domain
KUBE-SVC-LC5QY66VUV2HJ6WZ  tcp  --  anywhere  10.99.201.218   /* kube-system/metrics-server: cluster IP */ tcp dpt:https
KUBE-SVC-KO6WMUDK3F2YFERC  tcp  --  anywhere  10.104.141.67   /* default/httpd-deployment: cluster IP */ tcp dpt:http
KUBE-SVC-NPX46M4PTMTKRN6Y  tcp  --  anywhere  10.96.0.1       /* default/kubernetes:https cluster IP */ tcp dpt:https
KUBE-SVC-TCOU7JCQXEZGVUNU  udp  --  anywhere  10.96.0.10      /* kube-system/kube-dns:dns cluster IP */ udp dpt:domain
KUBE-NODEPORTS             all  --  anywhere  anywhere        /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL

As the third rule shows, traffic destined for our Service's ClusterIP 10.104.141.67 (commented as default/httpd-deployment: cluster IP) on TCP port http is handed to the KUBE-SVC-KO6WMUDK3F2YFERC chain, which forwards it to the Service's backend Pods. This forwarding, including the random selection of a backend Pod, is performed entirely by iptables.
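To see the random selection at work, you can follow the Service chain one level deeper. The listing below is a hedged reconstruction of what this typically looks like with two endpoints: the KUBE-SEP-* chain names are illustrative, while the Pod IPs come from the Endpoints (172.17.0.5:80 and 172.17.0.6:80) shown in the kubectl describe output above:

iptables --table nat --list KUBE-SVC-KO6WMUDK3F2YFERC

Chain KUBE-SVC-KO6WMUDK3F2YFERC (1 references)
target                    prot opt source    destination
KUBE-SEP-EXAMPLEPODONE    all  --  anywhere  anywhere   statistic mode random probability 0.50000000000
KUBE-SEP-EXAMPLEPODTWO    all  --  anywhere  anywhere

Each KUBE-SEP-* chain, in turn, contains a DNAT rule pointing to a single Pod (e.g., tcp to:172.17.0.5:80), which is where the translation from the Service VIP to a real Pod IP finally happens.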
