class: title, self-paced Building a Kubernetes Cluster
one piece at a time
.nav[*Self-paced version*] .debug[ ``` ``` These slides have been built from commit: 0343010 [shared/title.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/shared/title.md)] --- class: title, in-person Building a Kubernetes Cluster
one piece at a time
.footnote[ **Slides[:](https://www.youtube.com/watch?v=h16zyxiwDLY) https://2024-10-pick.container.training/** ] .debug[[shared/title.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/shared/title.md)] --- ## Introductions - Hello! I'm Jérôme Petazzoni ([@jpetazzo@hachyderm.io], Tiny Shell Script LLC) - Feel free to interrupt for questions at any time [@alexbuisine]: https://twitter.com/alexbuisine [EphemeraSearch]: https://ephemerasearch.com/ [@jpetazzo]: https://twitter.com/jpetazzo [@jpetazzo@hachyderm.io]: https://hachyderm.io/@jpetazzo [@s0ulshake]: https://twitter.com/s0ulshake [Quantgene]: https://www.quantgene.com/ .debug[[logistics.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/logistics.md)] --- ## Goals today Prepare for the CKA exam! -- *How?* -- Acquire deep understanding of Kubernetes internals! -- *How?* -- Build a Kubernetes cluster by hand! .debug[[logistics.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/logistics.md)] --- ## History of this talk - [2018 CKA preparation](https://github.com/jpetazzo/dessine-moi-un-cluster) - [2019 LISA talk by Jérôme Petazzoni](https://www.youtube.com/watch?v=3KtEAa7_duA) - Kubernetes admin/ops training classes - [2023 Devoxx talk by Denis Germain](https://www.youtube.com/watch?v=OCMNA0dSAzc) - More Kubernetes admin/ops training classes .debug[[logistics.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/logistics.md)] --- name: toc-part-1 ## Table of contents - [kubectl create deployment](#toc-kubectl-create-deployment) - [Building a 1-node cluster](#toc-building-a--node-cluster) - [Adding nodes to the cluster](#toc-adding-nodes-to-the-cluster) - [CNI internals](#toc-cni-internals) - [API server availability](#toc-api-server-availability) - [Securing the control plane](#toc-securing-the-control-plane) .debug[(auto-generated TOC)] .debug[[shared/toc.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/shared/toc.md)] --- class: pic .interstitial[] --- name: toc-kubectl-create-deployment class: title kubectl create deployment .nav[ [Previous part](#toc-) | [Back to table of contents](#toc-part-1) | [Next part](#toc-building-a--node-cluster) ] .debug[(automatically generated title slide)] --- # kubectl create deployment ... in 19,000 words! They say, "a picture is worth one thousand words." 
The following 19 slides show what really happens when we run: ```bash kubectl create deployment web --image=nginx ``` .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic  .debug[[k8s/deploymentslideshow.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/deploymentslideshow.md)] --- class: pic .interstitial[] --- name: toc-building-a--node-cluster class: title Building a 1-node cluster .nav[ [Previous part](#toc-kubectl-create-deployment) | [Back to table of contents](#toc-part-1) | [Next part](#toc-adding-nodes-to-the-cluster) ] .debug[(automatically generated title slide)] --- # Building a 1-node cluster - Ingredients: a Linux machine with... 
- Ubuntu LTS - Kubernetes, etcd, and CNI binaries installed - nothing is running .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## The plan 1. Start API server 2. Interact with it (create Deployment and Service) 3. See what's broken 4. Fix it and go back to step 2 until it works! .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Starting API server .lab[ - Try to start the API server: ```bash kube-apiserver # It will complain about permission to /var/run/kubernetes sudo kube-apiserver # Now it will complain about a bunch of missing flags, including: # --etcd-servers # --service-account-issuer # --service-account-signing-key-file ``` ] We'll need to start etcd. But we'll also need some TLS keys! .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Generating TLS keys - There are many ways to generate TLS keys (and certificates) - A very popular and modern tool to do that is [cfssl] - We're going to use the old-fashioned [openssl] CLI - Feel free to use cfssl or any other tool if you prefer! [cfssl]: https://github.com/cloudflare/cfssl#using-the-command-line-tool [openssl]: https://www.openssl.org/docs/man3.0/man1/ .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## How many keys do we need? At the very least, we need the following two keys: - ServiceAccount key pair - API client key pair, aka "CA key" (technically, we will need a *certificate* for that key pair) But if we wanted to tighten the cluster security, we'd need many more... .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## The other keys These keys are not strictly necessary at this point: - etcd key pair *without that key, communication with etcd will be insecure* - API server endpoint key pair *the API server will generate this one automatically if we don't* - kubelet key pair (used by API server to connect to kubelets) *without that key, commands like kubectl logs/exec will be insecure* .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Would you like some auth with that? If we want to enable authentication and authorization, we also need various API client key pairs signed by the "CA key" mentioned earlier. 
That would include (non-exhaustive list): - controller manager key pair - scheduler key pair - in most cases: kube-proxy (or equivalent) key pair - in most cases: key pairs for the nodes joining the cluster (these might be generated through TLS bootstrap tokens) - key pairs for users that will interact with the clusters (unless another authentication mechanism like OIDC is used) .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Generating our keys and certificates .lab[ - Generate the ServiceAccount key pair: ```bash openssl genrsa -out sa.key 2048 ``` - Generate the CA key pair: ```bash openssl genrsa -out ca.key 2048 ``` - Generate a self-signed certificate for the CA key: ```bash openssl x509 -new -key ca.key -out ca.cert -subj /CN=kubernetes/ ``` ] .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Starting etcd - This one is easy! .lab[ - Start etcd: ```bash etcd ``` ] Note: if you want a bit of extra challenge, you can try to generate the etcd key pair and use it. (You will need to pass it to etcd and to the API server.) .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Starting API server - We need to use the keys and certificate that we just generated .lab[ - Start the API server: ```bash sudo kube-apiserver \ --etcd-servers=http://localhost:2379 \ --service-account-signing-key-file=sa.key \ --service-account-issuer=https://kubernetes \ --service-account-key-file=sa.key \ --client-ca-file=ca.cert ``` ] The API server should now start. But can we really use it? 🤔 .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Trying `kubectl` - Let's try some simple `kubectl` command .lab[ - Try to list Namespaces: ```bash kubectl get namespaces ``` ] We're getting an error message like this one: ``` The connection to the server localhost:8080 was refused - did you specify the right host or port? ``` .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## What's going on? - Recent versions of Kubernetes don't support unauthenticated API access - The API server doesn't support listening on plain HTTP anymore - `kubectl` still tries to connect to `localhost:8080` by default - But there is nothing listening there - Our API server listens on port 6443, using TLS .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Trying to access the API server - Let's use `curl` first to confirm that everything works correctly (and then we will move to `kubectl`) .lab[ - Try to connect with `curl`: ```bash curl https://localhost:6443 # This will fail because the API server certificate is unknown. ``` - Try again, skipping certificate verification: ```bash curl --insecure https://localhost:6443 ``` ] We should now see an `Unauthorized` Kubernetes API error message. We need to authenticate with our key and certificate. .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Authenticating with the API server - For the time being, we can use the CA key and cert directly - In a real world scenario, we would *never* do that! 
(because we don't want the CA key to be out there in the wild) .lab[ - Try again, skipping cert verification, and using the CA key and cert: ```bash curl --insecure --key ca.key --cert ca.cert https://localhost:6443 ``` ] We should see a list of API routes. .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- class: extra-details ## Doing it right In the future, instead of using the CA key and certificate, we should generate a new key, and a certificate for that key, signed by the CA key. Then we can use that new key and certificate to authenticate. Example: ``` ### Generate a key pair openssl genrsa -out user.key ### Extract the public key openssl pkey -in user.key -out user.pub -pubout ### Generate a certificate signed by the CA key openssl x509 -new -key ca.key -force_pubkey user.pub -out user.cert \ -subj /CN=kubernetes-user/ ``` .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Writing a kubeconfig file - We now want to use `kubectl` instead of `curl` - We'll need to write a kubeconfig file for `kubectl` - There are many ways to do that; here, we're going to use `kubectl config` - We'll need to: - set the "cluster" (API server endpoint) - set the "credentials" (the key and certificate) - set the "context" (referencing the cluster and credentials) - use that context (make it the default that `kubectl` will use) .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Set the cluster The "cluster" section holds the API server endpoint. .lab[ - Set the API server endpoint: ```bash kubectl config set-cluster polykube --server=https://localhost:6443 ``` - Don't verify the API server certificate: ```bash kubectl config set-cluster polykube --insecure-skip-tls-verify ``` ] .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Set the credentials The "credentials" section can hold a TLS key and certificate, or a token, or configuration information for a plugin (for instance, when using AWS EKS or GCP GKE, they use a plugin). .lab[ - Set the client key and certificate: ```bash kubectl config set-credentials polykube \ --client-key ca.key \ --client-certificate ca.cert ``` ] .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Set and use the context The "context" section references the "cluster" and "credentials" that we defined earlier. (It can also optionally reference a Namespace.)
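For example, if we wanted that context to also default to a specific Namespace, we could add the `--namespace` flag when creating it (a hypothetical variant; the lab below creates the context without a Namespace):

```bash
# Hypothetical: a context that defaults to the kube-system Namespace
kubectl config set-context polykube-system \
        --cluster polykube --user polykube --namespace kube-system
```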
.lab[ - Set the "context": ```bash kubectl config set-context polykube --cluster polykube --user polykube ``` - Set that context to be the default context: ```bash kubectl config use-context polykube ``` ] .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Review the kubeconfig file The kubeconfig file should look like this: .small[ ```yaml apiVersion: v1 clusters: - cluster: insecure-skip-tls-verify: true server: https://localhost:6443 name: polykube contexts: - context: cluster: polykube user: polykube name: polykube current-context: polykube kind: Config preferences: {} users: - name: polykube user: client-certificate: /root/ca.cert client-key: /root/ca.key ``` ] .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Trying the kubeconfig file - We should now be able to access our cluster's API! .lab[ - Try to list Namespaces: ```bash kubectl get namespaces ``` ] This should show the classic `default`, `kube-system`, etc. .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- class: extra-details ## Do we need `--client-ca-file` ? Technically, we didn't need to specify the `--client-ca-file` flag! But without that flag, no client can be authenticated. Which means that we wouldn't be able to issue any API request! .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Running pods - We can now try to create a Deployment .lab[ - Create a Deployment: ```bash kubectl create deployment blue --image=jpetazzo/color ``` - Check the results: ```bash kubectl get deployments,replicasets,pods ``` ] Our Deployment exists, but not the Replica Set or Pod. We need to run the controller manager. .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Running the controller manager - Previously, we used the `--master` flag to pass the API server address - Now, we need to authenticate properly - The simplest way at this point is probably to use the same kubeconfig file! .lab[ - Start the controller manager: ```bash kube-controller-manager --kubeconfig .kube/config ``` - Check the results: ```bash kubectl get deployments,replicasets,pods ``` ] .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## What's next? - Normally, the last commands showed us a Pod in `Pending` state - We need two things to continue: - the scheduler (to assign the Pod to a Node) - a Node! - We're going to run `kubelet` to register the Node with the cluster .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Running `kubelet` - Let's try to run `kubelet` and see what happens! .lab[ - Start `kubelet`: ```bash sudo kubelet ``` ] We should see an error about connecting to `containerd.sock`. We need to run a container engine! (For instance, `containerd`.) .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Running `containerd` - We need to install and start `containerd` - You could try another engine if you wanted (but there might be complications!) 
.lab[ - Install `containerd`: ```bash sudo apt-get install containerd ``` - Start `containerd`: ```bash sudo containerd ``` ] .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- class: extra-details ## Configuring `containerd` Depending on how we install `containerd`, it might need a bit of extra configuration. Watch for the following symptoms: - `containerd` refuses to start (rare, unless there is an *invalid* configuration) - `containerd` starts but `kubelet` can't connect (could be the case if the configuration disables the CRI socket) - `containerd` starts and things work but Pods keep being killed (may happen if there is a mismatch in the cgroups driver) .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Starting `kubelet` for good - Now that `containerd` is running, `kubelet` should start! .lab[ - Try to start `kubelet`: ```bash sudo kubelet ``` - In another terminal, check if our Node is now visible: ```bash sudo kubectl get nodes ``` ] `kubelet` should now start, but our Node doesn't show up in `kubectl get nodes`! This is because without a kubeconfig file, `kubelet` runs in standalone mode:
it will not connect to a Kubernetes API server, and will only start *static pods*. .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Passing the kubeconfig file - Let's start `kubelet` again, with our kubeconfig file .lab[ - Stop `kubelet` (e.g. with `Ctrl-C`) - Restart it with the kubeconfig file: ```bash sudo kubelet --kubeconfig .kube/config ``` - Check our list of Nodes: ```bash kubectl get nodes ``` ] This time, our Node should show up! .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Node readiness - However, our Node shows up as `NotReady` - If we wait a few minutes, the `kubelet` logs will tell us why: *we're missing a CNI configuration!* - As a result, the containers can't be connected to the network - `kubelet` detects that and doesn't become `Ready` until this is fixed .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## CNI configuration - We need to provide a CNI configuration - This is a file in `/etc/cni/net.d` (the name of the file doesn't matter; the first file in lexicographic order will be used) - Usually, when installing a "CNI plugin¹", this file gets installed automatically - Here, we are going to write that file manually .footnote[¹Technically, a *pod network*; typically running as a DaemonSet, which will install the file with a `hostPath` volume.] .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Our CNI configuration Create the following file in e.g. `/etc/cni/net.d/kube.conf`: ```json { "cniVersion": "0.3.1", "name": "kube", "type": "bridge", "bridge": "cni0", "isDefaultGateway": true, "ipMasq": true, "hairpinMode": true, "ipam": { "type": "host-local", "subnet": "10.1.1.0/24" } } ``` That's all we need - `kubelet` will detect and validate the file automatically! .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Checking our Node again - After a short time (typically about 10 seconds) the Node should be `Ready` .lab[ - Wait until the Node is `Ready`: ```bash kubectl get nodes ``` ] If the Node doesn't show up as `Ready`, check the `kubelet` logs. .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## What's next? 
- At this point, we have a `Pending` Pod and a `Ready` Node - All we need is the scheduler to bind the former to the latter .lab[ - Run the scheduler: ```bash kube-scheduler --kubeconfig .kube/config ``` - Check that the Pod gets assigned to the Node and becomes `Running`: ```bash kubectl get pods ``` ] .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Check network access - Let's check that we can connect to our Pod, and that the Pod can connect outside .lab[ - Get the Pod's IP address: ```bash kubectl get pods -o wide ``` - Connect to the Pod (make sure to update the IP address): ```bash curl `10.1.1.2` ``` - Check that the Pod has external connectivity too: ```bash kubectl exec `blue-xxxxxxxxxx-yyyyy` -- ping -c3 1.1 ``` ] .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Expose our Deployment - We can now try to expose the Deployment and connect to the ClusterIP .lab[ - Expose the Deployment: ```bash kubectl expose deployment blue --port=80 ``` - Retrieve the ClusterIP: ```bash kubectl get services ``` - Try to connect to the ClusterIP: ```bash curl `10.0.0.42` ``` ] At this point, it won't work - we need to run `kube-proxy`! .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## Running `kube-proxy` - We need to run `kube-proxy` (also passing it our kubeconfig file) .lab[ - Run `kube-proxy`: ```bash sudo kube-proxy --kubeconfig .kube/config ``` - Try again to connect to the ClusterIP: ```bash curl `10.0.0.42` ``` ] This time, it should work. .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- ## What's next? - Scale up the Deployment, and check that load balancing works properly - Enable RBAC, and generate individual certificates for each controller (check the [certificate paths][certpath] section in the Kubernetes documentation for a detailed list of all the certificates and keys that are used by the control plane, and which flags are used by which components to configure them!) - Add more nodes to the cluster *Feel free to try these if you want to get additional hands-on experience!* [certpath]: https://kubernetes.io/docs/setup/best-practices/certificates/#certificate-paths ??? 
:EN:- Setting up control plane certificates :EN:- Implementing a basic CNI configuration :FR:- Mettre en place les certificats du plan de contrôle :FR:- Réaliser un configuration CNI basique .debug[[k8s/dmuc-medium.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-medium.md)] --- class: pic .interstitial[] --- name: toc-adding-nodes-to-the-cluster class: title Adding nodes to the cluster .nav[ [Previous part](#toc-building-a--node-cluster) | [Back to table of contents](#toc-part-1) | [Next part](#toc-cni-internals) ] .debug[(automatically generated title slide)] --- # Adding nodes to the cluster - In the previous section, we built a cluster with a single node - In this new section, we're going to add more nodes to the cluster - Note: we will need the lab environment of that previous section - If you haven't done it yet, you should go through that section first .debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- ## Our environment - On `polykube1`, we should have our Kubernetes control plane - We're also assuming that we have the kubeconfig file created earlier (in `~/.kube/config`) - We're going to work on `polykube2` and add it to the cluster - This machine has exactly the same setup as `polykube1` (Ubuntu LTS with CNI, etcd, and Kubernetes binaries installed) - Note that we won't need the etcd binaries here (the control plane will run solely on `polykube1`) .debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- ## Checklist We need to: - generate the kubeconfig file for `polykube2` - install a container engine - generate a CNI configuration file - start kubelet .debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- ## Generating the kubeconfig file - Ideally, we should generate a key pair and certificate for `polykube2`... - ...and generate a kubeconfig file using these - At the moment, for simplicity, we'll use the same key pair and certificate as earlier - We have a couple of options: - copy the required files (kubeconfig, key pair, certificate) - "flatten" the kubeconfig file (embed the key and certificate within) .debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- class: extra-details ## To flatten or not to flatten? - "Flattening" the kubeconfig file can seem easier (because it means we'll only have one file to move around) - But it's easier to rotate the key or renew the certificate when they're in separate files .debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- ## Flatten and copy the kubeconfig file - We'll flatten the file and copy it over .lab[ - On `polykube1`, flatten the kubeconfig file: ```bash kubectl config view --flatten > kubeconfig ``` - Then copy it to `polykube2`: ```bash scp kubeconfig polykube2: ``` ] .debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- ## Generate CNI configuration Back on `polykube2`, put the following in `/etc/cni/net.d/kube.conf`: ```json { "cniVersion": "0.3.1", "name": "kube", "type": "bridge", "bridge": "cni0", "isDefaultGateway": true, "ipMasq": true, "hairpinMode": true, "ipam": { "type": "host-local", "subnet": `"10.1.2.0/24"` } } ``` Note how we changed the subnet! 
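Since each node gets its own `/24`, a small helper like this can generate the file (a sketch, assuming we keep the `10.1.N.0/24` convention where `N` is the node number):

```bash
# Sketch: write the CNI configuration for node number N (here, N=2 for polykube2)
N=2
sudo tee /etc/cni/net.d/kube.conf <<EOF
{
  "cniVersion": "0.3.1",
  "name": "kube",
  "type": "bridge",
  "bridge": "cni0",
  "isDefaultGateway": true,
  "ipMasq": true,
  "hairpinMode": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.1.$N.0/24"
  }
}
EOF
```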
.debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- ## Install container engine and start `kubelet` .lab[ - Install `containerd`: ```bash sudo apt-get install containerd -y ``` - Start `containerd`: ```bash sudo systemctl start containerd ``` - Start `kubelet`: ```bash sudo kubelet --kubeconfig kubeconfig ``` ] We're getting errors looking like: ``` "Post \"https://localhost:6443/api/v1/nodes\": ... connect: connection refused" ``` .debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- ## Updating the kubeconfig file - Our kubeconfig file still references `localhost:6443` - This was fine on `polykube1` (where `kubelet` was connecting to the control plane running locally) - On `polykube2`, we need to change that and put the address of the API server (i.e. the address of `polykube1`) .lab[ - Update the `kubeconfig` file: ```bash sed -i s/localhost:6443/polykube1:6443/ kubeconfig ``` ] .debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- ## Starting `kubelet` - `kubelet` should now start correctly (hopefully!) .lab[ - On `polykube2`, start `kubelet`: ```bash sudo kubelet --kubeconfig kubeconfig ``` - On `polykube1`, check that `polykube2` shows up and is `Ready`: ```bash kubectl get nodes ``` ] .debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- ## Testing connectivity - From `polykube1`, can we connect to Pods running on `polykube2`? 🤔 .lab[ - Scale the test Deployment: ```bash kubectl scale deployment blue --replicas=5 ``` - Get the IP addresses of the Pods: ```bash kubectl get pods -o wide ``` - Pick a Pod on `polykube2` and try to connect to it: ```bash curl `10.1.2.2` ``` ] -- At that point, it doesn't work. .debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- ## Refresher on the *pod network* - The *pod network* (or *pod-to-pod network*) has a few responsibilities: - allocating and managing Pod IP addresses - connecting Pods and Nodes - connecting Pods together on a given node - *connecting Pods together across nodes* - That last part is the one that's not functioning in our cluster - It typically requires some combination of routing, tunneling, bridging... .debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- ## Connecting networks together - We can add manual routes between our nodes - This requires adding `N x (N-1)` routes (on each node, add a route to every other node) - This will work on home labs where nodes are directly connected (e.g. on an Ethernet switch, or same WiFi network, or a bridge between local VMs) - ...Or on clouds where IP address filtering has been disabled (by default, most cloud providers will discard packets going to unknown IP addresses) - If IP address filtering is enabled, you'll have to use e.g. 
tunneling or overlay networks .debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- ## Important warning - The technique that we are about to use doesn't work everywhere - It only works if: - all the nodes are directly connected to each other (at layer 2) - the underlying network allows the IP addresses of our pods - If we are on physical machines connected by a switch: OK - If we are on virtual machines in a public cloud: NOT OK - on AWS, we need to disable "source and destination checks" on our instances - on OpenStack, we need to disable "port security" on our network ports .debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- ## Routing basics - We need to tell *each* node: "The subnet 10.1.N.0/24 is located on node N" (for all values of N) - This is how we add a route on Linux: ```bash ip route add 10.1.N.0/24 via W.X.Y.Z ``` (where `W.X.Y.Z` is the internal IP address of node N) - We can see the internal IP addresses of our nodes with: ```bash kubectl get nodes -o wide ``` .debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- ## Adding our route - Let's add a route from `polykube1` to `polykube2` .lab[ - Check the internal address of `polykube2`: ```bash kubectl get node polykube2 -o wide ``` - Now, on `polykube1`, add the route to the Pods running on `polykube2`: ```bash sudo ip route add 10.1.2.0/24 via `A.B.C.D` ``` - Finally, check that we can now connect to a Pod running on `polykube2`: ```bash curl 10.1.2.2 ``` ] .debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- ## What's next? - The network configuration feels very manual: - we had to generate the CNI configuration file (in `/etc/cni/net.d`) - we had to manually update the nodes' routing tables - Can we automate that? **YES!** - We could install something like [kube-router](https://www.kube-router.io/) (which specifically takes care of the CNI configuration file and populates routing tables) - Or we could also go with e.g. [Cilium](https://cilium.io/) .debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- class: extra-details ## If you want to try Cilium... - Add the `--root-ca-file` flag to the controller manager: - use the certificate automatically generated by the API server
(it should be in `/var/run/kubernetes/apiserver.crt`) - or generate a key pair and certificate for the API server and point to that certificate - without that, you'll get certificate validation errors
(because in our Pods, the `ca.crt` file used to validate the API server will be empty) - Check the Cilium [without kube-proxy][ciliumwithoutkubeproxy] instructions (make sure to pass the API server IP address and port!) - Other pod-to-pod network implementations might also require additional steps [ciliumwithoutkubeproxy]: https://docs.cilium.io/en/stable/network/kubernetes/kubeproxy-free/#kubeproxy-free .debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- class: extra-details ## About the API server certificate... - In the previous sections, we've skipped API server certificate verification - To generate a proper certificate, we need to include a `subjectAltName` extension - And make sure that the CA includes the extension in the certificate ```bash openssl genrsa -out apiserver.key 4096 openssl req -new -key apiserver.key -subj /CN=kubernetes/ \ -addext "subjectAltName = DNS:kubernetes.default.svc, \ DNS:kubernetes.default, DNS:kubernetes, \ DNS:localhost, DNS:polykube1" -out apiserver.csr openssl x509 -req -in apiserver.csr -CAkey ca.key -CA ca.cert \ -out apiserver.crt -copy_extensions copy ``` ??? :EN:- Connecting nodes and pods :FR:- Interconnecter les nœuds et les pods .debug[[k8s/dmuc-hard.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/dmuc-hard.md)] --- class: pic .interstitial[] --- name: toc-cni-internals class: title CNI internals .nav[ [Previous part](#toc-adding-nodes-to-the-cluster) | [Back to table of contents](#toc-part-1) | [Next part](#toc-api-server-availability) ] .debug[(automatically generated title slide)] --- # CNI internals - Kubelet looks for a CNI configuration file (by default, in `/etc/cni/net.d`) - Note: if we have multiple files, the first one will be used (in lexicographic order) - If no configuration can be found, kubelet holds off on creating containers (except if they are using `hostNetwork`) - Let's see how exactly plugins are invoked! .debug[[k8s/cni-internals.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/cni-internals.md)] --- ## General principle - A plugin is an executable program - It is invoked by kubelet to set up / tear down networking for a container - It doesn't take any command-line arguments - However, it uses environment variables to know what to do, which container, etc. - It reads JSON on stdin, and writes back JSON on stdout - There will generally be multiple plugins invoked in a row (at least IPAM + network setup; possibly more) .debug[[k8s/cni-internals.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/cni-internals.md)] --- ## Environment variables - `CNI_COMMAND`: `ADD`, `DEL`, `CHECK`, or `VERSION` - `CNI_CONTAINERID`: opaque identifier (container ID of the "sandbox", i.e. the container running the `pause` image) - `CNI_NETNS`: path to network namespace pseudo-file (e.g. `/var/run/netns/cni-0376f625-29b5-7a21-6c45-6a973b3224e5`) - `CNI_IFNAME`: interface name, usually `eth0` - `CNI_PATH`: path(s) with plugin executables (e.g.
`/opt/cni/bin`) - `CNI_ARGS`: "extra arguments" (see next slide) .debug[[k8s/cni-internals.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/cni-internals.md)] --- ## `CNI_ARGS` - Extra key/value pair arguments passed by "the user" - "The user", here, is "kubelet" (or in an abstract way, "Kubernetes") - This is used to pass the pod name and namespace to the CNI plugin - Example: ``` IgnoreUnknown=1 K8S_POD_NAMESPACE=default K8S_POD_NAME=web-96d5df5c8-jcn72 K8S_POD_INFRA_CONTAINER_ID=016493dbff152641d334d9828dab6136c1ff... ``` Note that technically, it's a `;`-separated list, so it really looks like this: ``` CNI_ARGS=IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=web-96d... ``` .debug[[k8s/cni-internals.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/cni-internals.md)] --- ## JSON in, JSON out - The plugin reads its configuration on stdin - It writes back results in JSON (e.g. allocated address, routes, DNS...) ⚠️ "Plugin configuration" is not always the same as "CNI configuration"! .debug[[k8s/cni-internals.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/cni-internals.md)] --- ## Conf vs Conflist - The CNI configuration can be a single plugin configuration - it will then contain a `type` field in the top-most structure - it will be passed "as-is" - It can also be a "conflist", containing a chain of plugins (it will then contain a `plugins` field in the top-most structure) - Plugins are then invoked in order (reverse order for `DEL` action) - In that case, the input of the plugin is not the whole configuration (see details on next slide) .debug[[k8s/cni-internals.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/cni-internals.md)] --- ## List of plugins - When invoking a plugin in a list, the JSON input will be: - the configuration of the plugin - augmented with `name` (matching the conf list `name`) - augmented with `prevResult` (which will be the output of the previous plugin) - Conceptually, a plugin (generally the first one) will do the "main setup" - Other plugins can do tuning / refinement (firewalling, traffic shaping...) .debug[[k8s/cni-internals.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/cni-internals.md)] --- ## Analyzing plugins - Let's see what goes in and out of our CNI plugins! - We will create a fake plugin that: - saves its environment and input - executes the real plugin with the saved input - saves the plugin output - passes the saved output .debug[[k8s/cni-internals.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/cni-internals.md)] --- ## Our fake plugin ```bash #!/bin/sh PLUGIN=$(basename $0) cat > /tmp/cni.$$.$PLUGIN.in env | sort > /tmp/cni.$$.$PLUGIN.env echo "PPID=$PPID, $(readlink /proc/$PPID/exe)" > /tmp/cni.$$.$PLUGIN.parent $0.real < /tmp/cni.$$.$PLUGIN.in > /tmp/cni.$$.$PLUGIN.out EXITSTATUS=$? cat /tmp/cni.$$.$PLUGIN.out exit $EXITSTATUS ``` Save this script as `/opt/cni/bin/debug` and make it executable. .debug[[k8s/cni-internals.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/cni-internals.md)] --- ## Substituting the fake plugin - For each plugin that we want to instrument: - rename the plugin from e.g. 
`ptp` to `ptp.real` - symlink `ptp` to our `debug` plugin - There is no need to change the CNI configuration or restart kubelet .debug[[k8s/cni-internals.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/cni-internals.md)] --- ## Create some pods and looks at the results - Create a pod - For each instrumented plugin, there will be files in `/tmp`: `cni.PID.pluginname.in` (JSON input) `cni.PID.pluginname.env` (environment variables) `cni.PID.pluginname.parent` (parent process information) `cni.PID.pluginname.out` (JSON output) ❓️ What is calling our plugins? ??? :EN:- Deep dive into CNI internals :FR:- La Container Network Interface (CNI) en détails .debug[[k8s/cni-internals.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/cni-internals.md)] --- class: pic .interstitial[] --- name: toc-api-server-availability class: title API server availability .nav[ [Previous part](#toc-cni-internals) | [Back to table of contents](#toc-part-1) | [Next part](#toc-securing-the-control-plane) ] .debug[(automatically generated title slide)] --- # API server availability - When we set up a node, we need the address of the API server: - for kubelet - for kube-proxy - sometimes for the pod network system (like kube-router) - How do we ensure the availability of that endpoint? (what if the node running the API server goes down?) .debug[[k8s/apilb.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/apilb.md)] --- ## Option 1: external load balancer - Set up an external load balancer - Point kubelet (and other components) to that load balancer - Put the node(s) running the API server behind that load balancer - Update the load balancer if/when an API server node needs to be replaced - On cloud infrastructures, some mechanisms provide automation for this (e.g. on AWS, an Elastic Load Balancer + Auto Scaling Group) - [Example in Kubernetes The Hard Way](https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/08-bootstrapping-kubernetes-controllers.md#the-kubernetes-frontend-load-balancer) .debug[[k8s/apilb.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/apilb.md)] --- ## Option 2: local load balancer - Set up a load balancer (like NGINX, HAProxy...) on *each* node - Configure that load balancer to send traffic to the API server node(s) - Point kubelet (and other components) to `localhost` - Update the load balancer configuration when API server nodes are updated .debug[[k8s/apilb.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/apilb.md)] --- ## Updating the local load balancer config - Distribute the updated configuration (push) - Or regularly check for updates (pull) - The latter requires an external, highly available store (it could be an object store, an HTTP server, or even DNS...) - Updates can be facilitated by a DaemonSet (but remember that it can't be used when installing a new node!) .debug[[k8s/apilb.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/apilb.md)] --- ## Option 3: DNS records - Put all the API server nodes behind a round-robin DNS - Point kubelet (and other components) to that name - Update the records when needed - Note: this option is not officially supported (but since kubelet supports reconnection anyway, it *should* work) .debug[[k8s/apilb.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/apilb.md)] --- ## Option 4: .................... 
- Many managed clusters expose a high-availability API endpoint (and you don't have to worry about it) - You can also use HA mechanisms that you're familiar with (e.g. virtual IPs) - Tunnels are also fine (e.g. [k3s](https://k3s.io/) uses a tunnel to allow each node to contact the API server) ??? :EN:- Ensuring API server availability :FR:- Assurer la disponibilité du serveur API .debug[[k8s/apilb.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/apilb.md)] --- class: pic .interstitial[] --- name: toc-securing-the-control-plane class: title Securing the control plane .nav[ [Previous part](#toc-api-server-availability) | [Back to table of contents](#toc-part-1) | [Next part](#toc-) ] .debug[(automatically generated title slide)] --- # Securing the control plane - Many components accept connections (and requests) from others: - API server - etcd - kubelet - We must secure these connections: - to deny unauthorized requests - to prevent eavesdropping secrets, tokens, and other sensitive information - Disabling authentication and/or authorization is **strongly discouraged** (but it's possible to do it, e.g. for learning / troubleshooting purposes) .debug[[k8s/control-plane-auth.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/control-plane-auth.md)] --- ## Authentication and authorization - Authentication (checking "who you are") is done with mutual TLS (both the client and the server need to hold a valid certificate) - Authorization (checking "what you can do") is done in different ways - the API server implements a sophisticated permission logic (with RBAC) - some services will defer authorization to the API server (through webhooks) - some services require a certificate signed by a particular CA / sub-CA .debug[[k8s/control-plane-auth.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/control-plane-auth.md)] --- ## In practice - We will review the various communication channels in the control plane - We will describe how they are secured - When TLS certificates are used, we will indicate: - which CA signs them - what their subject (CN) should be, when applicable - We will indicate how to configure security (client- and server-side) .debug[[k8s/control-plane-auth.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/control-plane-auth.md)] --- ## etcd peers - Replication and coordination of etcd happens on a dedicated port (typically port 2380; the default port for normal client connections is 2379) - Authentication uses TLS certificates with a separate sub-CA (otherwise, anyone with a Kubernetes client certificate could access etcd!) - The etcd command line flags involved are: `--peer-client-cert-auth=true` to activate it `--peer-cert-file`, `--peer-key-file`, `--peer-trusted-ca-file` .debug[[k8s/control-plane-auth.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/control-plane-auth.md)] --- ## etcd clients - The only¹ thing that connects to etcd is the API server - Authentication uses TLS certificates with a separate sub-CA (for the same reasons as for etcd inter-peer authentication) - The etcd command line flags involved are: `--client-cert-auth=true` to activate it `--trusted-ca-file`, `--cert-file`, `--key-file` - The API server command line flags involved are: `--etcd-cafile`, `--etcd-certfile`, `--etcd-keyfile` .footnote[¹Technically, there is also the etcd healthcheck. Let's ignore it for now.] 
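Putting those flags together might look like this (a sketch only; the certificate and key paths are hypothetical, and in practice these flags are added to the etcd and API server invocations used earlier):

```bash
# Hypothetical file paths; adjust to match your PKI layout and sub-CA.
etcd \
  --client-cert-auth=true \
  --trusted-ca-file=/etc/etcd/pki/etcd-ca.crt \
  --cert-file=/etc/etcd/pki/server.crt \
  --key-file=/etc/etcd/pki/server.key \
  --listen-client-urls=https://127.0.0.1:2379 \
  --advertise-client-urls=https://127.0.0.1:2379

kube-apiserver \
  --etcd-servers=https://127.0.0.1:2379 \
  --etcd-cafile=/etc/etcd/pki/etcd-ca.crt \
  --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
```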
.debug[[k8s/control-plane-auth.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/control-plane-auth.md)] --- ## etcd authorization - etcd supports RBAC, but Kubernetes doesn't use it by default (note: etcd RBAC is completely different from Kubernetes RBAC!) - By default, etcd access is "all or nothing" (if you have a valid certificate, you get in) - Be very careful if you use the same root CA for etcd and other things (if etcd trusts the root CA, then anyone with a valid cert gets full etcd access) - For more details, check the following resources: - [etcd documentation on authentication](https://etcd.io/docs/current/op-guide/authentication/) - [PKI The Wrong Way](https://www.youtube.com/watch?v=gcOLDEzsVHI) at KubeCon NA 2020 .debug[[k8s/control-plane-auth.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/control-plane-auth.md)] --- ## API server clients - The API server has a sophisticated authentication and authorization system - For connections coming from other components of the control plane: - authentication uses certificates (trusting the certificates' subject or CN) - authorization uses whatever mechanism is enabled (most oftentimes, RBAC) - The relevant API server flags are: `--client-ca-file`, `--tls-cert-file`, `--tls-private-key-file` - Each component connecting to the API server takes a `--kubeconfig` flag (to specify a kubeconfig file containing the CA cert, client key, and client cert) - Yes, that kubeconfig file follows the same format as our `~/.kube/config` file! .debug[[k8s/control-plane-auth.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/control-plane-auth.md)] --- ## Kubelet and API server - Communication between kubelet and API server can be established both ways - Kubelet → API server: - kubelet registers itself ("hi, I'm node42, do you have work for me?") - connection is kept open and re-established if it breaks - that's how the kubelet knows which pods to start/stop - API server → kubelet: - used to retrieve logs, exec, attach to containers .debug[[k8s/control-plane-auth.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/control-plane-auth.md)] --- ## Kubelet → API server - Kubelet is started with `--kubeconfig` with API server information - The client certificate of the kubelet will typically have: `CN=system:node:
<node name>` and groups `O=system:nodes` - Nothing special on the API server side (it will authenticate like any other client) .debug[[k8s/control-plane-auth.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/control-plane-auth.md)] --- ## API server → kubelet - Kubelet is started with the flag `--client-ca-file` (typically using the same CA as the API server) - API server will use a dedicated key pair when contacting kubelet (specified with `--kubelet-client-certificate` and `--kubelet-client-key`) - Authorization uses webhooks (enabled with `--authorization-mode=Webhook` on kubelet) - The webhook server is the API server itself (the kubelet sends back a request to the API server to ask, "can this person do that?") .debug[[k8s/control-plane-auth.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/control-plane-auth.md)] --- ## Scheduler - The scheduler connects to the API server like an ordinary client - The certificate of the scheduler will have `CN=system:kube-scheduler` .debug[[k8s/control-plane-auth.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/control-plane-auth.md)] --- ## Controller manager - The controller manager is also a normal client to the API server - Its certificate will have `CN=system:kube-controller-manager` - If we use the CSR API, the controller manager needs the CA cert and key (passed with flags `--cluster-signing-cert-file` and `--cluster-signing-key-file`) - We usually want the controller manager to generate tokens for service accounts - These tokens deserve some details (on the next slide!) .debug[[k8s/control-plane-auth.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/control-plane-auth.md)] --- class: extra-details ## How are these permissions set up? - A bunch of roles and bindings are defined as constants in the API server code: [auth/authorizer/rbac/bootstrappolicy/policy.go](https://github.com/kubernetes/kubernetes/blob/release-1.19/plugin/pkg/auth/authorizer/rbac/bootstrappolicy/policy.go#L188) - They are created automatically when the API server starts: [registry/rbac/rest/storage_rbac.go](https://github.com/kubernetes/kubernetes/blob/release-1.19/pkg/registry/rbac/rest/storage_rbac.go#L140) - We must use the correct Common Names (`CN`) for the control plane certificates (since the bindings defined above refer to these common names) .debug[[k8s/control-plane-auth.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/control-plane-auth.md)] --- ## Service account tokens - Each time we create a service account, the controller manager generates a token - These tokens are JWT tokens, signed with a particular key - These tokens are used for authentication with the API server (and therefore, the API server needs to be able to verify their integrity) - This uses another keypair: - the private key (used for signature) is passed to the controller manager
(using flags `--service-account-private-key-file` and `--root-ca-file`) - the public key (used for verification) is passed to the API server
(using flag `--service-account-key-file`) .debug[[k8s/control-plane-auth.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/control-plane-auth.md)] --- ## kube-proxy - kube-proxy is "yet another API server client" - In many clusters, it runs as a Daemon Set - In that case, it will have its own Service Account and associated permissions - It will authenticate using the token of that Service Account .debug[[k8s/control-plane-auth.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/control-plane-auth.md)] --- ## Webhooks - We mentioned webhooks earlier; how does that really work? - The Kubernetes API has special resource types to check permissions - One of them is SubjectAccessReview - To check if a particular user can do a particular action on a particular resource: - we prepare a SubjectAccessReview object - we send that object to the API server - the API server responds with allow/deny (and optional explanations) - Using webhooks for authorization = sending SAR to authorize each request .debug[[k8s/control-plane-auth.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/k8s/control-plane-auth.md)] --- ## Subject Access Review Here is an example showing how to check if `jean.doe` can `get` some `pods` in `kube-system`:
```bash
kubectl -v9 create -f- <<EOF
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: jean.doe
  resourceAttributes:
    namespace: kube-system
    verb: get
    resource: pods
EOF
```
Questions?  .debug[[shared/thankyou.md](https://github.com/jpetazzo/container.training/tree/2024-10-pick/slides/shared/thankyou.md)]