Here are 2 ways to set up a Rancher Kubernetes Engine (RKE2) cluster of 3 (or more) nodes.
Minimum requirements
- 3 up-to-date Ubuntu Server 22.04 hosts, up and running (4 GB RAM / 2 vCPU each)
- Ability to run the seashell Docker container on your workstation (first install option)
- No firewall between nodes
- NTP time synchronization (chronyd)
- Passwordless SSH and sudo on each host (SSH key installed, ssh-agent running to unlock the key; see the sketch after this list)
- DNS updated and resolution working
- 1 Ubuntu Server 22.04 host as external load balancer (1 GB RAM / 1 vCPU) (second install option only)
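A minimal sketch to satisfy the SSH and sudo requirements, assuming your user already exists on each node with sudo rights:

```bash
ssh-keygen -t ed25519                                # generate a key pair if you have none
eval $(ssh-agent -s) && ssh-add                      # start the agent and unlock the key
for h in proxy rke2{1..3}; do ssh-copy-id $h; done   # install the public key on each host
```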
Rancher made RKE2 straightforward to install. Here is an example configuration. For details, refer to docs.rke2.io
Lab description
| Hostname | IP Address | vCPU | RAM (GB) |
|---|---|---|---|
| proxy.home.pivert.org | 192.168.66.20 | 1 | 1 |
| rke21.home.pivert.org | 192.168.66.21 | 2 | 4 |
| rke22.home.pivert.org | 192.168.66.22 | 2 | 4 |
| rke23.home.pivert.org | 192.168.66.23 | 2 | 4 |
Sanity checks
Before going further, make sure the tests below run successfully from your workstation (adapt the IPs to your environment).
```bash
for x in 192.168.66.2{0..3}; do
  ping -W1 -c1 $x > /dev/null && echo "$x : OK" || echo "$x : Not OK"
done
for x in proxy rke2{1..3}; do
  ping -W1 -c1 $x > /dev/null && echo "$x : OK" || echo "$x : Not OK"
done
```
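The Ansible option also needs passwordless SSH and sudo, which a similar loop can verify (a sketch; BatchMode makes ssh fail instead of prompting for a password):

```bash
for x in proxy rke2{1..3}; do
  ssh -o BatchMode=yes $x 'sudo -n true' && echo "$x : ssh+sudo OK" || echo "$x : Not OK"
done
```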
First option: Ansible Galaxy
Example 1: Deploy a 3-node cluster
The Ansible Galaxy RKE2 role will set up a multi-node cluster in minutes. It will also install keepalived on the masters to maintain a virtual IP address (the proxy) between the nodes, so you do not need any external haproxy nor external keepalived. Make sure the virtual IP for proxy (192.168.66.20 in the example) is free.
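A quick, non-authoritative way to check that the VIP is not already taken (a host that does not answer ping could still hold the address):

```bash
ping -c1 -W1 192.168.66.20 > /dev/null && echo "192.168.66.20 is already in use" || echo "192.168.66.20 looks free"
```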
Start a seashell
The seashell container offers a Bash and Neovim environment with all the common Kubernetes tools (kubectl, oc, k9s, helm, …) and Ansible. seashell mounts the current folder for persistence (your config files). Also, if you need ssh-agent, start it before launching seashell, or run eval $(ssh-agent -s)
As user, create a new folder and cd into it.
curl -O https://gitlab.com/pivert/seashell/-/raw/main/seashell && chmod a+x seashell && seashell
Install the `lablabs.rke2` role.
ansible-galaxy install lablabs.rke2
Create the 2 configuration files
The inventory `hosts` file:

```ini
[masters]
rke21
rke22
rke23

[k8s_cluster:children]
masters
```
Just make sure you keep the group names in the inventory file.
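Before deploying, you can validate the inventory and the SSH connectivity with Ansible's ping module:

```bash
ansible -i hosts k8s_cluster -m ping
```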
The `deploy_rke2.yaml` playbook:

```yaml
- name: Deploy RKE2
  hosts: k8s_cluster
  become: yes
  vars:
    rke2_ha_mode: true
    rke2_api_ip: 192.168.66.20
    rke2_download_kubeconf: true
  roles:
    - role: lablabs.rke2
```
Run the playbook
ansible-playbook -i hosts deploy_rke2.yaml
Check
From seashell:

```bash
mv /tmp/rke2.yaml ./
export KUBECONFIG=$(pwd)/rke2.yaml
k9s
```
Example 2: Deploy a 6-node cluster with 3 masters and 3 workers
The `hosts` inventory:

```ini
[masters]
master1 rke2_type=server
master2 rke2_type=server
master3 rke2_type=server

[workers]
worker1 rke2_type=agent
worker2 rke2_type=agent
worker3 rke2_type=agent

[k8s_cluster:children]
masters
workers
```
The `deploy_rke2.yaml` playbook:

```yaml
- name: Deploy RKE2
  hosts: k8s_cluster
  become: yes
  vars:
    rke2_version: v1.27.3+rke2r1
    rke2_ha_mode: true
    rke2_ha_mode_keepalived: false
    rke2_ha_mode_kubevip: true
    rke2_api_ip: 192.168.66.60
    rke2_download_kubeconf: true
    rke2_additional_sans:
      - rke.home.pivert.org
      - 192.168.66.60
    rke2_loadbalancer_ip_range: 192.168.66.100-192.168.66.200
    rke2_server_node_taints:
      - "CriticalAddonsOnly=true:NoExecute"
  roles:
    - role: lablabs.rke2
```
group_vars
Create a ./group_vars/ subfolder with 3 files, and provide the rke2_token and rke2_agent_token.
The rke2_token for the workers must be equal to the rke2_agent_token of the masters. This is an additional security feature: workers join with a dedicated agent token and never hold the token reserved for masters joining the cluster.
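One way to generate two such random tokens (a sketch using openssl; any long random string works):

```bash
openssl rand -hex 32   # run it twice: once for rke2_token, once for rke2_agent_token
```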
First set your global variables in all.yml:

```yaml
---
ansible_user: pivert
rke2_token: ""
```
Generate 2 tokens, and place them in a masters.yml file:

```yaml
---
rke2_token: EQKZhm2klhTE0G3WGjlrSB8pHNejRWZlH4oW7y8mCW9xZN13OTMw7BF10mXdBPLN
rke2_agent_token: MIJH9VSJ4Pu3YTavEGOzClkLvvApvspjHCd4fugVEgSkJ0YQlha8ha6RcvOdMyZv
```
Create the workers.yml with only the “agent token” as rke2_token:

```yaml
---
rke2_token: MIJH9VSJ4Pu3YTavEGOzClkLvvApvspjHCd4fugVEgSkJ0YQlha8ha6RcvOdMyZv
```
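The resulting layout should look like this:

```
group_vars/
├── all.yml
├── masters.yml
└── workers.yml
```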
Run the playbook
ansible-playbook -i hosts deploy_rke2.yaml
Check
After 15-45 minutes:

```bash
mv /tmp/rke2.yaml ./
export KUBECONFIG=$(pwd)/rke2.yaml
k9s
```
Troubleshoot
If the cluster does not come up 30 minutes after everything has been downloaded (monitor your internet download usage), you can use crictl to ensure all your pods are running.
Create a crictl.yaml file locally containing:

```yaml
runtime-endpoint: unix:///run/k3s/containerd/containerd.sock
```
Then use Ansible to copy the file and to install cri-tools:

```bash
ansible k8s_cluster -m copy -a "dest=/etc/crictl.yaml content='runtime-endpoint: unix:///run/k3s/containerd/containerd.sock\n'" -b
VERSION="v1.28.0"
ansible k8s_cluster -m shell -a "wget https://github.com/kubernetes-sigs/cri-tools/releases/download/$VERSION/crictl-$VERSION-linux-amd64.tar.gz"
ansible k8s_cluster -m shell -a "tar zxvf crictl-$VERSION-linux-amd64.tar.gz -C /usr/local/bin" -b
ansible k8s_cluster -m shell -a "rm -f crictl-$VERSION-linux-amd64.tar.gz"
```
crictl uses a syntax very close to docker's, so a command like this will show you all the running containers:
ansible k8s_cluster -a 'crictl ps'
Feel free to check logs with crictl if the cluster did not initialize and you did not get the /tmp/rke2.yaml file.
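For example, on one of the nodes (the container ID is illustrative; take one from the previous output):

```bash
crictl pods                  # list the pod sandboxes
crictl ps -a                 # list containers, including exited ones
crictl logs <container-id>   # show the logs of a given container
```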
Scratch the cluster
```bash
ansible k8s_cluster -a /usr/local/bin/rke2-killall.sh -b
ansible k8s_cluster -a /usr/local/bin/rke2-uninstall.sh -b
ansible k8s_cluster -a 'rm /etc/systemd/system/rke2-server.service' -b
ansible k8s_cluster -a 'systemctl daemon-reload' -b
```
The 2 last steps above, which remove the leftover rke2-server service unit, are important: a bug can leave the unit behind after uninstall.
Second option: Manual
This option is described in the docs.rke2.io documentation.
Set up your haproxy or load balancer
This option does not describe how to run keepalived on the hosts, so you need an external load balancer. Feel free to use any load balancer, or DNS round-robin. Here is an example for haproxy:
```
global
    log 127.0.0.1 local2
    pidfile /var/run/haproxy.pid
    maxconn 4000
    daemon

defaults
    mode http
    log global
    option dontlognull
    option http-server-close
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000

# RKE2
listen rke2-api-server-6443
    bind 192.168.66.20:6443
    mode tcp
    option httpchk HEAD /readyz
    http-check expect status 200
    option ssl-hello-chk
    server rke21 192.168.66.21:6443 check inter 1s
    server rke22 192.168.66.22:6443 check inter 1s
    server rke23 192.168.66.23:6443 check inter 1s

listen rke2-machine-config-server-9345
    bind 192.168.66.20:9345
    mode tcp
    server rke21 192.168.66.21:9345 check inter 1s
    server rke22 192.168.66.22:9345 check inter 1s
    server rke23 192.168.66.23:9345 check inter 1s

listen rke2-ingress-router-80
    bind 192.168.66.20:80
    mode tcp
    balance source
    server rke21 192.168.66.21:80 check inter 1s
    server rke22 192.168.66.22:80 check inter 1s
    server rke23 192.168.66.23:80 check inter 1s

listen rke2-ingress-router-443
    bind 192.168.66.20:443
    mode tcp
    balance source
    server rke21 192.168.66.21:443 check inter 1s
    server rke22 192.168.66.22:443 check inter 1s
    server rke23 192.168.66.23:443 check inter 1s
```
This might be the most important part. Also make sure you do not have any firewall in the way, or carefully watch for DROPs and REJECTs.
This is not mandatory for this tutorial, but if you’re using haproxy, configure 2 of them in HA with keepalived.
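For reference, a minimal keepalived.conf sketch for such an haproxy pair (the interface name and VRRP parameters are illustrative):

```
vrrp_instance VI_1 {
    state MASTER                 # BACKUP on the second haproxy
    interface eth0               # adapt to your NIC name
    virtual_router_id 51
    priority 100                 # use a lower priority on the BACKUP node
    advert_int 1
    virtual_ipaddress {
        192.168.66.20/24
    }
}
```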
Ignore CNI-managed interfaces (NetworkManager)
Configure NetworkManager to ignore the calico/flannel related network interfaces on the 3 nodes:
```bash
cat <<EOF > /etc/NetworkManager/conf.d/rke2-canal.conf
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:flannel*
EOF
```
Reload:

```bash
systemctl reload NetworkManager
```
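Once the cluster is up, you can verify that the CNI interfaces are indeed ignored:

```bash
nmcli device status | grep -E 'cali|flannel'   # these should be listed as "unmanaged"
```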
Install the first node
```bash
mkdir -p /etc/rancher/rke2/
cat <<EOF > /etc/rancher/rke2/config.yaml
token: $(cat /proc/sys/kernel/random/uuid)
tls-san:
  - proxy.home.pivert.org
EOF
```
Install
curl -sfL https://get.rke2.io | sh -
Start
systemctl enable rke2-server.service --now
Activation should take 3-10 minutes since it will download container images. Watch the log
journalctl -u rke2-server.service -f
Check
```bash
kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get nodes
kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get pods -A
```
Install and start other nodes
Proceed one node at a time. Do not move on to the next node until you see the newly added node in “Ready” state with:
kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get nodes
Copy the config.yaml and add the server line
Copy the above file to the other nodes, and add the server line at the top. Make sure the token is the same on all nodes. The file should look like this:
```yaml
server: https://proxy.home.pivert.org:9345
token: e0ea0aed-ccb8-4770-880a-2cc49175c0a2
tls-san:
  - proxy.home.pivert.org
```
curl -sfL https://get.rke2.io | sh -
systemctl enable rke2-server.service --now
Results
You should get something like this:
```
root@rke21:~# kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get nodes
NAME    STATUS   ROLES                       AGE   VERSION
rke21   Ready    control-plane,etcd,master   55m   v1.25.9+rke2r1
rke22   Ready    control-plane,etcd,master   29m   v1.25.9+rke2r1
rke23   Ready    control-plane,etcd,master   12m   v1.25.9+rke2r1
root@rke21:~# kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get pods -A
NAMESPACE     NAME                                                    READY   STATUS      RESTARTS   AGE
kube-system   cloud-controller-manager-rke21                          1/1     Running     0          56m
kube-system   cloud-controller-manager-rke22                          1/1     Running     0          29m
kube-system   cloud-controller-manager-rke23                          1/1     Running     0          13m
kube-system   etcd-rke21                                              1/1     Running     0          55m
kube-system   etcd-rke22                                              1/1     Running     0          29m
kube-system   etcd-rke23                                              1/1     Running     0          12m
kube-system   helm-install-rke2-canal-rq7m8                           0/1     Completed   0          56m
kube-system   helm-install-rke2-coredns-85d7v                         0/1     Completed   0          56m
kube-system   helm-install-rke2-ingress-nginx-c78xr                   0/1     Completed   0          56m
kube-system   helm-install-rke2-metrics-server-pfzln                  0/1     Completed   0          56m
kube-system   helm-install-rke2-snapshot-controller-2kj8x             0/1     Completed   1          56m
kube-system   helm-install-rke2-snapshot-controller-crd-qpbm8         0/1     Completed   0          56m
kube-system   helm-install-rke2-snapshot-validation-webhook-kk7bc     0/1     Completed   0          56m
kube-system   kube-apiserver-rke21                                    1/1     Running     0          56m
kube-system   kube-apiserver-rke22                                    1/1     Running     0          29m
kube-system   kube-apiserver-rke23                                    1/1     Running     0          13m
kube-system   kube-controller-manager-rke21                           1/1     Running     0          56m
kube-system   kube-controller-manager-rke22                           1/1     Running     0          29m
kube-system   kube-controller-manager-rke23                           1/1     Running     0          13m
kube-system   kube-proxy-rke21                                        1/1     Running     0          56m
kube-system   kube-proxy-rke22                                        1/1     Running     0          29m
kube-system   kube-proxy-rke23                                        1/1     Running     0          13m
kube-system   kube-scheduler-rke21                                    1/1     Running     0          56m
kube-system   kube-scheduler-rke22                                    1/1     Running     0          29m
kube-system   kube-scheduler-rke23                                    1/1     Running     0          13m
kube-system   rke2-canal-774ps                                        2/2     Running     0          30m
kube-system   rke2-canal-fftn5                                        2/2     Running     0          55m
kube-system   rke2-canal-mn24r                                        2/2     Running     0          13m
kube-system   rke2-coredns-rke2-coredns-6b9548f79f-cl9zt              1/1     Running     0          55m
kube-system   rke2-coredns-rke2-coredns-6b9548f79f-lpwzd              1/1     Running     0          30m
kube-system   rke2-coredns-rke2-coredns-autoscaler-57647bc7cf-6xxhj   1/1     Running     0          55m
kube-system   rke2-ingress-nginx-controller-nll9j                     1/1     Running     0          12m
kube-system   rke2-ingress-nginx-controller-nsjgf                     1/1     Running     0          29m
kube-system   rke2-ingress-nginx-controller-s2j8v                     1/1     Running     0          54m
kube-system   rke2-metrics-server-7d58bbc9c6-98j9s                    1/1     Running     0          54m
kube-system   rke2-snapshot-controller-7b5b4f946c-mw9l9               1/1     Running     0          54m
kube-system   rke2-snapshot-validation-webhook-7748dbf6ff-dzmjf       1/1     Running     0          54m
```
Troubleshoot
Check your certificate, and especially the «X509v3 Subject Alternative Name»
openssl s_client -connect localhost:6443 < /dev/null | openssl x509 -text
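With OpenSSL 1.1.1 or later, you can print just the SANs:

```bash
openssl s_client -connect localhost:6443 < /dev/null 2>/dev/null | openssl x509 -noout -ext subjectAltName
```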
Restart the installation or recreate the node.
```bash
cp /etc/rancher/rke2/config.yaml /root   # Save your config.yaml since it will be deleted by the uninstall below
# You must delete the node from the cluster to be able to re-join
kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml drain $HOSTNAME --delete-emptydir-data --force --ignore-daemonsets
kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml delete node $HOSTNAME
systemctl disable rke2-server.service --now
rke2-killall.sh
rke2-uninstall.sh
mkdir -p /etc/rancher/rke2/
cp /root/config.yaml /etc/rancher/rke2/
curl -sfL https://get.rke2.io | sh -
systemctl enable rke2-server.service --now
journalctl -u rke2-server.service -f
```
The kubectl delete might hang; just press Ctrl+C once. The node deletion and re-creation takes about 3-5 minutes.
What’s next?
Install Rancher to get a GUI
Make sure you’re in the above-mentioned seashell, and have a proper KUBECONFIG environment variable. For instance:

```bash
export KUBECONFIG=~/rke2.yaml
```
Or add or copy the rke2.yaml content to your ~/.kube/config.
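A sketch to merge it non-destructively (paths assumed; back up your config first):

```bash
cp ~/.kube/config ~/.kube/config.bak
KUBECONFIG=~/.kube/config:~/rke2.yaml kubectl config view --flatten > /tmp/merged-config
mv /tmp/merged-config ~/.kube/config
```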
From the seashell, you should be able to run aliases such as kgpoall (800 aliases starting with k…).
```bash
helm repo add jetstack https://charts.jetstack.io
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest   # register the Rancher chart repo used below
helm repo update
helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --version v1.11.0
helm install rancher rancher-latest/rancher --namespace cattle-system --create-namespace --set hostname=proxy.home.pivert.org --set bootstrapPassword=admin --values values.yaml --set global.cattle.psp.enabled=false
echo https://proxy.home.pivert.org/dashboard/?setup=$(kubectl get secret --namespace cattle-system bootstrap-secret -o go-template='{{.data.bootstrapPassword|base64decode}}')
# => Click on the link after 5-10 minutes
```
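Rancher takes a few minutes to start; you can watch the deployment before opening the link:

```bash
kubectl -n cattle-system rollout status deploy/rancher
```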
Also make sure you have enough RAM and vCPU to run your workloads. Plan for a minimum of 16 GB RAM and 4 vCPU.
Conclusion
RKE2 is very easy to set up, and the documentation is great. Both options allow you to install a cluster in minutes. Check the documentation for more.
2 responses to “RKE2 cluster on Ubuntu 22.04 in minutes – Ansible Galaxy and Manual options (2 ways)”
How about installing onto a single server which acts as server and agent?
The official documentation doesn’t explain it well!
Of course, you can always install both server and agent (or master and worker) on a single server; you’ll need 3 servers instead of 6. The reason to split servers and agents is to prevent a high workload on an agent from impacting the control plane and/or etcd.
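For instance, a sketch of Example 2 collapsed to 3 dual-role nodes: keep only the masters in the inventory, and drop the rke2_server_node_taints variable from the playbook so that workloads can schedule on the servers.

```ini
[masters]
rke21 rke2_type=server
rke22 rke2_type=server
rke23 rke2_type=server

[k8s_cluster:children]
masters
```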