Many people use tools like Terraform, Ansible, custom boot scripts and more to provision and maintain their Kubernetes nodes. These tools are powerful, but they only partially achieve what should be the ultimate goal: declarative configuration on an immutable OS. Talos is a great, popular tool that can step in here, but unfortunately Talos is a specific distribution, so if you want to customize or install even one thing that isn't already there, you need to duplicate Talos' image-building pipelines and then maintain your variant in perpetuity. Kairos can help here: it is a meta-distribution, which means you can use any base you want and customize the base image however you like. After it has been customized, the OS is immutable, just like Talos.
AuroraBoot is a tool that takes a system image and automatically advertises it to machines that boot on the same network and request an address through DHCP. You may have heard of NetBoot; AuroraBoot is similar, but simplified and geared towards Kairos. Use your preferred way to run containers. I will continue with an example config that should work on any system with `docker`, `docker-compose` and `systemd`. Use this gist as a guide. Here's a `docker-compose` file you can use:
```yaml
version: "2.1"
services:
  auroraboot:
    container_name: auroraboot
    image: quay.io/kairos/auroraboot:latest
    command: --cloud-config /cloud-init.yaml
    volumes:
      - /services/auroraboot/storage/:/storage
      - /services/auroraboot/cloud-init.yaml:/cloud-init.yaml
    network_mode: host
    restart: unless-stopped
```
As you can see, you will need to make the directories `/services/auroraboot/` and `/services/auroraboot/storage`, and then create `/services/auroraboot/cloud-init.yaml`:
```yaml
state_dir: "/storage"
artifact_version: v2.4.1-k3sv1.27.3+k3s1
release_version: v2.4.1
flavor: debian
repository: kairos-io/kairos
cloud_config: |
  #cloud-config
  hostname: kairos-{{ trunc 4 .MachineID }}
  users:
    - name: tyzbit # changeme
      shell: /bin/bash
      groups:
        - admin
      ssh_authorized_keys:
        - github:tyzbit # changeme
  install:
    auto: true
    device: /dev/sda # changeme
    reboot: true
    bundles:
      - targets:
          - run://docker.io/tyzbit/flux:latest # changeme if https://github.com/kairos-io/community-bundles/pull/53 has been merged
  growpart:
    devices: ['/']
  kubevip:
    enabled: true
    eip: 192.168.1.8 # changeme
  k3s:
    enabled: true
    args:
      - --disable=traefik,servicelb
      - --write-kubeconfig-mode 0644
      - --node-taint 'node-role.kubernetes.io/control-plane=effect:NoSchedule'
  stages:
    boot:
      - name: "Set up various kube environment variables"
        environment:
          KUBECONFIG: /etc/rancher/k3s/k3s.yaml
          CONTAINERD_ADDRESS: /run/k3s/containerd/containerd.sock
          CONTAINERD_NAMESPACE: k8s.io
      # -- This is needed now so we can add the SOPS secret
      - name: "Add flux-system namespace manifest"
        files:
          - path: /var/lib/rancher/k3s/server/manifests/flux-system.yaml
            content: |
              apiVersion: v1
              kind: Namespace
              metadata:
                name: flux-system
      - name: "Download SOPS secret"
        files:
          - path: /var/lib/rancher/k3s/server/manifests/sops-secret.yaml
            content: |
              # changeme
              apiVersion: v1
              kind: Secret
              metadata:
                name: sops-gpg
                namespace: flux-system
              type: Opaque
              data:
                sops.asc: a2Fpcm9zK3NvcHM=
              ---
              apiVersion: v1
              kind: Secret
              metadata:
                name: kubernetes-secrets
                namespace: flux-system
              type: Opaque
              data:
                identity: a2Fpcm9zK3NvcHM=
                identity.pub: a2Fpcm9zK3NvcHM=
                known_hosts: a2Fpcm9zK3NvcHM=
  p2p:
    network_id: kairos # changeme
    network_token: example # changeme
    dns: false
    auto:
      enable: true
      ha:
        enable: true
    vpn:
      create: false
      enable: false
  # changeme
  # review ALL settings
  flux:
    env:
      GITHUB_TOKEN: example
    github:
      owner: csagan
      repository: fleet-infra
      path: clusters/cosmos
      components-extra: image-reflector-controller,image-automation-controller
```
Change everything indicated by "changeme" if necessary.
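The `hostname` line in the cloud-config is a template: each machine renders it, and `trunc 4` keeps the first four characters of `.MachineID`. The POSIX `sh` sketch below mimics that rendering so you can predict the hostname a node will get. The helper name `kairos_hostname` is hypothetical, and it assumes `.MachineID` matches the contents of `/etc/machine-id`, which may not hold on every Kairos release.

```sh
#!/bin/sh
# Hypothetical helper: mimic the cloud-config template
#   hostname: kairos-{{ trunc 4 .MachineID }}
# by keeping only the first four characters of the machine ID.
# Assumption: .MachineID is the same value found in /etc/machine-id.
kairos_hostname() {
  # A precision on %s makes printf truncate the string to 4 characters;
  # shorter IDs are printed whole, just as trunc would return them.
  printf 'kairos-%.4s\n' "$1"
}

# Usage on a booted node (assumes /etc/machine-id exists):
#   kairos_hostname "$(cat /etc/machine-id)"
```

Four hex characters give 65,536 possible suffixes, which is plenty for a small cluster but is not a uniqueness guarantee.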
It's not advised to put secrets directly in the config file. Look at an example config or the Kairos documentation to see how you can host the secrets on a local webserver and pull them in.
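Whether the Secrets stay inline or are hosted on a webserver, every value under `data:` must be base64-encoded; the `a2Fpcm9zK3NvcHM=` placeholders in the config simply decode to the string `kairos+sops`. The POSIX `sh` sketch below renders the `sops-gpg` Secret from a key file. The function names and file paths are hypothetical; the Flux SOPS guide exports the key with `gpg --export-secret-keys --armor <fingerprint>`, and `kubectl create secret generic ... --dry-run=client -o yaml` is an alternative way to produce an equivalent manifest.

```sh
#!/bin/sh
# Base64-encode a file on a single line, as a Secret's `data:` field expects.
# `tr -d '\n'` removes the line wrapping GNU base64 adds every 76 characters.
b64_file() {
  base64 < "$1" | tr -d '\n'
}

# Hypothetical helper: print the sops-gpg Secret from the cloud-config,
# with the armored private key in file $1 encoded under `sops.asc`.
render_sops_secret() {
  # Check readability up front; a failed redirect inside the pipeline
  # would otherwise be masked by tr's successful exit status.
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
  key_b64=$(b64_file "$1")
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: sops-gpg
  namespace: flux-system
type: Opaque
data:
  sops.asc: $key_b64
EOF
}

# Usage (the fingerprint and paths are placeholders):
#   gpg --export-secret-keys --armor "$KEY_FP" > /tmp/sops.asc
#   render_sops_secret /tmp/sops.asc
```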
Generate your network token with `docker run -ti --rm quay.io/mudler/edgevpn -b -g`.
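Rather than pasting the token by hand, you could capture it and write it into the `network_token` line of `cloud-init.yaml`. The POSIX `sh` sketch below does that with `sed`. The helper `set_network_token` is hypothetical; it assumes the token is a base64 string (so it contains no `|` or `&`, which would break the `sed` expression) and that `network_token:` sits on a single line.

```sh
#!/bin/sh
# Hypothetical helper: replace the value on every `network_token:` line in
# file $1 with the token $2, keeping the line's original indentation.
set_network_token() {
  file=$1
  token=$2
  tmp="$file.tmp.$$"
  # `|` is the sed delimiter because base64 tokens may contain `/`.
  if sed "s|^\([[:space:]]*network_token:\).*|\1 \"$token\"|" "$file" > "$tmp"; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Usage: -ti is dropped so no terminal carriage returns end up in the token.
#   set_network_token /services/auroraboot/cloud-init.yaml \
#     "$(docker run --rm quay.io/mudler/edgevpn -b -g)"
```

Note that the trailing `# changeme` comment on that line is replaced along with the old value.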
Save this file and then start the container (`systemctl daemon-reload; systemctl start app@auroraboot`).
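The gist itself isn't reproduced here, but a service started as `app@auroraboot` implies a systemd template unit, `app@.service`, whose `%i` specifier expands to `auroraboot`. A hypothetical version of such a unit is sketched below as a POSIX `sh` function that writes it out. The unit contents, the function name `write_app_unit`, and the assumption that the compose file is saved as `/services/auroraboot/docker-compose.yml` are illustrative, not taken from the gist.

```sh
#!/bin/sh
# Hypothetical helper: write a systemd template unit, app@.service, into
# directory $1 (default /etc/systemd/system). For `app@auroraboot`, %i
# expands to "auroraboot", so docker-compose runs in /services/auroraboot.
write_app_unit() {
  dir=${1:-/etc/systemd/system}
  # Quoted 'EOF' keeps the shell from touching %i or anything else.
  cat > "$dir/app@.service" <<'EOF'
[Unit]
Description=docker-compose app %i
Requires=docker.service
After=docker.service network-online.target
Wants=network-online.target

[Service]
WorkingDirectory=/services/%i
ExecStart=/usr/bin/env docker-compose up
ExecStop=/usr/bin/env docker-compose down
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (as root):
#   write_app_unit && systemctl daemon-reload && systemctl start app@auroraboot
```

Because the unit declares `WantedBy=multi-user.target`, running `systemctl enable app@auroraboot` would also start AuroraBoot at boot.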
The config and the cloud-config can both be URLs, so it's advisable to commit them to a repo and change the startup parameters of the AuroraBoot container to point to them, for example "https://raw.githubusercontent.com/tyzbit/kairos-config/main/k3s/auroraboot-config.yaml". That way you can keep all of your configs (except sensitive ones) in Git.
Now just boot two systems, either in parallel or one after another. On the first boot, bring up the one-time boot menu (F12 on Dell machines, for example) and select the network boot option (IPv4). Let it install and reboot, then select the newly installed OS.
Your systems should boot, find each other, choose roles and become a Kubernetes cluster. Then they'll bootstrap with Flux and pull down any manifests you have for the cluster and install them. Once complete, you have a fully configured cluster ready to go!
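To confirm that both nodes have joined, you could poll `kubectl get nodes` on either machine. The `stages.boot` block in the cloud-config sets `KUBECONFIG=/etc/rancher/k3s/k3s.yaml`, and `--write-kubeconfig-mode 0644` makes that file readable by non-root users, so a plain `kubectl` call should work there (assuming `kubectl` is on the `PATH`; k3s also provides `k3s kubectl`). The helper `wait_for_ready_nodes` below is a hypothetical POSIX `sh` sketch, not part of Kairos.

```sh
#!/bin/sh
# Hypothetical helper: wait until at least $1 nodes report Ready.
# $2 = timeout in seconds (default 900), $3 = poll interval (default 10).
# Returns 0 on success, 1 on timeout.
wait_for_ready_nodes() {
  want=$1
  timeout=${2:-900}
  interval=${3:-10}
  deadline=$(( $(date +%s) + timeout ))
  while :; do
    # Column 2 of `kubectl get nodes --no-headers` is STATUS; count entries
    # starting with "Ready" (this also matches "Ready,SchedulingDisabled").
    ready=$(kubectl get nodes --no-headers 2>/dev/null |
      awk '$2 ~ /^Ready/ { n++ } END { print n + 0 }')
    if [ "$ready" -ge "$want" ]; then
      echo "$ready node(s) Ready"
      return 0
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out: $ready of $want node(s) Ready" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Usage on either node:
#   wait_for_ready_nodes 2 && kubectl get nodes -o wide
```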
- 00:00 I netboot the first machine (192.168.1.50). AuroraBoot (top-right) responds with files to boot from.
- 04:20 Kairos is installed and the host reboots. I select to boot from the new OS.
- 06:14 Kairos starts scrolling console messages on the first machine; I boot and switch to the second (192.168.1.52).
- 10:58 The second machine is done installing and reboots. I select to boot from the new OS.
- 11:45 I start watching `kairos-agent` on one of the machines.
- 14:17 Second host is done booting.
- 14:30 Hosts start booting and choosing roles.
- 15:25 The node I'm watching becomes a worker, so I know the other is a control plane node.
- 16:00 I start up k9s to watch k8s stuff get bootstrapped. Flux has already been bootstrapped on the node.
- 19:00 An example site is deployed to the cluster.
- 19:16 I load the site, which uses cert-manager, external-dns, NFS storage and cloudflared to serve the site.
- 19:36 I show Longhorn is installed and has automatically been configured to use the additional disk on the node.
- 20:10 I create a Longhorn volume, mount it and create a PVC to demonstrate it's fully installed and working.

The manifests it installed can be seen here.