Using Kairos to fully bootstrap a Kubernetes cluster with (almost) zero intervention


Background and motivation

Many people use tools like Terraform, Ansible and custom boot scripts to provision and maintain their Kubernetes nodes. These tools are powerful, but they only partially achieve what should be the ultimate goal: declarative configuration with an immutable OS. Talos is a great, popular tool that can step in here. Unfortunately, Talos is a specific distribution, so if you want to customize it or install even one thing that is not already there, you need to duplicate Talos' image-building pipelines and then maintain your variant in perpetuity. Kairos can help here: it is a meta-distribution, which means you can start from whatever base image you choose and customize it however you want. Once customized, the OS is immutable, just like Talos.

How to

Requirements

  • A way to run a container in the network in which your hosts will be booting.
    • Running it on a NAS is a great idea, or even a single-board computer.
    • It needs at least 2GB free space if you're only working with one image, more if you're testing, building or using many images.
    • 256MB memory should be plenty.
  • DHCP server. Most networks probably already have one of these.
  • Servers to boot. These can be virtual or physical.
  • A way to control which device the servers boot from, such as a keyboard to bring up a one-time boot menu.
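If you want to check the disk space requirement before going further, here is a minimal sketch. The function names are my own, and the 2048 MiB default simply mirrors the 2GB figure above. It reads the "Available" column of df's POSIX output, so it assumes the filesystem name contains no spaces.

```shell
# Print the free space, in MiB, of the filesystem that holds the given path.
free_mib() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeed if the filesystem holding $1 has at least $2 MiB free (default 2048).
has_min_free_space() {
  path="$1"
  min_mib="${2:-2048}"
  [ "$(free_mib "$path")" -ge "$min_mib" ]
}
```

For example, has_min_free_space /services && echo "enough room" checks the filesystem that will hold the storage directory. The path has to exist already, so point it at a parent directory if you have not created the storage directory yet.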

Let's go!

Set up the AuroraBoot container

AuroraBoot is a tool that takes a system image and automatically advertises it to machines that boot on the same network and request an address through DHCP. You may have heard of NetBoot; AuroraBoot is similar, but simplified and geared towards use with Kairos. Use your preferred way to run containers. I will continue with an example config that should work on any system with docker, docker-compose and systemd. Use this gist as a guide. Here's a docker-compose file you can use:

version: "2.1"
services:
  auroraboot:
    container_name: auroraboot
    image: quay.io/kairos/auroraboot:latest
    command: --cloud-config /cloud-init.yaml
    volumes:
      - /services/auroraboot/storage/:/storage
      - /services/auroraboot/cloud-init.yaml:/cloud-init.yaml
    network_mode: host
    restart: unless-stopped

As you can see, you will need to make the directories /services/auroraboot/ and /services/auroraboot/storage and then create /services/auroraboot/cloud-init.yaml:

state_dir: "/storage"
artifact_version: v2.4.1-k3sv1.27.3+k3s1
release_version: v2.4.1
flavor: debian
repository: kairos-io/kairos
cloud_config: |
  #cloud-config

  hostname: kairos-{{ trunc 4 .MachineID }}
  users:
    - name: tyzbit # changeme
      shell: /bin/bash
      groups:
        - admin
      ssh_authorized_keys:
        - github:tyzbit # changeme

  install:
    auto: true
    device: /dev/sda # changeme
    reboot: true

  bundles:
    - targets:
        - run://docker.io/tyzbit/flux:latest # changeme if https://github.com/kairos-io/community-bundles/pull/53 has been merged

  growpart:
    devices: ['/']

  kubevip:
    enabled: true
    eip: 192.168.1.8 # changeme

  k3s:
    enabled: true
    args:
      - --disable=traefik,servicelb
      - --write-kubeconfig-mode 0644
      - --node-taint 'node-role.kubernetes.io/control-plane=effect:NoSchedule'

  stages:
    boot:
      - name: "Set up various kube environment variables"
        environment:
          KUBECONFIG: /etc/rancher/k3s/k3s.yaml
          CONTAINERD_ADDRESS: /run/k3s/containerd/containerd.sock
          CONTAINERD_NAMESPACE: k8s.io

      # -- This is needed now so we can add the SOPS secret
      - name: "Add flux-system namespace manifest"
        files:
          - path: /var/lib/rancher/k3s/server/manifests/flux-system.yaml
            content: |
              apiVersion: v1
              kind: Namespace
              metadata:
                name: flux-system

      - name: "Download SOPS secret"
        files:
          - path: /var/lib/rancher/k3s/server/manifests/sops-secret.yaml
            content: |
              # changeme
              apiVersion: v1
              kind: Secret
              metadata:
                name: sops-gpg
                namespace: flux-system
              type: Opaque
              data:
                sops.asc: a2Fpcm9zK3NvcHM=
              ---
              apiVersion: v1
              kind: Secret
              metadata:
                name: kubernetes-secrets
                namespace: flux-system
              type: Opaque
              data:
                identity: a2Fpcm9zK3NvcHM=
                identity.pub: a2Fpcm9zK3NvcHM=
                known_hosts: a2Fpcm9zK3NvcHM=

  p2p:
    network_id: kairos # changeme
    network_token: example # changeme
    dns: false
    auto:
      enable: true
      ha:
        enable: true
    vpn:
      create: false
      enable: false

  # changeme
  # review ALL settings
  flux:
    env:
      GITHUB_TOKEN: example
    github:
      owner: csagan
      repository: fleet-infra
      path: clusters/cosmos
      components-extra: image-reflector-controller,image-automation-controller

Change everything indicated by "changeme" if necessary.
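The values under data: in the two Secret manifests are base64-encoded placeholders (the sample string decodes to kairos+sops). Here is a small sketch for producing your own values. The helper names are hypothetical, and it assumes a base64 command that accepts -d for decoding, as GNU coreutils and BusyBox do.

```shell
# Encode a literal string for a Secret's data: field (single line, no newline).
encode_secret_value() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Encode a whole file, such as an exported key, onto a single line.
encode_secret_file() {
  base64 < "$1" | tr -d '\n'
}

# Decode a value to check what a manifest currently holds.
decode_secret_value() {
  printf '%s' "$1" | base64 -d
}
```

Running encode_secret_value 'kairos+sops' prints a2Fpcm9zK3NvcHM=, the placeholder used above. For sops.asc, identity and the other file-like fields, encode_secret_file with the path to your own key file is probably the more useful of the two.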

It's not advised to put secrets directly in the config file. Look at an example config or the Kairos documentation to see how you can host the secrets on a local webserver and pull them in.

Generate your network token with docker run -ti --rm quay.io/mudler/edgevpn -b -g and use the output as the network_token value in the p2p section.

Save this file and then start the container (systemctl daemon-reload; systemctl start app@auroraboot).
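If you are not using the systemd unit, the same steps can be sketched with plain shell. This assumes the compose file lives at /services/auroraboot/docker-compose.yml (my choice of location, not something the setup requires) and that the docker compose plugin is installed; on older installs, docker-compose takes the same arguments.

```shell
# Create the directories the compose file mounts, and make sure the config
# exists as a file. If the bind-mount source is missing, Docker creates a
# directory at that path instead, and the container cannot read its config.
prepare_auroraboot_dirs() {
  base="${1:-/services/auroraboot}"
  mkdir -p "$base/storage"
  if [ ! -f "$base/cloud-init.yaml" ]; then
    echo "missing $base/cloud-init.yaml" >&2
    return 1
  fi
}

# Start the container in the background from the given compose file.
start_auroraboot() {
  compose_file="${1:-/services/auroraboot/docker-compose.yml}"
  docker compose -f "$compose_file" up -d
}
```

Something like prepare_auroraboot_dirs && start_auroraboot would then bring the service up, and restart: unless-stopped in the compose file keeps it running across reboots.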

Both the AuroraBoot config and the cloud-config can be given as URLs, so I'd advise committing them to a repo and changing the startup parameters on the AuroraBoot container to point to them, such as "https://raw.githubusercontent.com/tyzbit/kairos-config/main/k3s/auroraboot-config.yaml". That way you can have all of your configs (except sensitive ones) in Git.

Boot

Now just boot two systems, either in parallel or one after another. On each machine's first boot, bring up the one-time boot menu (F12 on Dell machines, for example) and select network boot (IPv4). Let it install and reboot, then boot the newly installed OS.
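While the machines netboot, it helps to keep an eye on AuroraBoot answering their requests. A tiny sketch, assuming the container is named auroraboot as in the compose file above (the function name is just for illustration):

```shell
# Follow the AuroraBoot container's logs, starting from the last N lines
# (default 50). Press Ctrl-C to stop following.
follow_auroraboot_logs() {
  docker logs --follow --tail "${1:-50}" auroraboot
}
```

If nothing shows up in the logs when a machine tries to network boot, the request is probably not reaching the host that runs the container.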

Magic

Your systems should boot, find each other, choose roles and become a Kubernetes cluster. Then they'll bootstrap with Flux and pull down any manifests you have for the cluster and install them. Once complete, you have a fully configured cluster ready to go!
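To confirm the nodes really did find each other, you can count the Ready nodes from any machine with kubectl access. This is a rough sketch: the function names are hypothetical, and it only counts nodes whose STATUS column is exactly Ready in the output of kubectl get nodes --no-headers.

```shell
# Read `kubectl get nodes --no-headers` output on stdin and print how many
# nodes have a STATUS of exactly "Ready".
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Poll until at least $1 nodes are Ready, trying up to $2 times (default 30)
# with POLL_SECONDS (default 10) between attempts.
wait_for_nodes() {
  expected="$1"
  attempts="${2:-30}"
  while [ "$attempts" -gt 0 ]; do
    ready="$(kubectl get nodes --no-headers 2>/dev/null | count_ready_nodes)"
    if [ "$ready" -ge "$expected" ]; then
      echo "$ready node(s) Ready"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "${POLL_SECONDS:-10}"
  done
  echo "timed out waiting for $expected Ready node(s)" >&2
  return 1
}
```

On one of the nodes, point KUBECONFIG at /etc/rancher/k3s/k3s.yaml (the path used in the config above) and run wait_for_nodes 2. After that, kubectl -n flux-system get pods should show the Flux controllers coming up.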

Wanna see it in action?

Timeline

  • 00:00 I netboot the first machine (192.168.1.50). AuroraBoot (top-right) responds with files to boot from.
  • 04:20 Kairos is installed and the host reboots. I select to boot from the new OS.
  • 06:14 Kairos starts scrolling console messages on the first machine, so I boot the second machine (192.168.1.52) and switch to it.
  • 10:58 The second machine is done installing and reboots. I select to boot from the new OS.
  • 11:45 I start watching kairos-agent on one of the machines.
  • 14:17 Second host is done booting.
  • 14:30 Hosts start booting and choosing roles.
  • 15:25 The node I'm watching becomes a worker, so I know the other is a control plane node.
  • 16:00 I start up k9s to watch k8s stuff get bootstrapped. Flux has already been bootstrapped on the node.
  • 19:00 An example site is deployed to the cluster.
  • 19:16 I load the site, which used cert-manager, external-dns, NFS storage and cloudflared to serve the site.
  • 19:36 I show Longhorn is installed and has automatically been configured to use the additional disk on the node.
  • 20:10 I create a Longhorn volume, mount it and create a PVC to demonstrate it's fully installed and working.

The manifests it installed can be seen here.

Thanks to chkpwd in the Kubernetes@Home discord for his assistance as we both tinkered with Kairos!