Configuring BGP With Calico on k8s and OPNsense

The Problem

If you're like me, the prospect of being able to hit Kubernetes resources from your local network (and expose them to the Internet to host apps, games and other services) is very alluring. There are many ways to do this, one of the most popular being MetalLB. MetalLB is a fantastic app, and it served me well for years, but I always wanted to do things the "right way". In its "easy mode" (layer 2), MetalLB announces service IPs using ARP and connects those up with Kubernetes services of type LoadBalancer. Even though it's using the technology in a novel way, it's very stable. However, it has some downsides:

  1. You must give it blocks of IPs from your existing IP space that the nodes are part of.
    • As a homelabber, I only had 254 IPs in my space to begin with. I did not plan ahead and just used 192.168.1.0/24 for my local network, and eventually I had allocated at least 80 static devices/interfaces (a good number of IoT and personal devices, separate LAN and Wi-Fi interfaces for the same device, etc.). Although we should plan our networks out, I think a lot of homelabbers might find themselves in a similar situation.
    • I chose a block of 20 IPs right after the ones I had already allocated, which put me right at 100, and I configured DHCP to give out ephemeral leases starting at .101. That meant I could have at most 20 load balancers unless I gave MetalLB another chunk or changed the chunk. I could move the ephemeral lease range, but that just buys time; it doesn't allow true growth.
    • I had to be strategic, and I made heavy use of the metallb.universe.tf/allow-shared-ip annotation to listen on TCP and UDP on the same LB, so no app ever used more than one LoadBalancer. Sidenote: as I understand it, Kubernetes 1.24 lets you do TCP and UDP on LoadBalancers natively, but it was nice to be able to do that before it became possible.
  2. When you give it blocks of your IP space, MetalLB becomes a network management device that your other network management devices do not know about, so you have to make them work together manually.
  3. MetalLB needs to be running to work, in addition to your existing CNI (Flannel, Calico, Cilium, etc). Every node (including control plane nodes) has to run a speaker pod.

A solution

An alternative is BGP, a ubiquitous protocol that is commonly used to route traffic on the Internet but is also used within organizations to peer disparate networks. (That is different from what most routers do: they essentially hide the entire network behind them and keep track of where the incoming traffic they receive should go.) My CNI, Calico, supports BGP and so does my router OS, OPNsense (if you install the os-frr plugin). The problem is that there isn't much on the Internet that really covers this specific use case (Calico on k8s + BGP + OPNsense). Hopefully this helps someone else with the same goal copy and tweak rather than stumble through configs (and spend a lot of time on dead ends, as I did). Eventually I learned that this is actually quite easy and, as a result, I no longer need MetalLB 🎉. As far as I know, Calico and Cilium both support BGP, but development in this space is rapid, so your favorite CNI might now support it as well.

What do you get?

Every ClusterIP in my cluster is reachable from my local network natively. For example, if I run an nginx Deployment and create a Service of type ClusterIP pointing to the Deployment, I can take the ClusterIP Kubernetes gave the service and put it in my laptop's web browser (while on the local network) and nginx will respond with the page. Making it available over a VPN hosted by OPNSense is pretty straightforward to do as well so you could hit the services while on your own VPN, but that's out of scope of this writeup.

How To

Prerequisites / Assumptions

I am running Calico and my Kubernetes IP range does not conflict with any range on my router. I actually changed my range, which was a lot of work it turns out, but the default Kubernetes config from kubeadm still gives you something like 192.168.128.0/17 which is quite far from the IP ranges most consumer routers use (usually 192.168.0.0/24 or 192.168.1.0/24). It's up to you to make sure the range doesn't conflict.
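If you want to sanity-check that nothing overlaps before going further, a few lines of Python with the standard ipaddress module will do it. This is only a sketch: the 172.18.x.x ranges are the ones I use in the manifests in this guide, the LAN range is the common consumer default, and the find_overlaps helper is a name I made up for illustration. Swap in your own values.

```python
import ipaddress
from itertools import combinations

# Ranges from this guide; replace them with your own.
networks = {
    "lan": "192.168.1.0/24",
    "pods": "172.18.0.0/18",
    "service ClusterIPs": "172.18.64.0/18",
    "service ExternalIPs": "172.18.128.0/18",
    "service LoadBalancerIPs": "172.18.192.0/18",
}

def find_overlaps(named_cidrs):
    """Return pairs of names whose CIDR ranges overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in named_cidrs.items()}
    return [(a, b) for a, b in combinations(parsed, 2) if parsed[a].overlaps(parsed[b])]

conflicts = find_overlaps(networks)
print(conflicts or "no overlapping ranges")
```

An empty result means the router's network and the cluster's ranges can be told apart, which is what BGP needs to route between them.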

Install os-frr on OPNsense

This part is easy. Log into your OPNsense router and navigate to System -> Firmware -> Plugins and then once the list loads, install os-frr. Newer versions of OPNSense appear to require you to upgrade to the latest version in order to install plugins, probably to increase compatibility and reduce developer churn on solved issues or incompatible versions, so be prepared to upgrade (and reboot 😰).

Create Calico Kubernetes Resources

This part is also easy, but it's not copy/paste, it's copy/paste/replace. Apply the following manifests however you usually apply them (for example directly, or GitOps-style by committing them to a repo that Flux or ArgoCD reconciles on your cluster automatically):

apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  asNumber: 64513
  nodeToNodeMeshEnabled: true
  serviceClusterIPs:
    - cidr: 172.18.64.0/18
  serviceExternalIPs:
    - cidr: 172.18.128.0/18
  serviceLoadBalancerIPs:
    - cidr: 172.18.192.0/18
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: YOUR_ROUTER
spec:
  peerIP: YOUR_ROUTER_IP
  asNumber: 64512
  keepOriginalNextHop: true
  maxRestartTime: 15m
---
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: bgp-pods
spec:
  disabled: false
  blockSize: 0
  cidr: 172.18.0.0/18
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: CrossSubnet
---

I want to point out some things:

  • asNumber, wherever you see it, is sort of like an ID for systems using BGP. Below 64512 are the blocks for public systems on the Internet using BGP, so we should not use any number below that. We can use any numbers between 64512 and 65534 though, which is over a thousand free AS Numbers. I chose to have my router as 64512 and my cluster as 64513, but any numbers work as long as they are in the private range and not the same between systems. As an aside if you don't set it, Calico's default is 64512.
  • I provided my IP ranges, after I redid my networking. If you have the default Kubernetes networking as installed via kubeadm, your ranges will be different. Luckily, I can tell you how to figure out those ranges yourself. Check the pods running on your control plane node(s) in the kube-system namespace and look for any pod like kube-controller-manager-NODENAME, where NODENAME is the name of one of your control plane nodes. Everywhere you see 172.18.64.0/18 in the manifests above, replace that with the value of the --service-cluster-ip-range parameter in the pod's command. Everywhere you see 172.18.0.0/18, replace that with the --cluster-cidr value in the pod's command. Finally, specifically for serviceLoadBalancerIPs, you can leave it out unless you plan on having another daemon provide LoadBalancer IPs (because Calico doesn't do that). Calico and MetalLB play pretty nicely together, but that's outside the scope of this guide.
  • IMPORTANT: If you have default networking (the most likely scenario), you should not create the IPPool resource; one already exists and you don't have to do anything.
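To make the "find your ranges" step concrete, here is a small Python sketch that pulls the two values out of a kube-controller-manager command list. The extract_ranges helper is hypothetical, and the example command and its CIDRs are made up for illustration; your pod's command will have more flags and different values.

```python
def extract_ranges(command):
    """Pull the pod and service CIDRs out of a kube-controller-manager command list."""
    wanted = {"--cluster-cidr": "pods", "--service-cluster-ip-range": "services"}
    found = {}
    for arg in command:
        flag, sep, value = arg.partition("=")
        if sep and flag in wanted:
            found[wanted[flag]] = value
    return found

# Hypothetical excerpt of the pod's command; the CIDR values are invented.
example_command = [
    "kube-controller-manager",
    "--allocate-node-cidrs=true",
    "--cluster-cidr=10.244.0.0/16",
    "--service-cluster-ip-range=10.96.0.0/12",
]
print(extract_ranges(example_command))
```

The "services" value is what goes into serviceClusterIPs, and the "pods" value is the range the existing IPPool should already cover.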

You should also create one of these for every node in your cluster (including Control Plane nodes) so you can check the status without having to use the Calico CLI:

apiVersion: projectcalico.org/v3
kind: CalicoNodeStatus
metadata:
  name: NODE_NAME_HERE
spec:
  classes:
    - Agent
    - BGP
    - Routes
  node: NODE_NAME_HERE
  updatePeriodSeconds: 10
---
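Writing that manifest out by hand for every node gets tedious, so here is a rough Python sketch that renders one document per node name. The render_node_status helper is my own name for illustration, and the node names are placeholders to replace with your own.

```python
TEMPLATE = """\
apiVersion: projectcalico.org/v3
kind: CalicoNodeStatus
metadata:
  name: {node}
spec:
  classes:
    - Agent
    - BGP
    - Routes
  node: {node}
  updatePeriodSeconds: 10
"""

def render_node_status(nodes):
    """Render one CalicoNodeStatus document per node, joined by YAML separators."""
    return "---\n".join(TEMPLATE.format(node=node) for node in nodes)

# Placeholder node names; replace with the nodes in your cluster.
print(render_node_status(["anchorman", "cameraman", "crewman"]))
```

You can redirect the output to a file and apply it the same way as the other manifests.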

Configure OPNsense

This part will have pictures! First, assuming you installed os-frr correctly, you can go to Routing -> BGP in OPNsense. This is what mine looks like, but I will explain how it will probably be different than what you have:

[Screenshot: BGPConfig]

The networks in Network should be the networks your router can route. For most people, this will probably just be 192.168.1.0/24 or 192.168.0.0/24. If you have a VPN though, you can put the VPN network and it will let you log onto the VPN and be able to access Kubernetes resources! Also note the BGP AS Number - this must match exactly the asNumber from the BGPPeer resource you created.

Now navigate to the Neighbors tab at the top and click the plus button to add a new neighbor.

[Screenshot: Neighbor]

Here, fill out the information for your nodes, creating a new neighbor for each. Double-check the IP, and for the AS number, this time use the asNumber from the BGPConfiguration resource you created earlier. Check Next-Hop-Self (as I understand it, this makes your router advertise itself as the next hop for the routes it sends to that neighbor) and BFD, a feature that helps detect failures and route around them (if a node restarts or is disconnected, for example). Make sure the Update-Source Interface is correct too; it should be the interface that is on the same network as your Kubernetes nodes.

Repeat that for every node you have, control plane and worker.

[Screenshot: Neighbors]

This is about what it should look like.
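The AS numbers are the easiest thing to mix up, because each side names the other side's number. The pairing can be sketched in Python like this; the dictionary keys and the check_bgp_pairing helper are names I invented for illustration, not fields from Calico or OPNsense, and the private range is the 64512 to 65534 block mentioned earlier.

```python
PRIVATE_ASNS = range(64512, 65535)  # 16-bit private range, 64512 through 65534

def check_bgp_pairing(calico, opnsense):
    """Return a list of mismatches between the Calico and OPNsense settings."""
    problems = []
    for side, asn in (("router", opnsense["local_as"]), ("cluster", calico["cluster_as"])):
        if asn not in PRIVATE_ASNS:
            problems.append(f"{side} AS {asn} is outside the private range")
    if opnsense["local_as"] == calico["cluster_as"]:
        problems.append("router and cluster share the same AS number")
    if opnsense["local_as"] != calico["peer_as"]:
        problems.append("BGPPeer asNumber does not match the router's BGP AS Number")
    if opnsense["neighbor_as"] != calico["cluster_as"]:
        problems.append("neighbor AS does not match the BGPConfiguration asNumber")
    return problems

# cluster_as = BGPConfiguration asNumber, peer_as = BGPPeer asNumber
calico = {"cluster_as": 64513, "peer_as": 64512}
# local_as = BGP AS Number on the General tab, neighbor_as = AS set on each neighbor
opnsense = {"local_as": 64512, "neighbor_as": 64513}
print(check_bgp_pairing(calico, opnsense) or "AS numbers line up")
```

With the values from this guide the list of problems is empty, so the router's AS matches the BGPPeer and each neighbor's AS matches the BGPConfiguration.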

If you have already checked enable on the General tab, then click the restart icon in the top right. Otherwise, navigate to General, enable the service and click Save.

Does it work?

You can run this command to see if there are any configured BGP sessions not established:

kubectl get caliconodestatus -o jsonpath="{range .items[*]}{.metadata.name}: {.status.bgp.numberNotEstablishedV4}{'\n'}{end}"

The output looks like this:

anchorman: 0
cameraman: 0
crewman: 0
doorman: 0
gravemind: 0
hivemind: 0
mastermind: 0
stuntman: 0
weatherman: 0

This output means that there are no unestablished BGP sessions, which means it should work! If any of these are greater than zero, kubectl describe the caliconodestatus for that node and troubleshoot from there. One possible issue is that you didn't configure the Neighbor correctly in OPNSense.
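If you have a lot of nodes, you may only care about the ones that are not at zero. As a sketch, this Python snippet takes the text output in the format shown above and keeps only the nodes with unestablished sessions. The unestablished helper is a hypothetical name, and the sample counts are made up.

```python
def unestablished(output):
    """Map node name -> count for nodes reporting sessions that are not established."""
    result = {}
    for line in output.splitlines():
        name, sep, count = line.partition(":")
        count = count.strip()
        if not sep or not count.isdigit():
            continue  # skip blank lines and nodes that have no status yet
        if int(count) > 0:
            result[name.strip()] = int(count)
    return result

# Made-up sample in the same "name: count" format as the command output.
sample = "anchorman: 0\ncameraman: 2\ncrewman: 0\n"
print(unestablished(sample))
```

Any node that shows up in the result is one to kubectl describe first.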

If everything has been configured correctly, you should now be able to hit any ClusterIP for any service on your cluster and it should load! If you have services that are LoadBalancer, you can change them to ClusterIP or NodePort to free up that IP and just use the ClusterIP in your OPNsense firewall rules and so forth. Or leave them as LoadBalancer; that should still work fine if you're using MetalLB, because it only uses its own config and nothing has changed that would prevent it from still announcing LoadBalancer IPs. With that said, if you still want to use MetalLB, check this howto to make sure everything's configured the way it should be.

Do be aware that if you don't specify a ClusterIP in your manifest (from what I've seen, it's not common to), a recreated service will get a new IP, and firewall rules and the like will need to be updated. As long as the service exists, though, it will keep that IP.