k3s on Two VMs: Cilium, APISIX, and Longhorn for a Test Cluster

Published: 2026-06-21

Single-node k3s is a convenient starting point: one server runs the whole cluster, with no networking compromises. But the moment you need to verify workload behavior during node failure, storage replication semantics, or ingress behavior when pods move between nodes, single-node stops being an honest test. Moving to two nodes enables exactly those scenarios at minimal cost.

This post covers the key decisions in that migration, using the same stack that already runs in production: Cilium as CNI and load balancer (no MetalLB), APISIX as ingress, Longhorn as distributed storage.

Why Single-Node Becomes Insufficient

Single-node k3s is fine for:

Development and local configuration validation
CI/CD pipelines where state doesn't matter
Deploying stateless applications

It stops being a valid test as soon as you need to:

Verify Deployment behavior when a node is killed
Test ReadWriteMany storage (a PVC mounted by two pods simultaneously)
Confirm that Cilium L2 Announcements move the ARP responder to another node when the announcing node fails
Test PodDisruptionBudgets under real conditions

Two nodes cover all of these scenarios while keeping the same stack as production.

Cluster Topology

For a test cluster, the optimal setup is 1 server + 1 agent:

┌─────────────────────────────────────────────────────────┐
│  VM1 (server)                     VM2 (agent)           │
│  192.168.1.10                     192.168.1.11          │
│                                                         │
│  ┌─────────────────┐              ┌─────────────────┐   │
│  │ kube-apiserver  │              │                 │   │
│  │ kube-scheduler  │◄────────────►│  kubelet        │   │
│  │ kube-controller │    6443/TCP  │  Cilium agent   │   │
│  │ etcd            │              │  (kube-proxy    │   │
│  │ kubelet         │              │   replacement)  │   │
│  │ Cilium agent    │              │                 │   │
│  └─────────────────┘              └─────────────────┘   │
│                                                         │
│  ← — — — — — — — — L2 network / same subnet — — — — → │
└─────────────────────────────────────────────────────────┘

The server node acts as both control plane and worker. The agent is worker-only. For a test cluster this is fine; in production you'd keep workloads off the control plane with taints.

Network requirements:

Both nodes on the same L2 network (required for Cilium L2 Announcements)
Port 6443/TCP open from agent to server
Port 4240/TCP (Cilium health check) open between nodes
Port 8472/UDP (Cilium VXLAN overlay, the default routing mode) open between nodes
For Longhorn — port 9500/TCP between nodes
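
A quick way to confirm the TCP ports is to probe them from the other node. The sketch below is illustrative and not part of the original setup: the function names port_open and check_ports are hypothetical, and it relies on bash's /dev/tcp pseudo-device and the coreutils timeout command.

```bash
#!/usr/bin/env bash
# port_open HOST PORT: succeeds if a TCP connection opens within 2 seconds.
# Uses bash's built-in /dev/tcp pseudo-device, so no extra tools are needed.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# check_ports HOST PORT...: prints one status line per port.
check_ports() {
  local host=$1
  shift
  local port
  for port in "$@"; do
    if port_open "$host" "$port"; then
      echo "$host:$port open"
    else
      echo "$host:$port CLOSED"
    fi
  done
}

# Example, run on VM2 against the server (TCP ports from the list above):
#   check_ports 192.168.1.10 6443 4240 9500
```

A port reported as CLOSED may simply have no listener yet, so 4240 and 9500 are only meaningful once Cilium and Longhorn are running.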

Installing k3s

Node Preparation

On both nodes before installation:

# Disable swap — mandatory for kubelet
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab

# Load required kernel modules
modprobe overlay
modprobe br_netfilter
cat <<EOF > /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

# sysctl for Kubernetes networking
cat <<EOF > /etc/sysctl.d/99-kubernetes.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
net.ipv4.conf.all.rp_filter         = 0
net.ipv4.conf.default.rp_filter     = 0
fs.inotify.max_user_watches         = 524288
fs.inotify.max_user_instances       = 512
EOF
sysctl --system

rp_filter=0 is required for Cilium L2 Announcements. In strict mode (1), the kernel drops an incoming packet when the route back to its source address would not use the interface the packet arrived on, which silently breaks load balancer traffic.
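
One detail worth checking: for rp_filter the kernel applies the higher of net.ipv4.conf.all and the per-interface value, so an interface that already has 1 stays strict even when all is 0. A small sketch for listing the effective value per interface; the helper names are hypothetical and it reads /proc/sys directly:

```bash
#!/usr/bin/env bash
# effective_rp_filter ALL IFACE: the kernel uses max(all, iface) for rp_filter.
effective_rp_filter() {
  if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# show_rp_filter: prints the effective rp_filter for every interface.
show_rp_filter() {
  local base=/proc/sys/net/ipv4/conf
  local all dir name
  all=$(cat "$base/all/rp_filter")
  for dir in "$base"/*/; do
    name=$(basename "$dir")
    # "all" and "default" are templates, not real interfaces
    case "$name" in all|default) continue ;; esac
    echo "$name: $(effective_rp_filter "$all" "$(cat "$dir/rp_filter")")"
  done
}

# Example: show_rp_filter
```

Any interface that still reports 1 needs its own net.ipv4.conf.<interface>.rp_filter=0 entry.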

Installing the Server Node

k3s starts without Flannel and without kube-proxy — both are replaced by Cilium:

curl -sfL https://get.k3s.io | sh -s - server \
  --flannel-backend=none \
  --disable-kube-proxy \
  --disable=traefik \
  --disable=servicelb \
  --node-ip=192.168.1.10 \
  --advertise-address=192.168.1.10

Flags:

--flannel-backend=none — don't install Flannel CNI; Cilium takes this role
--disable-kube-proxy — Cilium replaces kube-proxy via eBPF
--disable=traefik — remove default ingress, install APISIX instead
--disable=servicelb — remove built-in Klipper LB; Cilium announces LoadBalancer IPs
--node-ip and --advertise-address — explicitly set the IP, otherwise k3s may pick the wrong interface

The same settings can live in config.yaml instead of CLI flags, which is easier to update:

# /etc/rancher/k3s/config.yaml (on server node)
flannel-backend: "none"
disable-kube-proxy: true
disable:
  - traefik
  - servicelb
node-ip: "192.168.1.10"
advertise-address: "192.168.1.10"

After starting, the node will be in NotReady — expected until Cilium is installed:

k3s kubectl get nodes
# NAME   STATUS     ROLES                  AGE   VERSION
# vm1    NotReady   control-plane,master   30s   v1.32.x+k3s1

Do not wait for Ready before installing Cilium — Cilium will transition the node to Ready after it starts.

Get Token and kubeconfig

# Token for joining the agent node
cat /var/lib/rancher/k3s/server/node-token

# kubeconfig for managing from a local machine
# (run on the server, then copy ~/k3s-test.yaml to your workstation)
sed 's/127.0.0.1/192.168.1.10/' /etc/rancher/k3s/k3s.yaml > ~/k3s-test.yaml
export KUBECONFIG=~/k3s-test.yaml

Install Cilium (Before Joining Agent)

Cilium must be installed before joining the agent node — otherwise the agent will hang in NotReady:

helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=192.168.1.10 \
  --set k8sServicePort=6443 \
  --set ipam.mode=kubernetes \
  --set operator.replicas=1 \
  --set socketLB.enabled=true \
  --set socketLB.hostNamespaceOnly=true \
  --set nodePort.enabled=true \
  --set hostPort.enabled=true \
  --set l2announcements.enabled=true \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true

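k8sServiceHost must be the real IP of the server node, not 127.0.0.1. With kube-proxy disabled, nothing serves the kubernetes Service ClusterIP until Cilium itself is running, so the Cilium agents need an address where the API server is directly reachable. 127.0.0.1:6443 only makes sense on the server node; on VM2 that address does not reach the API server.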

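operator.replicas=1: the chart defaults to two operator replicas that will not share a node. With only one node present at install time, the second replica would stay Pending, and a test cluster does not need operator HA.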

After installation, the server node transitions to Ready:

kubectl get nodes
# NAME   STATUS   ROLES                  AGE   VERSION
# vm1    Ready    control-plane,master   3m    v1.32.x+k3s1

cilium status
# KubeProxyReplacement: True
# Cilium: 1/1 agents running (only the server node exists at this point)
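
Since the agent should only join once Cilium is running on the server, the wait can be scripted. A minimal sketch, assuming kubectl is pointed at the new cluster and the DaemonSet carries the Helm chart's default name cilium; wait_for and cilium_ready are hypothetical helper names:

```bash
#!/usr/bin/env bash
# wait_for ATTEMPTS DELAY CMD...: retries CMD until it succeeds or attempts run out.
wait_for() {
  local attempts=$1 delay=$2
  shift 2
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# cilium_ready: succeeds once the Cilium DaemonSet has finished rolling out.
cilium_ready() {
  kubectl -n kube-system rollout status daemonset/cilium --timeout=10s >/dev/null 2>&1
}

# Example:
#   wait_for 30 5 cilium_ready && echo "Cilium is up, safe to join the agent"
```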

Joining the Agent Node

# On VM2. The token file only exists on VM1; print it there with
#   cat /var/lib/rancher/k3s/server/node-token
# and paste the value below.
curl -sfL https://get.k3s.io | K3S_URL=https://192.168.1.10:6443 \
  K3S_TOKEN="<token from VM1>" \
  sh -s - agent \
  --node-ip=192.168.1.11

Or via config.yaml:

# /etc/rancher/k3s/config.yaml (on agent node)
server: "https://192.168.1.10:6443"
token: "K107abc...::server:abc123token..."
node-ip: "192.168.1.11"

After joining, Cilium automatically deploys an agent on VM2:

kubectl get nodes
# NAME   STATUS   ROLES                  AGE    VERSION
# vm1    Ready    control-plane,master   5m     v1.32.x+k3s1
# vm2    Ready    <none>                 30s    v1.32.x+k3s1

kubectl -n kube-system get pods -l k8s-app=cilium -o wide
# NAME            READY   STATUS    RESTARTS   ...   NODE
# cilium-xxxxx    1/1     Running   0          ...   vm1
# cilium-yyyyy    1/1     Running   0          ...   vm2

Load Balancer: Cilium L2 Announcements

Since the CNI is already Cilium, no separate MetalLB is needed. Cilium announces LoadBalancer IPs via ARP on the L2 network.

IP Address Pool

# cilium-lb.yaml
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: default-pool
spec:
  blocks:
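    # A CIDR of 192.168.1.100/28 would resolve to 192.168.1.96/28 (.96 to .111),
    # so the intended range is given explicitly.
    - start: "192.168.1.101"     # 14 addresses: .101 to .114
      stop: "192.168.1.114"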
---
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: default-policy
spec:
  interfaces:
    - ^eth[0-9]+    # all eth interfaces
  loadBalancerIPs: true
  externalIPs: true

kubectl apply -f cilium-lb.yaml

The pool addresses must be in the same subnet as the node interfaces; otherwise LAN clients never ARP for them and send the traffic to their default gateway instead. Choose a range outside your router's DHCP pool.
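
A CIDR block always covers the whole enclosing network, so a prefix such as 192.168.1.100/28 does not start at .100. The sketch below computes the first and last address a CIDR covers, which is a cheap check before putting a block into the pool; the function names are hypothetical and it uses only shell arithmetic:

```bash
#!/usr/bin/env bash
# ip_to_int A.B.C.D: converts a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# int_to_ip N: converts a 32-bit integer back to dotted notation.
int_to_ip() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# cidr_range ADDR/PREFIX: prints the first and last address of the network.
cidr_range() {
  local ip=${1%/*} prefix=${1#*/}
  local addr mask net last
  addr=$(ip_to_int "$ip")
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  net=$(( addr & mask ))
  last=$(( net | (~mask & 0xFFFFFFFF) ))
  echo "$(int_to_ip "$net") $(int_to_ip "$last")"
}

cidr_range 192.168.1.100/28   # prints: 192.168.1.96 192.168.1.111
```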

Pin a Specific IP to a Service

metadata:
  annotations:
    "lbipam.cilium.io/ips": "192.168.1.101"

Verify

# Test service
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=LoadBalancer

kubectl get svc nginx
# NAME    TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)
# nginx   LoadBalancer   10.43.12.5    192.168.1.101   80:30080/TCP

# Cilium holds one lease per announced Service; the holder is the node answering ARP
kubectl -n kube-system get lease | grep cilium-l2announce

# Reachable from any machine on LAN
curl http://192.168.1.101/

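Cilium elects one leader node to respond to ARP for each IP, using a Kubernetes lease per Service. If that node goes down, another node takes over once the lease expires. The timing is set by the l2announcements.leaseDuration, leaseRenewDeadline, and leaseRetryPeriod Helm values, not by the policy.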

Why Not MetalLB

MetalLB works with any CNI and is a solid standalone component. But when the CNI is already Cilium, MetalLB duplicates functionality and adds another Helm chart and another set of CRDs. Cilium L2 Announcements solves the same problem within the already-installed component, with Hubble providing traffic observability on top.

Ingress: APISIX

k3s runs with --disable=traefik. APISIX Ingress Controller replaces it — the same one running in production.

helm repo add apisix https://charts.apiseven.com
helm install apisix apisix/apisix \
  --namespace apisix \
  --create-namespace \
  --set service.type=LoadBalancer \
  --set ingress-controller.enabled=true \
  --set ingress-controller.config.apisix.serviceNamespace=apisix

service.type=LoadBalancer — APISIX immediately requests an external IP from the Cilium pool.

After installation:

kubectl -n apisix get svc apisix-gateway
# NAME              TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)
# apisix-gateway    LoadBalancer   10.43.5.10    192.168.1.102   80:30080/TCP,443:30443/TCP

Example ApisixRoute:

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  name: my-app
  namespace: default
spec:
  http:
    - name: main
      match:
        hosts:
          - myapp.example.com
        paths:
          - "/*"
      backends:
        - serviceName: my-app
          servicePort: 8080

StorageClass: Longhorn

The Problem with local-path in Multi-Node

k3s installs local-path StorageClass by default. On single-node — it works. On two nodes — it creates an invisible trap.

local-path creates PVs on the node where the pod was scheduled. If the pod moves to another node — the PV stays on the first one, and the pod hangs:

Events:
  Warning  FailedScheduling  pod/my-app  0/2 nodes are available:
  1 node(s) had volume node affinity conflict.

This never surfaces on single-node. On two nodes, it appears on any kubectl drain or automatic reschedule.

Keep local-path only for truly ephemeral data that can be recreated from scratch.

Longhorn: Distributed Block Storage

Longhorn stores data on the local disks of both nodes and replicates between them:

┌──────────────────────┐     ┌──────────────────────┐
│  VM1 (server)        │     │  VM2 (agent)          │
│                      │     │                       │
│  /var/lib/longhorn   │◄───►│  /var/lib/longhorn    │
│  Replica A           │     │  Replica B            │
│                      │     │                       │
└──────────────────────┘     └──────────────────────┘

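If VM2 goes down, the data remains on VM1 and the volume continues in a degraded state with one replica. Pods already on VM1 keep running; pods that were on VM2 have to be rescheduled first. When VM2 recovers, Longhorn automatically rebuilds the second replica.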

Prepare Nodes

# On both nodes
apt install open-iscsi util-linux nfs-common
modprobe iscsi_tcp
echo 'iscsi_tcp' >> /etc/modules-load.d/iscsi.conf
systemctl enable --now iscsid

Install

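helm repo add longhorn https://charts.longhorn.io
# persistence.defaultClassReplicaCount sets the replica count of the longhorn
# StorageClass (chart default: 3, which two nodes can never satisfy)
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace \
  --set defaultSettings.defaultReplicaCount=2 \
  --set persistence.defaultClassReplicaCount=2 \
  --set defaultSettings.storageMinimalAvailablePercentage=10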

Verify:

kubectl -n longhorn-system get node.longhorn.io
# NAME   READY   ALLOWSCHEDULING   SCHEDULABLE   AGE
# vm1    True    True              True          5m
# vm2    True    True              True          4m

kubectl get storageclass
# NAME                 PROVISIONER          ...
# longhorn (default)   driver.longhorn.io   ...
# local-path           rancher.io/local-path ...

Usage

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi

Longhorn UI via port-forward or ApisixRoute:

kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80
# http://localhost:8080 — volumes, replicas, node state

If You Need ReadWriteMany

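Longhorn volumes are ReadWriteOnce by default. Longhorn can also provide ReadWriteMany by exporting a volume over NFS from a share-manager pod, which needs nfs-common on every node. The alternative shown here is an NFS Provisioner on top of a plain NFS server: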

# NFS server on VM1
apt install nfs-kernel-server
mkdir -p /srv/nfs/k8s
echo "/srv/nfs/k8s  192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)" >> /etc/exports
exportfs -ra
systemctl enable --now nfs-kernel-server

# NFS client on VM2
apt install nfs-common

# Provisioner in cluster
helm repo add nfs-subdir-external-provisioner \
  https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-provisioner \
  nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --namespace nfs-provisioner \
  --create-namespace \
  --set nfs.server=192.168.1.10 \
  --set nfs.path=/srv/nfs/k8s \
  --set storageClass.name=nfs \
  --set storageClass.defaultClass=false

NFS doesn't replicate data — it's not a replacement for Longhorn for stateful workloads. Use NFS only where RWX is genuinely required: shared uploads, config shared across replicas.
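
A claim against the nfs class differs from the Longhorn one only in access mode and class name. The sketch below prints such a manifest so it can be piped into kubectl; render_rwx_pvc is a hypothetical helper, and the claim name and size in the example are placeholders:

```bash
#!/usr/bin/env bash
# render_rwx_pvc NAME SIZE: prints a ReadWriteMany PVC manifest for the "nfs" class.
render_rwx_pvc() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes: [ReadWriteMany]
  storageClassName: nfs
  resources:
    requests:
      storage: $2
EOF
}

# Example:
#   render_rwx_pvc shared-uploads 5Gi | kubectl apply -f -
```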

Final Stack

Component	Choice
k3s	server + agent, `--flannel-backend=none --disable-kube-proxy`
CNI	Cilium (kube-proxy replacement)
Load Balancer	Cilium L2 Announcements + CiliumLoadBalancerIPPool
Ingress	APISIX Ingress Controller
StorageClass / block	Longhorn (2 replicas)
StorageClass / shared	NFS Provisioner (if RWX is needed)
Network observability	Hubble (built into Cilium)

Deployment order:

# 1. Prepare both nodes (sysctl, swap, kernel modules, open-iscsi)
# 2. Install k3s server (with config.yaml)
# 3. Install Cilium via Helm
# 4. Join k3s agent
# 5. Create CiliumLoadBalancerIPPool and CiliumL2AnnouncementPolicy
# 6. Install Longhorn
# 7. Install APISIX
# 8. Verify: kubectl get nodes, cilium status, kubectl get storageclass

What Can Go Wrong

Node Stays NotReady After Agent Joins

Cilium didn't come up on the agent node. Check:

kubectl -n kube-system logs -l k8s-app=cilium --tail=50
kubectl describe node vm2 | grep -A10 Conditions

Common cause: Cilium on the server node wasn't fully Ready when the agent joined. Wait for Cilium to be fully running on the server before running the k3s-agent install.

Cilium L2: IP Assigned but Unreachable from LAN

# Check that L2 policy is applied
kubectl get ciliuml2announcementpolicies

# Check which node holds the announcement lease for each Service
kubectl -n kube-system get lease | grep cilium-l2announce

# Check rp_filter — must be 0
sysctl net.ipv4.conf.eth0.rp_filter

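If rp_filter=1, packets that fail the reverse-path check are dropped. The kernel applies the higher of the all and per-interface values, so reset the interface as well:

sysctl -w net.ipv4.conf.all.rp_filter=0
sysctl -w net.ipv4.conf.default.rp_filter=0
sysctl -w net.ipv4.conf.eth0.rp_filter=0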

Persist in /etc/sysctl.d/99-kubernetes.conf so it survives reboots.

APISIX Not Getting IP from Cilium

kubectl -n apisix get svc apisix-gateway
# EXTERNAL-IP: <pending>

kubectl -n apisix describe svc apisix-gateway

# Check if pool is exhausted
kubectl get ciliumloadbalancerippool default-pool -o yaml | grep -A5 status

Common cause: pool exhausted or the pool CIDR is not in the same subnet as the nodes.
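
To judge whether exhaustion is plausible, compare the number of addresses in the pool with the number of LoadBalancer Services. A rough sketch with hypothetical helper names; the example range .101 to .114 corresponds to the 14 addresses intended for the pool, and lb_service_count assumes kubectl access to the cluster:

```bash
#!/usr/bin/env bash
# ip_to_int A.B.C.D: converts a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# range_size START STOP: number of addresses in an inclusive range.
range_size() {
  echo $(( $(ip_to_int "$2") - $(ip_to_int "$1") + 1 ))
}

# lb_service_count: number of Services of type LoadBalancer in all namespaces.
# With -A, the third column of kubectl's output is TYPE.
lb_service_count() {
  kubectl get svc -A --no-headers | awk '$3 == "LoadBalancer"' | wc -l
}

range_size 192.168.1.101 192.168.1.114   # prints: 14
# Example: echo "in use: $(lb_service_count)"
```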

Longhorn Replica Not Creating on VM2

kubectl -n longhorn-system get volume
# STATE: degraded

kubectl -n longhorn-system get node.longhorn.io vm2 -o yaml | grep -A5 conditions

systemctl status iscsid
df -h /var/lib/longhorn

Causes: iscsid not running, insufficient disk space, or node marked allowScheduling: false in Longhorn.

Pod Hangs When Moving to Another Node (local-path)

Expected behavior — the PV is bound to the original node via node affinity. Migrate the PVC to storageClassName: longhorn.

# Inspect the node affinity on an existing PV
kubectl get pv <pv-name> -o jsonpath='{.spec.nodeAffinity}'

Summary

k3s server starts with --flannel-backend=none --disable-kube-proxy --disable=traefik,servicelb
Cilium must be installed before joining the agent, otherwise the agent hangs in NotReady
k8sServiceHost in Cilium must be the real node IP, not 127.0.0.1
rp_filter=0 is mandatory for L2 Announcements — without it, LB response packets are silently dropped
Cilium L2 Announcements replaces MetalLB when Cilium is the CNI — no extra Helm chart needed
APISIX gets a LoadBalancer IP from the Cilium pool automatically
local-path in multi-node breaks pods when they move to another node — use Longhorn instead
Longhorn with 2 replicas survives single-node loss without data loss
