vmsingle: one VictoriaMetrics binary instead of the whole Prometheus stack
Published: 2026-06-09
This site runs on a single-node k0s cluster on a small VPS, and the VPS also has to fit the actual websites, a mail server, and half a dozen proxies. kube-prometheus-stack wants more memory than all of those combined. The replacement is vmsingle — VictoriaMetrics single-node — which scrapes, stores, and queries metrics in one binary with a 512 MiB memory limit. No Prometheus, no operator, no CRDs.
Why not kube-prometheus-stack
On a multi-node work cluster, kube-prometheus-stack is the obvious choice: operator, ServiceMonitors, HA pairs. On a 2-CPU VPS it's a different story:
- Prometheus alone idles around 400–700 MiB with a modest target list
- The operator, admission webhooks, and CRDs add pods that do nothing useful on one node
- ServiceMonitor indirection is pointless when you can list every target by hand
vmsingle has a built-in scraper that accepts standard Prometheus scrape_configs verbatim. One pod replaces Prometheus + operator, speaks PromQL (plus MetricsQL extensions), and ships VMUI for ad-hoc queries.
The Helm release
The chart is vm/victoria-metrics-single, deployed from a values file:
```yaml
server:
  fullnameOverride: "vmsingle-server"
  retentionPeriod: "90d"
  persistentVolume:
    enabled: true
    existingClaim: vmsingle-data
  resources:
    limits:
      cpu: 500m
      memory: 512Mi
    requests:
      cpu: 50m
      memory: 128Mi
  extraArgs:
    enableTCP6: "true"
    envflag.enable: "true"
    loggerFormat: json
    vmalert.proxyURL: http://vmalert.monitoring.svc.cluster.local:8080
    memory.allowedPercent: "20"
```
Two flags matter here:
- memory.allowedPercent (set to 20): VictoriaMetrics sizes its internal caches as a percentage of available memory. The default (60%) is calculated from the cgroup limit, and with other things running on the node it's safer to keep the caches small. Query performance on a dataset this size doesn't suffer.
- vmalert.proxyURL: VMUI proxies the alerting tabs to vmalert, so firing rules are visible in the same UI.
90-day retention for a homelab is generous and still fits in a few GiB — VictoriaMetrics compresses to well under 1 byte per sample on slow-moving series.
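As a rough check on the "few GiB" figure, the sample count follows from the retention period and scrape interval used in this setup. The series count N = 10,000 is purely an illustrative assumption (the post does not state one), and 1 byte per sample is taken as an upper bound from the compression claim above:

```latex
\begin{align*}
\text{samples per series} &= \frac{90 \times 86400\ \text{s}}{30\ \text{s}} = 259{,}200 \\
\text{total samples} &= 259{,}200 \times N = 2.592 \times 10^{9} \quad (N = 10{,}000\ \text{assumed}) \\
\text{disk} &\le 2.592 \times 10^{9} \times 1\ \text{byte} \approx 2.4\ \text{GiB}
\end{align*}
```

Index data comes on top of this, so treat the result as an order-of-magnitude estimate rather than a measurement.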
Storage: hostPath PV with Retain
There is no storage class on this cluster, so the PV is declared by hand and pinned to a directory on the node:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: vmsingle-data
spec:
  capacity:
    storage: 20Gi
  accessModes: [ReadWriteOnce]
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /var/lib/victoria-metrics
    type: DirectoryOrCreate
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vmsingle-data
  namespace: monitoring
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 20Gi
  volumeName: vmsingle-data
  storageClassName: ""
```
storageClassName: "" is load-bearing: an empty string disables dynamic provisioning, otherwise the PVC sits in Pending waiting for a default provisioner that doesn't exist. Retain means 90 days of metrics survive a helm uninstall.
The scrape config
scrape.enabled: true turns on the embedded scraper. The config is plain Prometheus syntax:
```yaml
scrape:
  enabled: true
  config:
    global:
      scrape_interval: 30s
      scrape_timeout: 10s
    scrape_configs:
      - job_name: node
        static_configs:
          - labels:
              instance: k0s-node
            targets:
              - node-exporter-prometheus-node-exporter.monitoring.svc.cluster.local:9100
      - job_name: kube-state-metrics
        static_configs:
          - targets:
              - kube-state-metrics.monitoring.svc.cluster.local:8080
      - job_name: traefik
        static_configs:
          - labels:
              instance: traefik
            targets:
              - traefik.traefik.svc.cluster.local:9101
        metric_relabel_configs:
          - source_labels: [service]
            regex: '(.+)-[0-9a-f]{16}@kubernetescrd'
            replacement: '$1'
            target_label: service
```
Everything is static_configs — on a single node with stable Service names, Kubernetes service discovery adds nothing but moving parts. The full target list: node-exporter, kube-state-metrics, vmsingle itself, vmalert, Traefik, two proxy exporters, three blackbox-exporter jobs (HTTP uptime for nine sites, TCP checks for proxy ports, external DNS connectivity), kubelet cAdvisor, and a textfile collector job for WireGuard peers.
The Traefik relabel rule deserves a note: Traefik names services from IngressRoute CRDs as namespace-name-<16 hex chars>@kubernetescrd. The hash changes when the route changes, which breaks any dashboard grouped by service. The relabel strips it at scrape time.
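The blackbox-exporter jobs from the target list are not shown above. A minimal sketch of an HTTP uptime job, using the usual blackbox relabeling pattern, might look like the following. The exporter address, the http_2xx module name, and the example.com target are all assumptions for illustration, not values from this cluster:

```yaml
- job_name: blackbox-http
  metrics_path: /probe
  params:
    module: [http_2xx]            # assumed module name from the exporter's config
  static_configs:
    - targets:
        - https://example.com     # placeholder; one entry per site to probe
  relabel_configs:
    # Pass the listed URL to the exporter as the ?target= parameter
    - source_labels: [__address__]
      target_label: __param_target
    # Keep the probed URL as the instance label
    - source_labels: [__param_target]
      target_label: instance
    # Send the actual scrape to the exporter (hypothetical Service name)
    - target_label: __address__
      replacement: blackbox-exporter.monitoring.svc.cluster.local:9115
```

The entry belongs in the scrape_configs list alongside the other jobs. The relabeling runs before the scrape, so the static target never has to be reachable from vmsingle directly.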
Scraping cAdvisor through the kubelet
Container-level CPU and memory metrics come from the kubelet's embedded cAdvisor, which requires TLS and a token:
```yaml
- job_name: cadvisor
  scheme: https
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  metrics_path: /metrics/cadvisor
  honor_labels: true
  static_configs:
    - targets:
        - 91.184.248.13:10250
  metric_relabel_configs:
    - source_labels: [container]
      regex: ".+"
      action: keep
    - source_labels: [container]
      regex: "POD"
      action: drop
```
The ServiceAccount token only works if the chart's ServiceAccount is allowed to hit the kubelet API. That's a small ClusterRole:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vmsingle-kubelet-metrics
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/metrics"]
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics", "/metrics/cadvisor", "/metrics/resource"]
    verbs: ["get"]
```
The role is bound to the vmsingle-server ServiceAccount with a ClusterRoleBinding. The two metric_relabel_configs rules drop the per-pod pause-container series (container="POD") and the cgroup-aggregate series with an empty container label. Without them every pod produces two or three phantom duplicates and dashboards double-count memory.
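The binding itself is not shown in the post. A minimal sketch, assuming the ServiceAccount is named vmsingle-server and lives in the monitoring namespace, could look like this (the binding's own name is a hypothetical choice that mirrors the ClusterRole):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vmsingle-kubelet-metrics   # hypothetical name, mirrors the ClusterRole
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vmsingle-kubelet-metrics
subjects:
  - kind: ServiceAccount
    name: vmsingle-server          # must match the ServiceAccount the chart generated
    namespace: monitoring
```

A ClusterRoleBinding rather than a namespaced RoleBinding is needed here because nodes and nonResourceURLs are cluster-scoped.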
A stable Service name
One non-obvious addition — a plain ClusterIP Service in front of vmsingle:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: vmsingle-stable
  namespace: monitoring
spec:
  selector:
    app.kubernetes.io/instance: vmsingle
    app.kubernetes.io/name: victoria-metrics-single
  ports:
    - port: 8428
      targetPort: 8428
  type: ClusterIP
```
The chart creates a headless Service, and headless DNS resolves directly to the pod IP. During CoreDNS restarts (which happen on a single node every time the node reboots) clients caching a stale pod IP get connection refused. Grafana and vmalert point at vmsingle-stable instead — a ClusterIP survives pod churn.
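For illustration, pointing Grafana at the stable Service could be done with a provisioned datasource along these lines. This is a sketch assuming Grafana's file-based datasource provisioning; the datasource name is hypothetical and the post does not show the actual Grafana setup:

```yaml
apiVersion: 1
datasources:
  - name: VictoriaMetrics        # hypothetical display name
    type: prometheus             # vmsingle speaks the Prometheus query API
    access: proxy
    url: http://vmsingle-stable.monitoring.svc.cluster.local:8428
    isDefault: true
```

vmalert would use the same URL in its -datasource.url flag, so both clients resolve the ClusterIP instead of a pod IP.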
What can go wrong
OOMKilled under query load. The cgroup limit is 512 MiB and heavy queries over long ranges can spike usage. memory.allowedPercent: 20 keeps the caches small; if it still OOMs, raise the limit before touching the flag — caches below ~100 MiB make everything slow.
PVC stuck in Pending. Almost always the storageClassName: "" / volumeName pair missing on a hand-made PV+PVC. Check kubectl -n monitoring describe pvc vmsingle-data — "waiting for first consumer" is fine, "no persistent volumes available" means the binding is wrong.
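cAdvisor scrape returns 401 or 403. The ClusterRoleBinding must point at the ServiceAccount name the chart actually generated, and fullnameOverride changes that name. Verify with kubectl -n monitoring get sa and re-check the binding subject. A 403 means the token was accepted but the ServiceAccount is not authorized, which points at the role or binding rather than the token.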
Dashboards double-count container memory. The container="POD" and empty-container series got through. Confirm the metric_relabel_configs are present in the running config: after a port-forward, run curl -s localhost:8428/config | grep -A20 cadvisor. The context window has to be wide enough to reach the relabel rules at the end of the job.
Summary
- One vmsingle pod replaces Prometheus, the operator, and all its CRDs: 128 MiB requested, 512 MiB cap
- The embedded scraper takes standard Prometheus scrape_configs; static targets beat service discovery on a single node
- cAdvisor needs RBAC for the kubelet API plus relabel rules to drop pause-container duplicates
- hostPath PV with Retain and explicit volumeName binding, no storage class involved
- A separate ClusterIP Service (vmsingle-stable) avoids stale headless DNS after CoreDNS restarts
- 90-day retention costs single-digit GiB at this scale