vmsingle: one VictoriaMetrics binary instead of the whole Prometheus stack

Published: 2026-06-09

This site runs on a single-node k0s cluster on a small VPS, and the VPS also has to fit the actual websites, a mail server, and half a dozen proxies. kube-prometheus-stack wants more memory than all of those combined. The replacement is vmsingle — VictoriaMetrics single-node — which scrapes, stores, and queries metrics in one binary with a 512 MiB memory limit. No Prometheus, no operator, no CRDs.

Why not kube-prometheus-stack

On a multi-node work cluster, kube-prometheus-stack is the obvious choice: operator, ServiceMonitors, HA pairs. On a 2-CPU VPS it's a different story:

Prometheus alone idles around 400–700 MiB with a modest target list
The operator, admission webhooks, and CRDs add pods that do nothing useful on one node
ServiceMonitor indirection is pointless when you can list every target by hand

vmsingle has a built-in scraper that accepts standard Prometheus scrape_configs verbatim. One pod replaces Prometheus + operator, speaks PromQL (plus MetricsQL extensions), and ships VMUI for ad-hoc queries.

The Helm release

The chart is vm/victoria-metrics-single, deployed from a values file:

server:
  fullnameOverride: "vmsingle-server"
  retentionPeriod: "90d"
  persistentVolume:
    enabled: true
    existingClaim: vmsingle-data
  resources:
    limits:
      cpu: 500m
      memory: 512Mi
    requests:
      cpu: 50m
      memory: 128Mi
  extraArgs:
    enableTCP6: "true"
    envflag.enable: "true"
    loggerFormat: json
    vmalert.proxyURL: http://vmalert.monitoring.svc.cluster.local:8080
    memory.allowedPercent: "20"

Two flags matter here:

memory.allowedPercent: 20 caps the internal caches. VictoriaMetrics sizes them as a percentage of available memory, which in a container is the cgroup limit. The default of 60% would give the caches about 307 MiB of the 512 MiB limit; 20% holds them to roughly 102 MiB, and with other things running on the node it's safer to keep them small. Query performance on a dataset this size doesn't suffer.
vmalert.proxyURL makes VMUI proxy the alerting tabs to vmalert, so firing rules are visible in the same UI.

90-day retention for a homelab is generous and still fits in a few GiB — VictoriaMetrics compresses to well under 1 byte per sample on slow-moving series.
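
A rough check on that claim, as a sketch under stated assumptions: about 20,000 active series (a made-up figure, since the real count isn't stated here), the 30s scrape interval used in the scrape config, and an upper bound of 1 byte per sample.

```latex
\begin{align*}
\text{samples per series} &= \frac{86400\ \text{s/day}}{30\ \text{s}} \times 90\ \text{days} = 259{,}200 \\
\text{total samples} &= 20{,}000 \times 259{,}200 \approx 5.2 \times 10^{9} \\
\text{disk usage} &\lesssim 5.2 \times 10^{9}\ \text{B} \approx 4.8\ \text{GiB}
\end{align*}
```

With compression below 1 byte per sample the real figure would land under that bound, and it scales linearly with the series count.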

Storage: hostPath PV with Retain

There is no storage class on this cluster, so the PV is declared by hand and pinned to a directory on the node:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: vmsingle-data
spec:
  capacity:
    storage: 20Gi
  accessModes: [ReadWriteOnce]
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /var/lib/victoria-metrics
    type: DirectoryOrCreate
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vmsingle-data
  namespace: monitoring
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 20Gi
  volumeName: vmsingle-data
  storageClassName: ""

storageClassName: "" is load-bearing: an empty string disables dynamic provisioning. Without it the PVC sits in Pending waiting for a default provisioner that doesn't exist. Retain means 90 days of metrics survive a helm uninstall.

The scrape config

scrape.enabled: true, nested under the same server: key as the values above, turns on the embedded scraper. The config is plain Prometheus syntax:

scrape:
  enabled: true
  config:
    global:
      scrape_interval: 30s
      scrape_timeout: 10s
    scrape_configs:
    - job_name: node
      static_configs:
      - labels:
          instance: k0s-node
        targets:
        - node-exporter-prometheus-node-exporter.monitoring.svc.cluster.local:9100

    - job_name: kube-state-metrics
      static_configs:
      - targets:
        - kube-state-metrics.monitoring.svc.cluster.local:8080

    - job_name: traefik
      static_configs:
      - labels:
          instance: traefik
        targets:
        - traefik.traefik.svc.cluster.local:9101
      metric_relabel_configs:
      - source_labels: [service]
        regex: '(.+)-[0-9a-f]{16}@kubernetescrd'
        replacement: '$1'
        target_label: service

Everything is static_configs — on a single node with stable Service names, Kubernetes service discovery adds nothing but moving parts. The full target list: node-exporter, kube-state-metrics, vmsingle itself, vmalert, Traefik, two proxy exporters, three blackbox-exporter jobs (HTTP uptime for nine sites, TCP checks for proxy ports, external DNS connectivity), kubelet cAdvisor, and a textfile collector job for WireGuard peers.

The Traefik relabel rule deserves a note: Traefik names services from IngressRoute CRDs as namespace-name-<16 hex chars>@kubernetescrd. The hash changes when the route changes, which breaks any dashboard grouped by service. The relabel strips it at scrape time.
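
For the jobs the excerpt leaves out, here is a hedged sketch of what two of them might look like: a self-scrape of vmsingle and one blackbox-exporter HTTP probe using the usual Prometheus relabeling pattern. The blackbox-exporter Service name, the probed URL, and the job names are placeholders, not this site's real values.

```yaml
# Sketch only: two more entries for scrape_configs.
- job_name: vmsingle
  static_configs:
  - targets:
    # assumes the chart's Service carries the fullnameOverride name
    - vmsingle-server.monitoring.svc.cluster.local:8428

- job_name: blackbox-http
  metrics_path: /probe
  params:
    module: [http_2xx]        # module from blackbox-exporter's default config
  static_configs:
  - targets:
    - https://example.com     # placeholder for one of the probed sites
  relabel_configs:
  # pass the listed URL to the exporter as ?target=...
  - source_labels: [__address__]
    target_label: __param_target
  # keep the probed URL as the instance label
  - source_labels: [__param_target]
    target_label: instance
  # send the actual scrape to the exporter (hypothetical Service name)
  - target_label: __address__
    replacement: blackbox-exporter.monitoring.svc.cluster.local:9115
```

The relabel chain is what lets a static target list drive the exporter: each listed URL becomes a probe parameter, while the scrape itself always goes to the exporter's own address.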

Scraping cAdvisor through the kubelet

Container-level CPU and memory metrics come from the kubelet's embedded cAdvisor, which requires TLS and a token:

- job_name: cadvisor
  scheme: https
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  metrics_path: /metrics/cadvisor
  honor_labels: true
  static_configs:
  - targets:
    - 91.184.248.13:10250
  metric_relabel_configs:
  - source_labels: [container]
    regex: ".+"
    action: keep
  - source_labels: [container]
    regex: "POD"
    action: drop

The ServiceAccount token only works if the chart's ServiceAccount is allowed to hit the kubelet API. That's a small ClusterRole:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vmsingle-kubelet-metrics
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/metrics"]
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics", "/metrics/cadvisor", "/metrics/resource"]
    verbs: ["get"]

bound to the vmsingle-server ServiceAccount. The two metric_relabel_configs drop the per-pod pause-container series (container="POD") and the cgroup-aggregate series with an empty container label — without them every pod produces two or three phantom duplicates and dashboards double-count memory.
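
The binding itself is described but not shown. A minimal sketch of it, assuming the binding reuses the ClusterRole's name (the binding name is an assumption; the ServiceAccount name and namespace are the ones used in this post):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vmsingle-kubelet-metrics   # assumed name, mirrors the ClusterRole
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vmsingle-kubelet-metrics
subjects:
  - kind: ServiceAccount
    name: vmsingle-server          # must match the ServiceAccount the chart created
    namespace: monitoring
```

It has to be a ClusterRoleBinding rather than a namespaced RoleBinding, because nodes and non-resource URLs are cluster-scoped.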

A stable Service name

One non-obvious addition — a plain ClusterIP Service in front of vmsingle:

apiVersion: v1
kind: Service
metadata:
  name: vmsingle-stable
  namespace: monitoring
spec:
  selector:
    app.kubernetes.io/instance: vmsingle
    app.kubernetes.io/name: victoria-metrics-single
  ports:
    - port: 8428
      targetPort: 8428
  type: ClusterIP

The chart creates a headless Service, and headless DNS resolves directly to the pod IP. During CoreDNS restarts (which happen on a single node every time the node reboots) clients caching a stale pod IP get connection refused. Grafana and vmalert point at vmsingle-stable instead — a ClusterIP survives pod churn.
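
On the client side, pointing Grafana at the stable name could look roughly like this datasource provisioning file. This is a sketch: the datasource name is arbitrary, and whether Grafana is provisioned from a file at all is an assumption.

```yaml
# Grafana datasource provisioning (sketch); the datasource name is arbitrary.
apiVersion: 1
datasources:
  - name: VictoriaMetrics
    type: prometheus             # vmsingle speaks the Prometheus query API
    access: proxy
    url: http://vmsingle-stable.monitoring.svc.cluster.local:8428
    isDefault: true
```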

What can go wrong

OOMKilled under query load. The cgroup limit is 512 MiB and heavy queries over long ranges can spike usage. memory.allowedPercent: 20 keeps the caches small; if it still OOMs, raise the limit before touching the flag — caches below ~100 MiB make everything slow.

PVC stuck in Pending. Almost always the storageClassName: "" / volumeName pair is missing from a hand-made PV+PVC. Check kubectl -n monitoring describe pvc vmsingle-data: "waiting for first consumer" is fine, "no persistent volumes available" means the binding is wrong.

cAdvisor scrape returns 401 or 403. The ClusterRoleBinding has to point at the ServiceAccount name the chart actually generated, and fullnameOverride changes that name. A 403 in particular means the token was accepted but the account is not authorized. Verify with kubectl -n monitoring get sa and re-check the binding subject.

Dashboards double-count container memory. The container="POD" and empty-container series got through. Confirm the metric_relabel_configs are present in the running config: curl -s localhost:8428/config | grep -A5 cadvisor after a port-forward.

Summary

One vmsingle pod replaces Prometheus, the operator, and all its CRDs — 128 MiB requested, 512 MiB cap
The embedded scraper takes standard Prometheus scrape_configs; static targets beat service discovery on a single node
cAdvisor needs RBAC for the kubelet API plus relabel rules to drop pause-container duplicates
hostPath PV with Retain and explicit volumeName binding — no storage class involved
A separate ClusterIP Service (vmsingle-stable) avoids stale headless DNS after CoreDNS restarts
90-day retention costs single-digit GiB at this scale
