vmsingle: one VictoriaMetrics binary instead of the whole Prometheus stack

Published: 2026-06-09

This site runs on a single-node k0s cluster on a small VPS, and the VPS also has to fit the actual websites, a mail server, and half a dozen proxies. kube-prometheus-stack wants more memory than all of those combined. The replacement is vmsingle — VictoriaMetrics single-node — which scrapes, stores, and queries metrics in one binary with a 512 MiB memory limit. No Prometheus, no operator, no CRDs.


Why not kube-prometheus-stack

On a multi-node work cluster, kube-prometheus-stack is the obvious choice: operator, ServiceMonitors, HA pairs. On a 2-CPU VPS it's a different story:

  • Prometheus alone idles around 400–700 MiB with a modest target list
  • The operator, admission webhooks, and CRDs add pods that do nothing useful on one node
  • ServiceMonitor indirection is pointless when you can list every target by hand

vmsingle has a built-in scraper that accepts standard Prometheus scrape_configs verbatim. One pod replaces Prometheus + operator, speaks PromQL (plus MetricsQL extensions), and ships VMUI for ad-hoc queries.

The Helm release

The chart is vm/victoria-metrics-single, deployed from a values file:

server:
  fullnameOverride: "vmsingle-server"
  retentionPeriod: "90d"
  persistentVolume:
    enabled: true
    existingClaim: vmsingle-data
  resources:
    limits:
      cpu: 500m
      memory: 512Mi
    requests:
      cpu: 50m
      memory: 128Mi
  extraArgs:
    enableTCP6: "true"
    envflag.enable: "true"
    loggerFormat: json
    vmalert.proxyURL: http://vmalert.monitoring.svc.cluster.local:8080
    memory.allowedPercent: "20"

Two flags matter here:

  • memory.allowedPercent: 20 - VictoriaMetrics sizes its internal caches as a percentage of available memory, which it reads from the cgroup limit. The default of 60% would allow roughly 300 MiB of caches under the 512 MiB limit; with other workloads on the node it's safer to keep them small, and 20% caps them at about 100 MiB. Query performance on a dataset this size doesn't suffer.
  • vmalert.proxyURL — VMUI proxies the alerting tabs to vmalert, so firing rules are visible in the same UI.

90-day retention for a homelab is generous and still fits in a few GiB — VictoriaMetrics compresses to well under 1 byte per sample on slow-moving series.
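As a rough sanity check on the "few GiB" figure, the arithmetic can be sketched as below. The 30-second interval is the one set in the scrape config; the count of 20,000 active series and the upper bound of 1 byte per sample are illustrative assumptions, not measurements from this cluster.

```latex
\[
\frac{86\,400\ \text{s/day}}{30\ \text{s}} = 2\,880\ \text{samples per series per day}
\]
\[
2\,880 \times 90\ \text{days} \times 20\,000\ \text{series}
  = 5.184 \times 10^{9}\ \text{samples}
\]
\[
5.184 \times 10^{9}\ \text{samples} \times 1\ \text{byte/sample}
  \approx 5.2\ \text{GB} \approx 4.8\ \text{GiB (upper bound)}
\]
```

With compression below 1 byte per sample, the real footprint under these assumptions would be smaller still.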

Storage: hostPath PV with Retain

There is no storage class on this cluster, so the PV is declared by hand and pinned to a directory on the node:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: vmsingle-data
spec:
  capacity:
    storage: 20Gi
  accessModes: [ReadWriteOnce]
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /var/lib/victoria-metrics
    type: DirectoryOrCreate
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vmsingle-data
  namespace: monitoring
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 20Gi
  volumeName: vmsingle-data
  storageClassName: ""

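storageClassName: "" is load-bearing: an empty string disables dynamic provisioning, so the claim can only bind to a pre-created PV. If the field is omitted and the cluster has (or later gains) a default StorageClass, the PVC takes that class, no longer matches the hand-made PV, and sits in Pending. Retain means 90 days of metrics survive a helm uninstall.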

The scrape config

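scrape.enabled: true turns on the embedded scraper; in the chart's values this key lives under server:, and the block is shown here without that enclosing key. The config is plain Prometheus syntax: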

scrape:
  enabled: true
  config:
    global:
      scrape_interval: 30s
      scrape_timeout: 10s
    scrape_configs:
    - job_name: node
      static_configs:
      - labels:
          instance: k0s-node
        targets:
        - node-exporter-prometheus-node-exporter.monitoring.svc.cluster.local:9100

    - job_name: kube-state-metrics
      static_configs:
      - targets:
        - kube-state-metrics.monitoring.svc.cluster.local:8080

    - job_name: traefik
      static_configs:
      - labels:
          instance: traefik
        targets:
        - traefik.traefik.svc.cluster.local:9101
      metric_relabel_configs:
      - source_labels: [service]
        regex: '(.+)-[0-9a-f]{16}@kubernetescrd'
        replacement: '$1'
        target_label: service

Everything is static_configs — on a single node with stable Service names, Kubernetes service discovery adds nothing but moving parts. The full target list: node-exporter, kube-state-metrics, vmsingle itself, vmalert, Traefik, two proxy exporters, three blackbox-exporter jobs (HTTP uptime for nine sites, TCP checks for proxy ports, external DNS connectivity), kubelet cAdvisor, and a textfile collector job for WireGuard peers.
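The blackbox-exporter jobs from that list follow the usual probe pattern, which is slightly less obvious than a plain static target. A minimal sketch of one HTTP uptime job, assuming an exporter Service at blackbox-exporter.monitoring.svc.cluster.local:9115, a module named http_2xx, and a placeholder URL (none of these three values come from the actual config):

```yaml
# Hypothetical job: exporter address, module name, and probed URL are
# placeholders for illustration only.
- job_name: blackbox-http
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
  - targets:
    - https://example.com
  relabel_configs:
  # Pass the listed URL to the exporter as the ?target= query parameter.
  - source_labels: [__address__]
    target_label: __param_target
  # Keep the probed URL as the instance label so dashboards show the site.
  - source_labels: [__param_target]
    target_label: instance
  # Point the actual scrape at the exporter instead of the probed site.
  - target_label: __address__
    replacement: blackbox-exporter.monitoring.svc.cluster.local:9115
```

The targets here are still static: adding a site means adding one line under targets, with no service discovery involved.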

The Traefik relabel rule deserves a note: Traefik names services from IngressRoute CRDs as namespace-name-<16 hex chars>@kubernetescrd. The hash changes when the route changes, which breaks any dashboard grouped by service. The relabel strips it at scrape time.

Scraping cAdvisor through the kubelet

Container-level CPU and memory metrics come from the kubelet's embedded cAdvisor, which requires TLS and a token:

- job_name: cadvisor
  scheme: https
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  metrics_path: /metrics/cadvisor
  honor_labels: true
  static_configs:
  - targets:
    - 91.184.248.13:10250
  metric_relabel_configs:
  - source_labels: [container]
    regex: ".+"
    action: keep
  - source_labels: [container]
    regex: "POD"
    action: drop

The ServiceAccount token only works if the chart's ServiceAccount is allowed to hit the kubelet API. That's a small ClusterRole:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vmsingle-kubelet-metrics
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/metrics"]
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics", "/metrics/cadvisor", "/metrics/resource"]
    verbs: ["get"]

The ClusterRole is bound to the vmsingle-server ServiceAccount. The two metric_relabel_configs drop the per-pod pause-container series (container="POD") and the cgroup-aggregate series with an empty container label. Without them every pod produces two or three phantom duplicates and dashboards double-count memory.
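For completeness, the binding itself might look like the sketch below. The ServiceAccount name and namespace follow the values used in this post; the name of the ClusterRoleBinding is an assumption chosen to match the ClusterRole.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  # Hypothetical name; any unique name works.
  name: vmsingle-kubelet-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vmsingle-kubelet-metrics
subjects:
  # Must match the ServiceAccount the chart generated (see fullnameOverride).
  - kind: ServiceAccount
    name: vmsingle-server
    namespace: monitoring
```

A ClusterRoleBinding rather than a RoleBinding is needed here because nodes and nonResourceURLs are cluster-scoped.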

A stable Service name

One non-obvious addition — a plain ClusterIP Service in front of vmsingle:

apiVersion: v1
kind: Service
metadata:
  name: vmsingle-stable
  namespace: monitoring
spec:
  selector:
    app.kubernetes.io/instance: vmsingle
    app.kubernetes.io/name: victoria-metrics-single
  ports:
    - port: 8428
      targetPort: 8428
  type: ClusterIP

The chart creates a headless Service, and headless DNS resolves directly to the pod IP. During CoreDNS restarts (which happen on a single node every time the node reboots) clients caching a stale pod IP get connection refused. Grafana and vmalert point at vmsingle-stable instead — a ClusterIP survives pod churn.
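On the client side, pointing Grafana at the stable Service could look like the following datasource provisioning file. This is a sketch, assuming Grafana's file-based provisioning is in use; the datasource name and the isDefault setting are illustrative choices, not taken from the actual setup.

```yaml
apiVersion: 1
datasources:
  # Display name is hypothetical; the prometheus type works because
  # vmsingle speaks the Prometheus query API.
  - name: VictoriaMetrics
    type: prometheus
    access: proxy
    url: http://vmsingle-stable.monitoring.svc.cluster.local:8428
    isDefault: true
```

vmalert would use the same base URL for its datasource setting.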

What can go wrong

OOMKilled under query load. The cgroup limit is 512 MiB and heavy queries over long ranges can spike usage. memory.allowedPercent: 20 keeps the caches small; if it still OOMs, raise the limit before touching the flag — caches below ~100 MiB make everything slow.

PVC stuck in Pending. Almost always the storageClassName: "" / volumeName pair missing on a hand-made PV+PVC. Check kubectl -n monitoring describe pvc vmsingle-data — "waiting for first consumer" is fine, "no persistent volumes available" means the binding is wrong.

cAdvisor scrape returns 401 or 403. A 403 usually means the ClusterRoleBinding subject does not match the ServiceAccount name the chart actually generated (fullnameOverride changes it); a 401 means the kubelet rejected the token itself. Verify with kubectl -n monitoring get sa and re-check the binding subject.

Dashboards double-count container memory. The container="POD" and empty-container series got through. Confirm the metric_relabel_configs are present in the running config: curl -s localhost:8428/config | grep -A5 cadvisor after a port-forward.

Summary

  • One vmsingle pod replaces Prometheus, the operator, and all its CRDs — 128 MiB requested, 512 MiB cap
  • The embedded scraper takes standard Prometheus scrape_configs; static targets beat service discovery on a single node
  • cAdvisor needs RBAC for the kubelet API plus relabel rules to drop pause-container duplicates
  • hostPath PV with Retain and explicit volumeName binding — no storage class involved
  • A separate ClusterIP Service (vmsingle-stable) avoids stale headless DNS after CoreDNS restarts
  • 90-day retention costs single-digit GiB at this scale