Cozystack 1.4: New Dashboard UI, Persistent Tenant Workers, Backup Strategies, and Fractional GPU Sharing

Cozystack v1.4.0 release banner

Cozystack v1.4.0 is now available. The release was published on May 19, 2026, and rolls up every fix shipped in the v1.3.1 to v1.3.3 patch line.

This cycle is focused on the operational experience of running Cozystack as a production platform: a faster dashboard architecture, more durable tenant Kubernetes workers, clearer resource sizing, backup workflows for managed applications, better GPU utilization, safer ingress publishing, and fewer race conditions during first installs and upgrades.

Main highlights

New schema-driven dashboard UI

Cozystack 1.4 ships a rewritten dashboard from the cozystack/cozystack-ui project. The old openapi-ui plus BFF stack has been replaced by a React 19 and TypeScript frontend that talks directly to the Kubernetes API.

The new architecture removes an extra process and proxy layer while keeping the dashboard schema-driven. It also improves several day-to-day workflows:

  • VNC access for virtual machines now uses dynamic WebSocket URLs instead of deployment-specific localhost assumptions.
  • The dashboard can read ApplicationDefinition resources for the application catalog and marketplace.
  • Operators can inject runtime branding through a ConfigMap, including logos, names, and brand colors, without rebuilding the image.
  • Existing /openapi-ui/* bookmarks are redirected to the new console.
  • The package has been renamed consistently to cozy-dashboard.
Cozystack 1.4 dashboard — IaaS marketplace with Bucket, Kubernetes, VirtualPrivateCloud, VMDisk and VMInstance services

The new dashboard exposes the IaaS marketplace driven by ApplicationDefinition resources.

Cozystack 1.4 dashboard — PaaS marketplace with managed PostgreSQL, MariaDB, ClickHouse, Kafka, Redis, OpenSearch, NATS, Harbor, Keycloak and more

The PaaS catalog covers managed databases, messaging, object storage, secrets, search, and inference services.

Cozystack 1.4 dashboard — Deploy new Kubernetes form with cluster addons (cert-manager, Cilium, Gateway API, ingress-nginx, GPU Operator)

Deploying a managed Kubernetes cluster uses the same schema-driven form, with cluster addons exposed declaratively.

Cozystack 1.4 dashboard — Deploy new httpcache form with size, storageClass, endpoints, HAProxy replicas, and resource parameters

A managed HTTP cache deployment, with PVC sizing, storage class, endpoints, and resource parameters all generated from the application schema.

Documentation:

Persistent worker-node storage for tenant Kubernetes

Tenant Kubernetes worker VMs now use PVC-backed persistent disks through KubeVirt dataVolumeTemplates. Previously, workers used ephemeral emptyDisk storage, which meant kubelet certificates, kubeconfig, and containerd state were lost after a VM reboot. A rebooted worker could lose its identity and require manual recovery.

In v1.4, worker state survives VM restarts. The NodeGroup field ephemeralStorage is renamed to diskSize, and a new per-nodeGroup storageClass option lets operators control where worker disks are provisioned. Migration 39 rewrites legacy values automatically during upgrade.

Existing tenant clusters will roll worker nodes once because the KubeVirt machine template changes. Operators should plan capacity for the rollout and choose the storage class intentionally. For many worker-node disk scenarios, the local StorageClass is recommended because worker disks now survive restarts and do not need DRBD replication semantics.

Documentation: Tenant Kubernetes configuration.

Instance-type resource presets

Resource presets now follow a cloud-style <series>.<size> taxonomy. The new model covers five CPU-to-memory series:

  • t1 for tiny and low-memory workloads.
  • c1 for compute-balanced workloads.
  • s1 for standard services such as proxies and caches.
  • u1 for universal workloads such as databases and messaging.
  • m1 for memory-heavy workloads such as search and analytics.

Each series includes eight sizes from nano to 4xlarge, giving operators and tenants 40 presets in total.

The previous flat names such as small, medium, and large are still accepted as deprecated aliases. Existing deployments keep the same CPU and memory values, while Migration 39 rewrites stored values to the new names. The Cozystack API now emits deprecation warnings when app CRs still use legacy preset names.

Documentation: Resource presets.

Declarative backup strategies for managed applications

The backup strategy controller now supports PostgreSQL, MariaDB, ClickHouse, and FoundationDB. Tenants can define a strategy together with BackupClass, Plan, BackupJob, and RestoreJob resources, while the controller composes the backend-specific objects for each managed service.

The new strategies support scheduled backups, ad-hoc snapshots, in-place restores, and restore-to-copy workflows against S3-compatible object storage. Credentials are referenced through Kubernetes Secrets instead of being stored inline, and controller RBAC is constrained so it can only access explicitly referenced secrets.

This extends the existing backup flows for VMInstance and VMDisk and moves Cozystack closer to full backup coverage across the managed application catalog.

Documentation:

HAMi-based fractional GPU sharing

Cozystack 1.4 adds hami as an optional system package. HAMi v2.8.1, a CNCF Sandbox project, provides fractional GPU sharing for tenant Kubernetes clusters.

With HAMi enabled, tenant workloads can request resources such as nvidia.com/gpu, nvidia.com/gpumem, and nvidia.com/gpucores, allowing multiple pods to share one physical NVIDIA GPU with explicit memory and compute slicing. The integration includes the device plugin, scheduler extender, mutating webhook, and RuntimeClass. It is exposed through an opt-in hami.enabled toggle and depends on the NVIDIA GPU Operator.

There is one important compatibility note: HAMi compute isolation depends on container images with glibc older than 2.34. Memory enforcement works broadly, but Alpine and musl-based images are not supported for HAMi-core compute isolation.

Documentation: GPU sharing with HAMi.

One switch for PROXY protocol and hairpin NAT

The new publishing.proxyProtocol: true option enables PROXY protocol on the host ingress-nginx and deploys Ouroboros to solve the related hairpin-NAT problem.

When PROXY protocol is enabled, in-cluster traffic to the cluster’s own public hostnames can otherwise reach ingress-nginx without the required PROXY header. Ouroboros fixes that path through CoreDNS rewrite snippets. Cozystack exposes it as both a host-level system package and a per-tenant addon through addons.ouroboros.enabled.

The default behavior is unchanged. Clusters that do not enable PROXY protocol do not receive new resources.

Documentation: PROXY protocol and hairpin NAT.

Better HelmRelease behavior and tenant bootstrap reliability

The Cozystack operator now exposes HelmRelease generation knobs as operator flags and chart values, including interval, retry interval, install timeout, upgrade timeout, and max history.

The retry strategy now uses RetryOnFailure, which avoids uninstall-and-reinstall loops when a first install is slow. Applications can also set a per-Application install and upgrade timeout through the release.cozystack.io/helm-install-timeout annotation. Tenant Kubernetes uses this to give Kamaji enough time during cold bootstrap, fixing the recurring wait hr/tenant-kubernetes timeout failure mode.

Documentation:

Kubelet reservations for worker nodes

Tenant Kubernetes worker nodes now receive automatically computed kubelet reservations for CPU and memory. This keeps kubelet itself from being targeted under memory pressure and makes scheduler and autoscaler decisions more accurate.

Cluster-autoscaler annotations now report allocatable CPU and memory instead of raw totals, so autoscaling decisions match what Kubernetes can actually schedule.

Documentation: Tenant Kubernetes operations.

Also in v1.4.0

  • PostgreSQL parameters are now typed and protected by a denylist for dangerous values such as archive_command, restore_command, ssl_passphrase_command, dynamic_library_path, and *_preload_libraries.
  • Keycloak gains extraEnv and user profile customization support.
  • The etcd application exposes S3 backup schedules through the updated etcd-operator.
  • Per-package upgradeCRDs policy is now configurable.
  • cozyreport now collects Flux, cert-manager, host context, application resources, and a top-level summary.txt.
  • SeaweedFS tenant ingress limits single PUT requests to 5 GB.
  • GPU observability dashboards and recording rules were added for Grafana and VictoriaMetrics.
  • VMInstance port filtering is fixed under the new cozy-proxy v0.3.0 mode.
  • LINSTOR CSI is updated with fixes for dual-attach and transient demotion errors.

Documentation:

Platform components

Cozystack 1.4 updates the platform base and several core packages:

  • Talos: v1.12.7 to v1.13.0
  • cert-manager: v1.19.3 to v1.20.2
  • Cilium: v1.19.1 to v1.19.3
  • NVIDIA GPU Operator: v25.3.0 to v26.3.1
  • etcd-operator: v0.4.2 to v0.4.3
  • KubeVirt: v1.6.3 to v1.8.2
  • cozy-proxy: v0.2.0 to v0.3.0
  • linstor-csi: v1.10.6
  • HAMi: v2.8.1
  • Ouroboros: v0.7.2

Documentation:

Upgrade notes

Most operators can upgrade to v1.4.0 without manual configuration changes. Cozystack keeps the same API surface for existing workloads, and in-platform migrations handle the main value rewrites.

There are several operational details to plan for:

  • Tenant Kubernetes workers will roll once. The ephemeralStorage to diskSize migration is automatic, but existing worker VMs are replaced one by one because the KubeVirt machine template changes.
  • KubeVirt VMs that were already running before the platform upgrade need a cold restart after upgrading. The KubeVirt jump from v1.6.3 to v1.8.2 crosses an upstream QEMU change, and live migration of pre-upgrade VMs can fail. New VMs created after the upgrade are unaffected.
  • Legacy resource preset names still work as deprecated aliases, but new deployments should use the <series>.<size> names.
  • PostgreSQL deployments that use denylisted parameters will fail to render until those parameters are removed.
  • cert-manager v1.20 changes the default container UID/GID to 65532. Operators with custom PodSecurityPolicy, imagePullSecrets, or filesystem-mounted certificates pinned to the previous UID should review their configuration.

Documentation:

Documentation worth knowing about

Thank you to all contributors

This release was shaped by the work of @androndo, @Arsolitt, @dislogical, @dvc, @IvanHunters, @kvaps, @lexfrei, @matthieu-robin, @mattia-eleuteri, @myasnikovdaniil, @sircthulhu, and @tym83.

A special welcome to first-time contributors @dvc and @dislogical. Thank you all.

Join the community