
Proxmox VE Clustering & High Availability

Proxmox VE clustering enables you to join multiple physical servers into a single logical unit with centralized management, automatic failover, and zero-downtime maintenance capabilities. The integrated HA Manager ensures critical services remain available even when hardware fails.

4-Node Proxmox Cluster with Shared Storage

[Diagram: four cluster nodes (pve-node1 at 192.168.1.10 through pve-node4 at 192.168.1.13), each a quorum member running VMs, containers, and HA services, linked by the Corosync cluster network plus an optional migration network. All nodes attach to shared storage (Ceph RBD, NFS, iSCSI SAN, or ZFS replication) so VM disk images are accessible from every node, enabling live migration and failover without copying data.]

Cluster Features

Centralized Management

Single web interface to manage all nodes and resources in the cluster.

  • Unified dashboard
  • Cross-node operations
  • Synchronized configuration
  • Cluster-wide monitoring

High Availability (HA)

Automatic restart of VMs and containers on healthy nodes when their host fails.

  • Watchdog-based fencing
  • Automatic recovery
  • Priority-based restart
  • Service monitoring

Live Migration

Move running VMs between hosts with zero downtime for maintenance.

  • Online migration (VMs)
  • Offline migration (VMs/CTs)
  • Shared or local storage
  • Automatic or manual

Quorum-Based

Voting system prevents split-brain scenarios in network partitions.

  • Majority voting
  • External QDevice support
  • Auto-fencing
  • Safe failover

Setting Up a Cluster

Prerequisites

  • All nodes run the same Proxmox VE major version
  • Unique hostname and static IP address on every node
  • Date and time synchronized across all nodes (NTP)
  • Low-latency network between nodes for Corosync traffic (a dedicated NIC is recommended)
  • Nodes joining an existing cluster must not hold any guests yet (avoids VMID conflicts)

Creating a Cluster

On the first node, create the cluster
pvecm create my-cluster
Check cluster status
pvecm status
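
If a dedicated network is reserved for Corosync traffic (recommended, since cluster communication is latency-sensitive), the link address can be given at creation time. The sketch below assumes a separate cluster network on 192.168.10.0/24; substitute your own addresses.

Create the cluster with a dedicated Corosync link
pvecm create my-cluster --link0 192.168.10.10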

Adding Nodes

On additional nodes, join the cluster
pvecm add 192.168.1.10
View cluster nodes
pvecm nodes
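
When the cluster was created with a dedicated Corosync link, the joining node should declare its own address on that network. The example assumes the joining node's cluster-network address is 192.168.10.11.

Join the cluster using a specific Corosync link address
pvecm add 192.168.1.10 --link0 192.168.10.11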

Removing Nodes

Remove a node from the cluster
pvecm delnode node-name
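
A node should be emptied of guests and powered off before it is deleted, and it must not rejoin with its old cluster configuration afterwards. A typical removal sequence, assuming pve-node4 is being retired and VM 105 still runs on it, looks like this:

Move remaining guests off the node before shutting it down
qm migrate 105 pve-node1
After powering the node off, remove it from one of the remaining members
pvecm delnode pve-node4
Verify the remaining membership
pvecm nodes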

High Availability Manager

The HA Manager monitors services and automatically restarts them on other nodes if their host fails. It uses a priority system to determine which node should run which services.

Configuring HA Services

Add VM to HA (via Web GUI or CLI)
ha-manager add vm:100 --state started --max_restart 3 --max_relocate 3
Remove VM from HA
ha-manager remove vm:100
Check HA status
ha-manager status
View HA configuration
ha-manager config
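
HA resources can also be stopped, migrated, or relocated through ha-manager instead of qm, so the HA stack stays aware of the intended state. A few examples, assuming VM 100 is already an HA resource:

Request a clean stop of an HA-managed VM
ha-manager set vm:100 --state stopped
Live-migrate an HA resource to another node
ha-manager migrate vm:100 pve-node2
Relocate (stop, move, restart) an HA resource
ha-manager relocate vm:100 pve-node2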

HA Groups

HA groups allow you to restrict which nodes can run specific HA services, useful for licensing constraints or hardware requirements.

Create HA group
ha-manager groupadd production-nodes -nodes "pve-node1,pve-node2" -nofailback 0
Assign VM to HA group
ha-manager add vm:100 --group production-nodes
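
Node entries accept an optional priority (higher is preferred), and a group can be marked restricted so its services never run outside the listed nodes. The sketch below uses a hypothetical db-nodes group in which pve-node1 is preferred over pve-node2:

Create a restricted group with node priorities
ha-manager groupadd db-nodes -nodes "pve-node1:2,pve-node2:1" -restricted 1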

Live Migration

Migration Type | Downtime | Requirements
Online (live), shared storage | Near zero (brief pause during final memory sync) | Shared storage, network connectivity between nodes
Online (live), local storage | Very brief | High-bandwidth network for the storage sync
Offline | Duration of the VM shutdown and restart | Target node has sufficient capacity

Migration Commands

Migrate VM online (live migration)
qm migrate 100 pve-node2 --online
Migrate VM offline
qm migrate 100 pve-node2
Migrate container
pct migrate 101 pve-node2
Migrate with bandwidth limit (in KiB/s)
qm migrate 100 pve-node2 --online --bwlimit 100
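
Live migration of a VM whose disks sit on node-local storage needs the disks copied as part of the move, which qm supports with --with-local-disks. Containers cannot be migrated while running, but restart mode keeps the outage short. Both examples assume the guest IDs used above:

Live-migrate a VM together with its local disks
qm migrate 100 pve-node2 --online --with-local-disks
Migrate a container with an automatic stop/start (restart mode)
pct migrate 101 pve-node2 --restart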

Quorum & Fencing

Understanding Quorum

Quorum ensures that only one partition of a split cluster can make changes, preventing data corruption. A partition has quorum only when it holds a strict majority of the votes, i.e. more than half of all nodes.

  • 3-node cluster: Needs 2 nodes for quorum (can lose 1 node)
  • 4-node cluster: Needs 3 nodes for quorum (can lose 1 node)
  • 5-node cluster: Needs 3 nodes for quorum (can lose 2 nodes)
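
Besides pvecm status, Corosync's own quorum tool reports expected votes, total votes, and whether the local partition is quorate:

Query the quorum service directly
corosync-quorumtool -s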

Two-Node Clusters

Two-node clusters are a special case: if either node fails, the survivor holds only one of two votes and loses quorum. Use a QDevice (external vote provider) or adjust expected votes:

  • QDevice: External system providing a third vote (recommended; setup shown below)
  • Expected votes: Temporary adjustment for maintenance (use with caution)
Temporarily set expected votes (emergency only)
pvecm expected 1
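
Setting up a QDevice requires the corosync-qnetd package on the external host and corosync-qdevice on the cluster nodes; the address below (192.168.1.100) is a placeholder for that external host.

Install QDevice support on every cluster node
apt install corosync-qdevice
Register the external vote provider from one cluster node
pvecm qdevice setup 192.168.1.100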

Fencing

Fencing ensures that a failed node is truly offline before starting its HA services elsewhere. Proxmox uses watchdog-based fencing by default.
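
Each node arms its watchdog through the watchdog-mux service; if the node loses quorum and cannot renew its HA lock, the watchdog expires and resets the machine before services are recovered elsewhere. The service state can be inspected directly:

Check the watchdog multiplexer used for self-fencing
systemctl status watchdog-mux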

Storage for Clusters

Shared Storage Options

  • Ceph: Hyper-converged storage integrated with Proxmox, best for clusters
  • NFS: Simple to set up, good performance, but a single point of failure without an HA NFS server (see the example below)
  • iSCSI: Block-level storage, often backed by commercial SANs
  • GlusterFS: Distributed filesystem, redundant
  • ZFS Replication: Block-level replication between nodes (not truly shared)
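
As an example, a shared NFS export can be attached cluster-wide with pvesm; the server address and export path below are placeholders.

Add an NFS share as shared storage for all nodes
pvesm add nfs shared-nfs --server 192.168.1.200 --export /srv/proxmox --content images,rootdir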

Best Practices

  • Keep an odd number of votes: use an odd node count or add a QDevice to even-sized clusters
  • Give Corosync its own low-latency network, ideally with a redundant second link
  • Separate migration and storage traffic from the cluster network where possible
  • Keep all nodes on the same Proxmox VE version and update them one at a time
  • Use shared storage for HA-managed guests and test failover before relying on it

Troubleshooting

Check Corosync status
systemctl status corosync
View cluster ring status
corosync-cfgtool -s
Check cluster quorum
pvecm status
View HA manager logs
journalctl -u pve-ha-lrm -f
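
If the local resource manager logs are inconclusive, the cluster-wide HA manager and the cluster configuration filesystem are the next places to look:

View HA cluster resource manager (CRM) logs
journalctl -u pve-ha-crm -f
Check the cluster configuration filesystem (pmxcfs)
systemctl status pve-cluster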

Proxmox VE clustering transforms multiple servers into a resilient, enterprise-grade infrastructure platform with automatic failover, centralized management, and zero-downtime maintenance capabilities.