PostgresSQL Clustering the hard way

From wiki.baghirzade.pro
Revision as of 12:08, 26 September 2025 by Sadmin (talk | contribs)
Jump to navigation Jump to search

Today is the day that you start running PostgresSQL in a cluster! In this tutorial we’ll be setting up a production ready Postgres cluster that’s highly available and fault tolerant using PostgreSQL, etcd, Patroni, HA Proxy, and keepalived. This resilient combination will ensure that you can always reach your database even when a node in the cluster goes down!


Nodes

# ha proxy
192.168.60.100 # haproxy-01
192.168.60.101 # haproxy-02
192.168.60.102 # haproxy-03

# postgres
192.168.60.103 # postgres-01
192.168.60.104 # postgres-02
192.168.60.105 # postgres-03

PostgreSQL

On postgres nodes install latest postgres.

Install updated postgres repositories.

sudo apt update
sudo apt install -y postgresql-common
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh

We’ll configure postgres later. Stop and disable the service because Patroni will handle the lifecycle of postgres

sudo systemctl stop postgresql
sudo systemctl disable postgresql

etcd

Install

This should be installed on the postgres servers.

Make sure you have wget and curl

sudo apt update
sudo apt-get install -y wget curl

Find latest release. https://github.com/etcd-io/etcd/releases

wget https://github.com/etcd-io/etcd/releases/download/v3.5.17/etcd-v3.5.17-linux-amd64.tar.gz

Uncompress and rename.

tar xvf etcd-v3.5.17-linux-amd64.tar.gz
mv etcd-v3.5.17-linux-amd64 etcd

Move all binaries into /usr/local/bin/ for later use.

sudo mv etcd/etcd* /usr/local/bin/

Check etcd version

etcd --version etcdctl version

Let’s create a user for etcd service.

sudo useradd --system --home /var/lib/etcd --shell /bin/false etcd

Before configuring etcd, we need to repeat all of these steps for the other 2 nodes.

Let’s configure etcd.

Make dir and edit file.

sudo mkdir -p /etc/etcd
sudo mkdir -p /etc/etcd/ssl

Linux

sudo apt install openssl

Verify it’s installed and working

openssl version

Should see something like:

OpenSSL 3.4.0 22 Oct 2024 (Library: OpenSSL 3.4.0 22 Oct 2024)

Now let’s generate and configure certs

mkdir certs cd certs

Generate cert authority

openssl genrsa -out ca.key 2048 openssl req -x509 -new -nodes -key ca.key -subj "/CN=etcd-ca" -days 7300 -out ca.crt

Generate certificate each node. Note, pay attention to SANS, I am using IP, update with your IP and oh DNS/hostname. node1

# Generate a private key
openssl genrsa -out etcd-node1.key 2048

# Create temp file for config
cat > temp.cnf <<EOF
[ req ]
distinguished_name = req_distinguished_name
req_extensions = v3_req
[ req_distinguished_name ]
[ v3_req ]
subjectAltName = @alt_names
[ alt_names ]
IP.1 = 192.168.60.103
IP.2 = 127.0.0.1
EOF

# Create a csr
openssl req -new -key etcd-node1.key -out etcd-node1.csr \
  -subj "/C=US/ST=YourState/L=YourCity/O=YourOrganization/OU=YourUnit/CN=etcd-node1" \
  -config temp.cnf

# Sign the cert
openssl x509 -req -in etcd-node1.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out etcd-node1.crt -days 7300 -sha256 -extensions v3_req -extfile temp.cnf

# Verify the cert and be sure you see Subject Name Alternative

openssl x509 -in etcd-node1.crt -text -noout | grep -A1 "Subject Alternative Name"

# Remove temp file

rm temp.cnf

node 2

  1. Generate a private key

openssl genrsa -out etcd-node2.key 2048

  1. Create temp file for config

cat > temp.cnf <<EOF [ req ] distinguished_name = req_distinguished_name req_extensions = v3_req [ req_distinguished_name ] [ v3_req ] subjectAltName = @alt_names [ alt_names ] IP.1 = 192.168.60.104 IP.2 = 127.0.0.1 EOF

  1. Create a csr

openssl req -new -key etcd-node2.key -out etcd-node2.csr \

 -subj "/C=US/ST=YourState/L=YourCity/O=YourOrganization/OU=YourUnit/CN=etcd-node2" \
 -config temp.cnf
  1. Sign the cert

openssl x509 -req -in etcd-node2.csr -CA ca.crt -CAkey ca.key -CAcreateserial \

 -out etcd-node2.crt -days 7300 -sha256 -extensions v3_req -extfile temp.cnf
  1. Verify the cert and be sure you see Subject Name Alternative

openssl x509 -in etcd-node2.crt -text -noout | grep -A1 "Subject Alternative Name"

  1. Remove temp file

rm temp.cnf

node3

  1. Generate a private key

openssl genrsa -out etcd-node3.key 2048

  1. Create temp file for config

cat > temp.cnf <<EOF [ req ] distinguished_name = req_distinguished_name req_extensions = v3_req [ req_distinguished_name ] [ v3_req ] subjectAltName = @alt_names [ alt_names ] IP.1 = 192.168.60.105 IP.2 = 127.0.0.1 EOF

  1. Create a csr

openssl req -new -key etcd-node3.key -out etcd-node3.csr \

 -subj "/C=US/ST=YourState/L=YourCity/O=YourOrganization/OU=YourUnit/CN=etcd-node3" \
 -config temp.cnf
  1. Sign the cert

openssl x509 -req -in etcd-node3.csr -CA ca.crt -CAkey ca.key -CAcreateserial \

 -out etcd-node3.crt -days 7300 -sha256 -extensions v3_req -extfile temp.cnf
  1. Verify the cert and be sure you see Subject Name Alternative

openssl x509 -in etcd-node3.crt -text -noout | grep -A1 "Subject Alternative Name"

  1. Remove temp file

rm temp.cnf

List all files

ls

Should see:

ca.crt ca.key ca.srl etcd-node1.crt etcd-node1.csr etcd-node2.crt etcd-node2.csr etcd-node2.key etcd-node3.crt etcd-node3.csr etcd-node3.key

Secure copy (scp) the certs to each node:

scp ca.crt etcd-node1.crt etcd-node1.key [email protected]:/tmp/ scp ca.crt etcd-node2.crt etcd-node2.key [email protected]:/tmp/ scp ca.crt etcd-node3.crt etcd-node3.key [email protected]:/tmp/

Should see:

ssh into each node

We need to move certs from /tmp to ssl location (/etc/etcd/ssl)

sudo mkdir -p /etc/etcd/ssl
sudo mv /tmp/etcd-node*.crt /etc/etcd/ssl/
sudo mv /tmp/etcd-node*.key /etc/etcd/ssl/
sudo mv /tmp/ca.crt /etc/etcd/ssl/
sudo chown -R etcd:etcd /etc/etcd/
sudo chmod 600 /etc/etcd/ssl/etcd-node*.key
sudo chmod 644 /etc/etcd/ssl/etcd-node*.crt /etc/etcd/ssl/ca.crt

Configure Create our config

sudo nano /etc/etcd/etcd.env

node1

ETCD_NAME="postgresql-01" ETCD_DATA_DIR="/var/lib/etcd" ETCD_INITIAL_CLUSTER="postgresql-01=https://192.168.60.103:2380,postgresql-02=https://192.168.60.104:2380,postgresql-03=https://192.168.60.105:2380" ETCD_INITIAL_CLUSTER_STATE="new" ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster" ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.60.103:2380" ETCD_LISTEN_PEER_URLS="https://0.0.0.0:2380" ETCD_LISTEN_CLIENT_URLS="https://0.0.0.0:2379" ETCD_ADVERTISE_CLIENT_URLS="https://192.168.60.103:2379" ETCD_CLIENT_CERT_AUTH="true" ETCD_TRUSTED_CA_FILE="/etc/etcd/ssl/ca.crt" ETCD_CERT_FILE="/etc/etcd/ssl/etcd-node1.crt" ETCD_KEY_FILE="/etc/etcd/ssl/etcd-node1.key" ETCD_PEER_CLIENT_CERT_AUTH="true" ETCD_PEER_TRUSTED_CA_FILE="/etc/etcd/ssl/ca.crt" ETCD_PEER_CERT_FILE="/etc/etcd/ssl/etcd-node1.crt" ETCD_PEER_KEY_FILE="/etc/etcd/ssl/etcd-node1.key"

node2

ETCD_NAME="postgresql-02" ETCD_DATA_DIR="/var/lib/etcd" ETCD_INITIAL_CLUSTER="postgresql-01=https://192.168.60.103:2380,postgresql-02=https://192.168.60.104:2380,postgresql-03=https://192.168.60.105:2380" ETCD_INITIAL_CLUSTER_STATE="new" ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster" ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.60.104:2380" ETCD_LISTEN_PEER_URLS="https://0.0.0.0:2380" ETCD_LISTEN_CLIENT_URLS="https://0.0.0.0:2379" ETCD_ADVERTISE_CLIENT_URLS="https://192.168.60.104:2379" ETCD_CLIENT_CERT_AUTH="true" ETCD_TRUSTED_CA_FILE="/etc/etcd/ssl/ca.crt" ETCD_CERT_FILE="/etc/etcd/ssl/etcd-node2.crt" ETCD_KEY_FILE="/etc/etcd/ssl/etcd-node2.key" ETCD_PEER_CLIENT_CERT_AUTH="true" ETCD_PEER_TRUSTED_CA_FILE="/etc/etcd/ssl/ca.crt" ETCD_PEER_CERT_FILE="/etc/etcd/ssl/etcd-node2.crt" ETCD_PEER_KEY_FILE="/etc/etcd/ssl/etcd-node2.key"

node3

ETCD_NAME="postgresql-03" ETCD_DATA_DIR="/var/lib/etcd" ETCD_INITIAL_CLUSTER="postgresql-01=https://192.168.60.103:2380,postgresql-02=https://192.168.60.104:2380,postgresql-03=https://192.168.60.105:2380" ETCD_INITIAL_CLUSTER_STATE="new" ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster" ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.60.105:2380" ETCD_LISTEN_PEER_URLS="https://0.0.0.0:2380" ETCD_LISTEN_CLIENT_URLS="https://0.0.0.0:2379" ETCD_ADVERTISE_CLIENT_URLS="https://192.168.60.105:2379" ETCD_CLIENT_CERT_AUTH="true" ETCD_TRUSTED_CA_FILE="/etc/etcd/ssl/ca.crt" ETCD_CERT_FILE="/etc/etcd/ssl/etcd-node3.crt" ETCD_KEY_FILE="/etc/etcd/ssl/etcd-node3.key" ETCD_PEER_CLIENT_CERT_AUTH="true" ETCD_PEER_TRUSTED_CA_FILE="/etc/etcd/ssl/ca.crt" ETCD_PEER_CERT_FILE="/etc/etcd/ssl/etcd-node3.crt" ETCD_PEER_KEY_FILE="/etc/etcd/ssl/etcd-node3.key"

Now let’s create a service for etcd on all 3 nodes

sudo nano /etc/systemd/system/etcd.service

Contents of service file, same for all 3 nodes

[Unit] Description=etcd key-value store Documentation=https://github.com/etcd-io/etcd After=network-online.target Wants=network-online.target

[Service] Type=notify WorkingDirectory=/var/lib/etcd EnvironmentFile=/etc/etcd/etcd.env ExecStart=/usr/local/bin/etcd Restart=always RestartSec=10s LimitNOFILE=40000 User=etcd Group=etcd

[Install] WantedBy=multi-user.target

We need to create a directory for etcd ETCD_DATA_DIR defined in service file.

sudo mkdir -p /var/lib/etcd sudo chown -R etcd:etcd /var/lib/etcd

Running etcd Reload daemon and enable the service

sudo systemctl daemon-reload
sudo systemctl enable etcd

Start etcd and check status this looks like it’s hanging on the first node but once you add another node it will complete

sudo systemctl start etcd
sudo systemctl status etcd

Should see something like:

● etcd.service - etcd key-value store
     Loaded: loaded (/etc/systemd/system/etcd.service; enabled; preset: enabled)
     Active: active (running) since Mon 2024-12-02 14:09:30 CST; 2s ago
       Docs: https://github.com/etcd-io/etcd
   Main PID: 7266 (etcd)
      Tasks: 9 (limit: 4612)
     Memory: 29.3M (peak: 30.0M)
        CPU: 246ms
     CGroup: /system.slice/etcd.service
             └─7266 /usr/local/bin/etcd

You can also check the logs for etcd by running:

journalctl -xeu etcd.service

Should see something like:

░░ Subject: A start job for unit etcd.service has finished successfully ░░ Defined-By: systemd ░░ Support: http://www.ubuntu.com/support ░░ ░░ A start job for unit etcd.service has finished successfully. ░░ ░░ The job identifier is 5757. Dec 02 14:09:30 postgres-01 etcd[7266]: {"level":"info","ts":"2024-12-02T14:09:30.862122-0600","caller":"v3rpc/health.go:61","msg":"grpc service status changed","service":"","status":"SERVING"} Dec 02 14:09:30 postgres-01 etcd[7266]: {"level":"info","ts":"2024-12-02T14:09:30.862347-0600","caller":"embed/serve.go:187","msg":"serving client traffic insecurely; this is strongly discouraged!","traffic":"grpc+http","address":"[::]:2379"}

Verification

Once cluster is running, we should verify it’s working on each by running

etcdctl endpoint health
etcdctl member list

Should see something like on node1:

127.0.0.1:2379 is healthy: successfully committed proposal: took = 1.786976ms
eb8ee14ab5150b4, started, postgresql-01, http://192.168.60.103:2380, http://192.168.60.103:2379, false
34e89b244664f02d, started, postgresql-02, http://192.168.60.104:2380, http://192.168.60.104:2379, false
8ee2a9473a41c400, started, postgresql-03, http://192.168.60.105:2380, http://192.168.60.105:2379, false

Do this for all 3 nodes

Now we should now edit the variables since the cluster is bootstrapped

Then restart etcd and verify the cluster is still working

sudo systemctl restart etcd

If you want to run the following commands without sudo, you can run:

sudo usermod -aG etcd $USER

sudo etcdctl \

--endpoints=https://127.0.0.1:2379 \ --cacert=/etc/etcd/ssl/ca.crt \ --cert=/etc/etcd/ssl/etcd-node1.crt \ --key=/etc/etcd/ssl/etcd-node1.key \ endpoint health

sudo etcdctl \ --endpoints=https://127.0.0.1:2379 \ --cacert=/etc/etcd/ssl/ca.crt \ --cert=/etc/etcd/ssl/etcd-node1.crt \ --key=/etc/etcd/ssl/etcd-node1.key \ member list

Should see something like

https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 4.611121ms
59afb19d7cb2565d, started, postgresql-01, https://192.168.60.103:2380, https://192.168.60.103:2379, false
6338565ebcb76aa2, started, postgresql-02, https://192.168.60.104:2380, https://192.168.60.104:2379, false
9d74b3125c745c74, started, postgresql-03, https://192.168.60.105:2380, https://192.168.60.105:2379, false

You can check to see who is leader by running

sudo etcdctl \
  --endpoints=https://192.168.60.103:2379,https://192.168.60.104:2379,https://192.168.60.105:2379 \
  --cacert=/etc/etcd/ssl/ca.crt \
  --cert=/etc/etcd/ssl/etcd-node1.crt \
  --key=/etc/etcd/ssl/etcd-node1.key \
  endpoint status --write-out=table

You should see something like this, specifically one of the nodes IS LEADER = true

+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS | +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | https://192.168.60.103:2379 | 59afb19d7cb2565d | 3.5.17 | 20 kB | true | false | 2 | 12 | 12 | | | https://192.168.60.104:2379 | 6338565ebcb76aa2 | 3.5.17 | 20 kB | false | false | 2 | 12 | 12 | | | https://192.168.60.105:2379 | 9d74b3125c745c74 | 3.5.17 | 20 kB | false | false | 2 | 12 | 12 | | +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+