How I've Set Up HA PostgreSQL on Kubernetes


Why Do I Need PostgreSQL on Kubernetes

I’ve decided to migrate GolangCI from Heroku to Kubernetes. Before that, GolangCI used Heroku PostgreSQL as the primary database. The popular way to migrate is to use a managed cloud PostgreSQL service like AWS Aurora or Google Cloud SQL. For example, Algolia migrated from Heroku to GKE and used Google Cloud SQL. But I needed a fully controlled solution, because GolangCI has an on-premise version that must work on a local Kubernetes cluster without access to AWS or Google Cloud.

With Helm I Will Do It in 5 Minutes

I was new to Kubernetes, so at first I tried to install PostgreSQL with helm install stable/postgresql. This Helm chart uses a Bitnami image. Bitnami has an article about using this image to build a production-ready PostgreSQL on Kubernetes.

Soon I realized that this installation doesn’t have a production-ready config, and we need to install it like this:

helm install --name pg --namespace postgres -f ./values-production.yaml stable/postgresql
kubectl -n postgres scale statefulset my-postgresql-slave --replicas=3
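
Roughly speaking, values-production.yaml differs from the default values by turning on replication and metrics. Below is a minimal sketch of the kind of overrides it contains; the key names are assumptions based on the stable/postgresql chart of that time, so verify them against your chart version:

replication:
  enabled: true     # assumed key: run a master plus read-only slaves instead of a single instance
  slaveReplicas: 2  # assumed key: then scaled to 3 with the kubectl command above
metrics:
  enabled: true     # assumed key: add a Prometheus exporter sidecar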

Where is the Automatic Master Failover

HA PostgreSQL should automatically survive and recover if the node with the database master (primary) crashes. With the pure Kubernetes solution from the previous step, it works in the following way:

  1. a PostgreSQL master pod or node crashes
  2. Kubernetes runs a health check every 10 seconds by default. We can tune periodSeconds and run a health check every second, which is the minimum interval (see the probe sketch after this list). Then Kubernetes detects the crash within one second.
  3. Kubernetes schedules a new pod for the PostgreSQL master server. If a node crashed, it may need to start a new node first. If just a pod failed, Kubernetes can simply restart the pod.
  4. the PostgreSQL master starts
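
For step 2, the tuning lives in the pod's liveness probe. A minimal sketch (the check command is illustrative; the chart defines its own probe):

livenessProbe:
  exec:
    command: ["pg_isready", "-U", "postgres"]  # illustrative PostgreSQL availability check
  periodSeconds: 1     # run the check every second, the minimum allowed interval
  failureThreshold: 1  # consider the pod failed after the first unsuccessful check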

This pure Kubernetes HA solution has the advantage of a simple setup, but it has the following drawbacks:

  1. there is downtime between steps 1 and 4, and it can reach minutes if Kubernetes needs to start a new node
  2. no stable and trusted cross-zone availability
    • The PostgreSQL master runs in a StatefulSet with persistent volumes.
    • By default, persistent volumes exist only in one zone. If a full zone outage occurs, we can’t automatically fail over the master: the Kubernetes pod with the master can’t be scheduled in another availability zone.
    • AWS EBS doesn’t support cross-zone replication of disks. Google Cloud does support regional persistent disks, but it’s a beta feature (see the StorageClass sketch after this list).
    • I don’t trust cross-zone high availability based on closed-source disk replication. It’s better if HA is based on open-source consensus algorithms. In this paper the authors don’t call such a solution «modern HA» because it’s based on opaque disk replication and not on consensus.
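
For reference, the regional disk setup mentioned above looked roughly like this on GKE (a sketch; the StorageClass name is arbitrary and the feature was beta at the time):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: regional-ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  replication-type: regional-pd  # replicate the disk across two zones of the region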

There is an interesting comparison of pure Kubernetes HA and Stolon. Stolon is an HA wrapper for PostgreSQL that can work on Kubernetes. We will look at it later.

Choosing a Cross-DC Consensus-Based HA PostgreSQL

I decided that I need a cross-datacenter (cross-zone) HA PostgreSQL solution based on consensus. There are two primary choices: Patroni and Stolon.

Stolon has the following architecture: [Stolon architecture diagram]

Both projects are similar in supported features. A small comparison:

  1. Patroni is supported by the Postgres Operator
  2. Both projects support synchronous replication: Patroni and Stolon.
  3. Both of them support quorum-based synchronous replication (though not natively in PostgreSQL): Patroni and Stolon

Like lwolf, I chose Stolon because its architecture is clearer to me.

Running Stolon PostgreSQL on Kubernetes

Setup

The steps I performed to set up Stolon PostgreSQL in k8s with Helm:

  1. Make a copy of the chart’s values.yaml (I used the one from this commit).
  2. Create secrets for the PostgreSQL superuser and the replication user in the postgres namespace:

    Superuser:

    kubectl create secret generic pg-su \
      --namespace postgres \
      --from-literal=username='su_username' --from-literal=password='su_password'
    

    and set in values.yaml:

    superuserSecret:
      name: pg-su
      usernameKey: username
      passwordKey: password
    

    Replication:

    kubectl create secret generic pg-repl \
      --namespace postgres \
      --from-literal=username='repl_username' --from-literal=password='repl_password'
    

    and set in values.yaml:

    replicationSecret:
      name: pg-repl
      usernameKey: username
      passwordKey: password
    
  3. Remove the parts that are no longer needed from values.yaml:

    superuserUsername: "stolon"
    ## password for the superuser (REQUIRED if superuserSecret is not set)
    superuserPassword:
    
    replicationUsername: "repluser"
    ## password for the replication user (REQUIRED if replicationSecret is not set)
    replicationPassword:
    
  4. Configure replication. Replace

    clusterSpec: {}
      # sleepInterval: 1s
      # maxStandbys: 5
    

    with:

    clusterSpec:
      synchronousReplication: true
      minSynchronousStandbys: 1 # quorum-like replication
      maxSynchronousStandbys: 1 # quorum-like replication
      initMode: new
    
  5. Set the pod disruption budget:

    podDisruptionBudget:
      minAvailable: 2
    
  6. Configure pgParameters and set the replica counts to 3 (3 instances in total: 1 master, 1 sync standby, and 1 async standby):

    pgParameters:
      max_connections: "100"
    
    keeper:
      replicaCount: 3
    # ...
    proxy:
      replicaCount: 3
    # ...
    sentinel:
      replicaCount: 3
    
  7. Set up Prometheus service discovery annotations:

    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "8080"
    
  8. Install the Helm chart:

    $ helm install --name pg --namespace postgres -f local-values.yml ~/charts/stable/stolon/
    NAME:   pg
    LAST DEPLOYED: Thu Feb 28 13:16:20 2019
    NAMESPACE: postgres
    STATUS: DEPLOYED
    
    RESOURCES:
    ==> v1/ServiceAccount
    NAME       SECRETS  AGE
    pg-stolon  1        5s
    
    ==> v1beta1/Role
    NAME       AGE
    pg-stolon  5s
    
    ==> v1beta1/RoleBinding
    NAME       AGE
    pg-stolon  5s
    
    ==> v1/Service
    NAME                       CLUSTER-IP     EXTERNAL-IP  PORT(S)   AGE
    pg-stolon-keeper-headless  None           <none>       5432/TCP  5s
    pg-stolon-proxy            10.39.250.171  <none>       5432/TCP  5s
    
    ==> v1beta2/Deployment
    NAME                KIND
    pg-stolon-proxy     Deployment.v1beta2.apps
    pg-stolon-sentinel  Deployment.v1beta2.apps
    
    ==> v1beta2/StatefulSet
    pg-stolon-keeper  StatefulSet.v1beta2.apps
    
    ==> v1beta1/PodDisruptionBudget
    NAME                MIN-AVAILABLE  MAX-UNAVAILABLE  ALLOWED-DISRUPTIONS  AGE
    pg-stolon-keeper    2              N/A              0                    5s
    pg-stolon-proxy     2              N/A              0                    5s
    pg-stolon-sentinel  2              N/A              0                    5s
    
    ==> v1/ConfigMap
    NAME       DATA  AGE
    pg-stolon  0     5s
    
    NOTES:
    Stolon cluster installed and initialized.
    
    To get superuser password run
    
        PGPASSWORD=$(kubectl get secret --namespace postgres pg-su -o jsonpath="{.data.password}" | base64 --decode; echo)
    
  9. Check the installed cluster:

    $ kubectl -n postgres get all
    NAME                                      READY     STATUS      RESTARTS   AGE
    pod/pg-stolon-create-cluster-mhmxm        0/1       Completed   0          4m
    pod/pg-stolon-keeper-0                    1/1       Running     0          4m
    pod/pg-stolon-keeper-1                    1/1       Running     0          4m
    pod/pg-stolon-proxy-6f648b49d4-cnnpz      1/1       Running     0          4m
    pod/pg-stolon-proxy-6f648b49d4-rsv8j      1/1       Running     0          4m
    pod/pg-stolon-sentinel-694cf8f76f-8fmr9   1/1       Running     0          4m
    pod/pg-stolon-sentinel-694cf8f76f-d5f6r   1/1       Running     0          4m
    
    NAME                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
    service/kubernetes                  ClusterIP   10.39.240.1     <none>        443/TCP    3d
    service/pg-stolon-keeper-headless   ClusterIP   None            <none>        5432/TCP   4m
    service/pg-stolon-proxy             ClusterIP   10.39.250.171   <none>        5432/TCP   4m
    
    NAME                                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/pg-stolon-proxy      2         2         2            2           4m
    deployment.apps/pg-stolon-sentinel   2         2         2            2           4m
    
    NAME                                            DESIRED   CURRENT   READY     AGE
    replicaset.apps/pg-stolon-proxy-6f648b49d4      2         2         2         4m
    replicaset.apps/pg-stolon-sentinel-694cf8f76f   2         2         2         4m
    
    NAME                                DESIRED   CURRENT   AGE
    statefulset.apps/pg-stolon-keeper   2         2         4m
    
    NAME                                 DESIRED   SUCCESSFUL   AGE
    job.batch/pg-stolon-create-cluster   1         1            4m
    

    and show the distribution among the nodes:

    $ kubectl -n postgres get pods -o wide
    NAME                                  READY     STATUS      RESTARTS   AGE       IP           NODE
    pg-stolon-create-cluster-mhmxm        0/1       Completed   0          6m        10.36.2.28   gke-primary-services-default-pool-8dd9a533-3tkl
    pg-stolon-keeper-0                    1/1       Running     0          6m        10.36.2.29   gke-primary-services-default-pool-8dd9a533-3tkl
    pg-stolon-keeper-1                    1/1       Running     0          6m        10.36.2.30   gke-primary-services-default-pool-8dd9a533-3tkl
    pg-stolon-proxy-6f648b49d4-cnnpz      1/1       Running     0          6m        10.36.2.27   gke-primary-services-default-pool-8dd9a533-3tkl
    pg-stolon-proxy-6f648b49d4-rsv8j      1/1       Running     0          6m        10.36.0.30   gke-primary-services-default-pool-8dd9a533-s2vg
    pg-stolon-sentinel-694cf8f76f-8fmr9   1/1       Running     0          6m        10.36.0.29   gke-primary-services-default-pool-8dd9a533-s2vg
    pg-stolon-sentinel-694cf8f76f-d5f6r   1/1       Running     0          6m        10.36.1.18   gke-primary-services-default-pool-8dd9a533-h4zs
    

    At first, I got the following error:

    2019-03-01 06:19:51.982 GMT [108194] LOG: unrecognized configuration parameter "maxConnections" in file "/stolon-data/postgres/stolon-temp-postgresql.conf" line 6
    2019-03-01 06:19:51.982 GMT [108194] FATAL: configuration file "/stolon-data/postgres/stolon-temp-postgresql.conf" contains errors
    2019-03-01T06:19:52.172Z ERROR cmd/keeper.go:1083 failed to start instance {"error": "postgres exited unexpectedly"}
    2019-03-01T06:19:53.897Z ERROR cmd/keeper.go:641 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432:

    To fix it, I changed pgParameters: maxConnections: 100 to pgParameters: max_connections: "100". After that I got an error because of an unknown sha256sum function in a template:

    annotations:
      checksum/config: {{ include (print .Template.BasePath "/hooks/update-cluster-spec-job.yaml") . | sha256sum }}

    To fix it, I updated Helm to the latest version.
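
    After both fixes, it can be handy to double-check the cluster state with stolonctl from inside a keeper pod. A sketch, assuming the cluster name is pg-stolon (take the clusterName from your values.yaml) and the default Kubernetes store backend:

    kubectl -n postgres exec -it pg-stolon-keeper-0 -- \
      stolonctl --cluster-name pg-stolon \
      --store-backend kubernetes --kube-resource-kind configmap \
      status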

Test the Cluster

Check Simple Commands

Find the PostgreSQL cluster IP:

$ kubectl -n postgres get services | fgrep proxy
pg-stolon-proxy             ClusterIP   10.39.252.208   <none>        5432/TCP   18m

Find the password from the secret:

echo $(kubectl get secret --namespace postgres pg-su -o jsonpath="{.data.password}" | base64 --decode)

Connect to PostgreSQL through the proxy and create a table:

$ kubectl -n postgres exec -it pg-stolon-keeper-0 -- psql --host 10.39.252.208 --port 5432 --username superuser_name -W -d test
test=# create table test (id int primary key not null, value text not null);
CREATE TABLE
test=# insert into test values (1, 'value1');
INSERT 0 1
test=# select * from test;
 id | value
----+--------
  1 | value1
(1 row)
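
If you prefer to connect from your workstation instead of exec'ing into a keeper pod, a port-forward to the proxy service works too (the local port 5432 is an arbitrary choice):

$ kubectl -n postgres port-forward svc/pg-stolon-proxy 5432:5432
# in another terminal:
$ psql --host 127.0.0.1 --port 5432 --username superuser_name -W -d test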

Check a Master Death

Find the master pod: pg-stolon-keeper-0 is the master because pg_is_in_recovery is false for it.

$ kubectl -n postgres exec -it pg-stolon-keeper-0 -- psql --host 127.0.0.1 --port 5432 --username superuser_name -W -d test
test=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)
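
To avoid checking the keepers one by one, a small loop over the keeper pods works too (a sketch; it assumes PGPASSWORD holds the superuser password from the pg-su secret, as shown above):

for i in 0 1 2; do
  echo -n "pg-stolon-keeper-$i: "
  # pass the password into the container and print t/f from pg_is_in_recovery()
  kubectl -n postgres exec pg-stolon-keeper-$i -- \
    env PGPASSWORD="$PGPASSWORD" \
    psql --host 127.0.0.1 --port 5432 --username superuser_name -d test \
    -tAc 'select pg_is_in_recovery()'
done

The keeper that prints f is the current master.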

Check the master death: connect to the current master (keeper-0) through the proxy:

$ kubectl -n postgres exec -it pg-stolon-keeper-0 -- psql --host 10.39.252.208 --port 5432 --username superuser_name -W -d test
test=# select * from test;
 id | value
----+--------
  1 | value1
(1 row)

Don’t close this SQL shell, and in another window run:

$ kubectl -n postgres delete pod pg-stolon-keeper-0
pod "pg-stolon-keeper-0" deleted

Let’s check the sentinel logs:

2019-03-01T08:25:38.090Z        WARN    cmd/sentinel.go:264     no keeper info available        {"db": "66ced60c", "keeper": "keeper0"}
2019-03-01T08:25:43.140Z        WARN    cmd/sentinel.go:264     no keeper info available        {"db": "66ced60c", "keeper": "keeper0"}
2019-03-01T08:25:48.201Z        WARN    cmd/sentinel.go:264     no keeper info available        {"db": "66ced60c", "keeper": "keeper0"}
2019-03-01T08:25:53.412Z        WARN    cmd/sentinel.go:264     no keeper info available        {"db": "66ced60c", "keeper": "keeper0"}
2019-03-01T08:25:58.462Z        INFO    cmd/sentinel.go:976     master db is failed     {"db": "66ced60c", "keeper": "keeper0"}
2019-03-01T08:25:58.462Z        INFO    cmd/sentinel.go:987     trying to find a new master to replace failed master
2019-03-01T08:25:58.463Z        INFO    cmd/sentinel.go:1013    electing db as the new master   {"db": "6784fde9", "keeper": "keeper1"}
2019-03-01T08:26:08.564Z        INFO    cmd/sentinel.go:1132    removing old master db  {"db": "66ced60c", "keeper": "keeper0"}
2019-03-01T08:26:08.564Z        INFO    cmd/sentinel.go:1218    one of the new synchronousStandbys has been removed     {"db": "66ced60c", "inSyncStandbys": [], "synchronousStandbys": ["66ced60c"]}
2019-03-01T08:26:08.565Z        INFO    cmd/sentinel.go:1230    setting the expected sync-standbys to the current known in sync sync-standbys   {"inSyncStandbys": [], "synchronousStandbys": ["66ced60c"]}

We see that keeper-1 became the new master. Let’s check it:

$ kubectl -n postgres exec -it pg-stolon-keeper-1 -- psql --host 127.0.0.1 --port 5432 --username superuser_name -W -d test -c 'select pg_is_in_recovery()'
 pg_is_in_recovery
-------------------
 f
(1 row)

Kubernetes restarts the pod:

$ kubectl -n postgres get pods | fgrep keeper
pg-stolon-keeper-0                    1/1       Running     0          1m
pg-stolon-keeper-1                    1/1       Running     0          49m
pg-stolon-keeper-2                    1/1       Running     0          49m

Before the restart we had:

  • keeper-0 - master
  • keeper-1 - sync replica
  • keeper-2 - async replica

Check it after the restart:

$ kubectl -n postgres exec -it pg-stolon-keeper-1 cat /stolon-data/postgres/postgresql.conf | fgrep sync
synchronous_standby_names = 'stolon_46cb4a21'
$ kubectl -n postgres exec -it pg-stolon-keeper-0 cat /stolon-data/postgres/recovery.conf | fgrep slot_name
primary_slot_name = 'stolon_3a667e52'
$ kubectl -n postgres exec -it pg-stolon-keeper-2 cat /stolon-data/postgres/recovery.conf | fgrep slot_name
primary_slot_name = 'stolon_46cb4a21'

And after the restart we have:

  • keeper-0 - async replica
  • keeper-1 - master
  • keeper-2 - sync replica

In the PostgreSQL shell connected to the proxy, run the select again:

test=# select * from test;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
test=# select * from test;
 id | value
----+--------
  1 | value1
(1 row)

We see that the proxy closed the connection to the old master, as expected.

Notes

I should note two things:

  1. when killing the master, Stolon re-elected a new master faster than the pod was restarted.
  2. the downside is that the proxy routes only to the master; to send read-only queries to the slaves too, we need to set up something like HAProxy ourselves (see the sketch below).
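
A very rough sketch of such an HAProxy backend (hypothetical config: it assumes the keepers keep their StatefulSet DNS names, and it balances over all keepers including the current master; a real read-only setup needs an external check that filters replicas by pg_is_in_recovery()):

# haproxy.cfg fragment for read traffic, for illustration only
listen postgres-replicas
    bind *:5433
    mode tcp
    balance roundrobin
    option pgsql-check user repl_username
    server keeper0 pg-stolon-keeper-0.pg-stolon-keeper-headless.postgres.svc.cluster.local:5432 check
    server keeper1 pg-stolon-keeper-1.pg-stolon-keeper-headless.postgres.svc.cluster.local:5432 check
    server keeper2 pg-stolon-keeper-2.pg-stolon-keeper-headless.postgres.svc.cluster.local:5432 check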

Conclusion

This Stolon setup is now running in production at GolangCI. We haven’t encountered any problems with it yet.