
How To Automate PostgreSQL and repmgr on Vagrant

In this tutorial, I build a fault-tolerant PostgreSQL cluster using Vagrant and Ansible. This configuration may help minimize deployment issues.
I often get asked if it's possible to build a resilient system with PostgreSQL.

Considering that resilience should cover cluster high availability, fault tolerance, and self-healing, there is no easy answer. But there is a lot to be said about it.

As of today, we can't achieve that level of resilience as easily as MongoDB does with its built-in features. But let's see what we can in fact do with the help of repmgr and some other tooling.

At the end of this exercise, we will have achieved some things that come in handy, such as:

  • a few Ansible roles that can be reused for production
  • a Vagrantfile for single-command cluster deployment
  • a more realistic development environment; staying close to the production state is good for foreseeing "production-exclusive issues"

Objectives

  • build a local development-environment PostgreSQL cluster with fault-tolerance capabilities;
  • develop configuration management code to reuse in production.

Prerequisites

Install Vagrant, VirtualBox, and Ansible:

Shell

sudo apt install vagrant
sudo apt install virtualbox && sudo apt install virtualbox-dkms
sudo apt install ansible


Note: An alternative to installing Ansible on your host machine is the Ansible Local (ansible_local) Vagrant provisioner, which requires Ansible to be installed on the generated virtual machines instead.

Configuration

1. Write a Vagrantfile

You can use vagrant init to generate the file or simply create it and insert our first blocks.
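If you'd rather not start from an empty file, vagrant init can generate a skeleton Vagrantfile for you; the box name below is simply the one this tutorial uses, so swap it if you prefer another.

Shell

# generate a skeleton Vagrantfile using the Ubuntu 18.04 box
vagrant init ubuntu/bionic64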

Ruby

Vagrant.configure("2") do |config|
  (1..3).each do |n|
    config.vm.define "node#{n}" do |define|
      define.ssh.insert_key = false
      define.vm.box = "ubuntu/bionic64"
      define.vm.hostname = "node#{n}"
      define.vm.network :private_network, ip: "172.16.1.1#{n}"

      define.vm.provider :virtualbox do |v|
        v.cpus = 2
        v.memory = 1024
        v.name = "node#{n}"
      end
    end
  end
end


Let's go block by block:

  • the 1st block is where we set the Vagrant configuration version;
  • in the 2nd block, we wrap the following code in a loop so it is reused to generate 3 identical VMs;
  • OS, hostname, and network settings are set in the 3rd block;
  • the 4th block contains VirtualBox-specific settings.

You can create the servers with:

Shell

# create all 3 VMs
vagrant up
# or create only a specific VM
vagrant up node1


2. Add a provisioner

Just by doing the first step alone, we can already launch 3 working virtual machines. A little exciting, but the best is yet to come.

Launching virtual machines is a nice feature of Vagrant, but we want these servers to have PostgreSQL and repmgr configured, so we will use configuration management software to help us. This is the moment Ansible walks in to amaze us.

Vagrant supports several provisioners, two of them being Ansible and Ansible Local. The difference between them is where Ansible runs, or in other words, where it must be installed. In Vagrant terms, the Ansible provisioner runs on the host machine (your computer) and the Ansible Local provisioner runs on the guest machines (virtual machines). As we already installed Ansible in the prerequisites section, we'll go with the first option.

Let's add a block for this provisioner in our Vagrantfile.

Ruby

Vagrant.configure("2") do |config|
  (1..3).each do |n|
    config.vm.define "node#{n}" do |define|
      define.ssh.insert_key = false
      define.vm.box = "ubuntu/bionic64"
      define.vm.hostname = "node#{n}"
      define.vm.network :private_network, ip: "172.16.1.1#{n}"

      define.vm.provider :virtualbox do |v|
        v.cpus = 2
        v.memory = 1024
        v.name = "node#{n}"
      end

      if n == 3
        define.vm.provision :ansible do |ansible|
          ansible.limit = "all"
          ansible.playbook = "provisioning/playbook.yaml"

          ansible.host_vars = {
            "node1" => {:connection_host => "172.16.1.11",
                        :node_id => 1,
                        :role => "primary" },

            "node2" => {:connection_host => "172.16.1.12",
                        :node_id => 2,
                        :role => "standby" },

            "node3" => {:connection_host => "172.16.1.13",
                        :node_id => 3,
                        :role => "witness" }
          }
        end
      end

    end
  end
end


Ansible allows us to configure several servers simultaneously. To take advantage of this feature on Vagrant, we add ansible.limit = "all" and must wait until all 3 VMs are up. Vagrant knows they are all created thanks to the condition if n == 3, which makes Ansible run only after Vagrant has iterated three times.

ansible.playbook is the configuration entry point, and ansible.host_vars contains the Ansible host variables to be used in the tasks and templates we are about to create.
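As a side note, once the VMs exist you don't need to destroy and recreate them to apply playbook changes; re-running only the provisioning step is usually enough. A small convenience sketch, not part of the original setup:

Shell

# re-run the Ansible provisioner against the already-running VMs
# (node3 carries the provisioner block, so it triggers the run for all nodes)
vagrant provision node3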

3. Create an organized Ansible folder structure

If you're already familiar with Ansible, there's little to learn in this section. For those who aren't, it doesn't get too complicated.

First, we have a folder for all Ansible files, named provisioning. Inside this folder, we have our aforementioned entry point playbook.yaml, a group_vars folder for Ansible group variables, and a roles folder.

We could have all Ansible tasks within playbook.yaml, but a role folder structure helps with organization. You can read the Ansible documentation to learn the recommended best practices. Below, you will find the folder structure for this tutorial, followed by an optional scaffolding shortcut.

Plain Text

project_root
|  provisioning
|  |  group_vars
|  |  |  all.yaml
|  |  roles
|  |  |  postgres_12
|  |  |  registration
|  |  |  repmgr
|  |  |  ssh
|  |  playbook.yaml
|  Vagrantfile
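If you want each role scaffolded for you, ansible-galaxy can create the skeleton folders; this is optional and generates a few more subfolders (defaults, handlers, meta, and so on) than this tutorial actually uses.

Shell

# optional: scaffold a role skeleton instead of creating the folders by hand
cd provisioning/roles
ansible-galaxy init postgres_12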


4. Ansible roles

4.1 PostgreSQL role

To configure repmgr on PostgreSQL, we need to edit two well-known PostgreSQL configuration files: postgresql.conf and pg_hba.conf. We will then write our tasks to apply the configurations in tasks/main.yaml. I named the PostgreSQL role folder postgres_12, but you can easily use another version if you want to.

Plain Text

postgres_12
|  tasks
|  |  main.yaml
|  templates
|  |  full_postgresql.conf.j2
|  |  pg_hba.conf.j2


You can reuse the default file that comes with the PostgreSQL installation and add the following lines to whitelist repmgr database sessions coming from your trusted VMs. Create an Ansible template file (Jinja2 format) like so:

Jinja2

# default configuration (...)

# repmgr
local   replication   repmgr                              trust
host    replication   repmgr      127.0.0.1/32            trust
host    replication   repmgr      {{ node1_ip }}/32       trust
host    replication   repmgr      {{ node2_ip }}/32       trust
host    replication   repmgr      {{ node3_ip }}/32       trust

local   repmgr        repmgr                              trust
host    repmgr        repmgr      127.0.0.1/32            trust
host    repmgr        repmgr      {{ node1_ip }}/32       trust
host    repmgr        repmgr      {{ node2_ip }}/32       trust
host    repmgr        repmgr      {{ node3_ip }}/32       trust


In the same fashion as pg_hba.conf, you can reuse the default postgresql.conf file and add a few more replication-related settings to the bottom of the file:

Jinja2

# default configuration (...)

# repmgr
listen_addresses = '*'
shared_preload_libraries = 'repmgr'
wal_level = replica
max_wal_senders = 5
wal_keep_segments = 64
max_replication_slots = 5
hot_standby = on
wal_log_hints = on


The tasks below will install PostgreSQL and apply our configurations. Their names are self-explanatory.

YAML

- name: Add PostgreSQL apt key
  apt_key:
    url: https://www.postgresql.org/media/keys/ACCC4CF8.asc

- name: Add PostgreSQL repository
  apt_repository:
    # ansible_distribution_release = xenial, bionic, focal
    repo: deb http://apt.postgresql.org/pub/repos/apt/ {{ ansible_distribution_release }}-pgdg main

- name: Install PostgreSQL 12
  apt:
    name: postgresql-12
    update_cache: yes

- name: Copy database configuration
  template:
    src: full_postgresql.conf.j2
    dest: /etc/postgresql/12/main/postgresql.conf
    group: postgres
    mode: '0644'
    owner: postgres

- name: Copy user access configuration
  template:
    src: pg_hba.conf.j2
    dest: /etc/postgresql/12/main/pg_hba.conf
    group: postgres
    mode: '0640'
    owner: postgres


4.2 SSH server configuration

Plain Text

ssh
|  files
|  |  keys
|  |  |  id_rsa
|  |  |  id_rsa.pub
|  tasks
|  |  main.yaml


Generate a key pair to use throughout our virtual machines to allow access between them; if you don't know how to do it, the example below can help. Just make sure the key file paths match the paths in the next step.
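For example, assuming the role layout above, something like the following generates an unencrypted key pair directly into the role's files/keys folder; the output path and the empty passphrase are choices you may want to adapt.

Shell

# create id_rsa and id_rsa.pub where the ssh role expects them
ssh-keygen -t rsa -b 4096 -N "" -f provisioning/roles/ssh/files/keys/id_rsa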

The tasks below will install the OpenSSH server and apply our configurations. Their names are self-explanatory.

YAML

- name: Install OpenSSH
  apt:
    name: openssh-server
    update_cache: yes
    state: present

- name: Create postgres SSH directory
  file:
    mode: '0755'
    owner: postgres
    group: postgres
    path: /var/lib/postgresql/.ssh/
    state: directory

- name: Copy SSH private key
  copy:
    src: "keys/id_rsa"
    dest: /var/lib/postgresql/.ssh/id_rsa
    owner: postgres
    group: postgres
    mode: '0600'

- name: Copy SSH public key
  copy:
    src: "keys/id_rsa.pub"
    dest: /var/lib/postgresql/.ssh/id_rsa.pub
    owner: postgres
    group: postgres
    mode: '0644'

- name: Add key to authorized keys file
  authorized_key:
    user: postgres
    state: present
    key: "{{ lookup('file', 'keys/id_rsa.pub') }}"

- name: Restart SSH service
  service:
    name: sshd
    enabled: yes
    state: restarted


4.3 repmgr installation

Plain Text

repmgr
|  tasks
|  |  main.yaml
|  templates
|  |  repmgr.conf.j2


We configure settings like the promote command, the follow command, timeouts, and the retry count for failure scenarios inside repmgr.conf. We will copy this file to its default location, /etc, to avoid passing the -f argument to the repmgr command every time.
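The practical difference is small but noticeable when operating the cluster by hand: with the file at /etc/repmgr.conf (one of repmgr's default search locations), commands can be run without spelling out the configuration path. A quick illustration, with the -f path as a placeholder:

Shell

# with /etc/repmgr.conf in place
repmgr cluster show

# without it, every invocation needs the -f flag
repmgr -f /path/to/repmgr.conf cluster show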

The tasks below will install repmgr and apply our configurations. Their names are self-explanatory.

YAML

- name: Download repmgr repository installer
  get_url:
    dest: /tmp/repmgr-installer.sh
    mode: 0700
    url: https://dl.2ndquadrant.com/default/release/get/deb

- name: Execute repmgr repository installer
  shell: /tmp/repmgr-installer.sh

- name: Install repmgr for PostgreSQL {{ pg_version }}
  apt:
    name: postgresql-{{ pg_version }}-repmgr
    update_cache: yes

- name: Setup repmgr user and database
  become_user: postgres
  ignore_errors: yes
  shell: |
    createuser --replication --createdb --createrole --superuser repmgr &&
    psql -c 'ALTER USER repmgr SET search_path TO repmgr_test, "$user", public;' &&
    createdb repmgr --owner=repmgr

- name: Copy repmgr configuration
  template:
    src: repmgr.conf.j2
    dest: /etc/repmgr.conf

- name: Restart PostgreSQL
  systemd:
    name: postgresql
    enabled: yes
    state: restarted


4.4 repmgr node registration

Finally, we reach the moment where fault tolerance is established.

Plain Text

registration
|  tasks
|  |  main.yaml


For reference, this is the repmgr.conf.j2 template that the previous role copies to /etc/repmgr.conf, with the host and group variables filling in each node's identity:

Jinja2

node_id = {{ node_id }}
node_name = 'node{{ node_id }}'
conninfo = 'host={{ connection_host }} user=repmgr dbname=repmgr'
data_directory = '/var/lib/postgresql/{{ pg_version }}/main'
use_replication_slots = yes
reconnect_attempts = 5
reconnect_interval = 1
failover = automatic
pg_bindir = '/usr/lib/postgresql/{{ pg_version }}/bin'
promote_command = 'repmgr standby promote -f /etc/repmgr.conf'
follow_command = 'repmgr standby follow -f /etc/repmgr.conf'
log_level = INFO
log_file = '/var/log/postgresql/repmgr.log'


This role was built according to the repmgr documentation, and it might be the most complex role, as it needs to:

  • run some commands as root and others as the postgres user;
  • stop services between reconfigurations;
  • have different tasks for primary, standby, and witness role configuration (if you want node3 to also be a standby node, just assign role: standby in the Vagrantfile's ansible.host_vars).
YAML

- name: Register primary node
  become_user: postgres
  shell: repmgr primary register
  ignore_errors: yes
  when: role == "primary"

- name: Stop PostgreSQL
  systemd:
    name: postgresql
    state: stopped
  when: role == "standby"

- name: Clean up PostgreSQL data directory
  become_user: postgres
  file:
    path: /var/lib/postgresql/{{ pg_version }}/main
    force: yes
    state: absent
  when: role == "standby"

- name: Clone primary node data
  become_user: postgres
  shell: repmgr -h {{ node1_ip }} -U repmgr -d repmgr standby clone
  ignore_errors: yes
  when: role == "standby"

- name: Start PostgreSQL
  systemd:
    name: postgresql
    state: started
  when: role == "standby"

- name: Register {{ role }} node
  become_user: postgres
  shell: repmgr {{ role }} register -F
  ignore_errors: yes
  when: role != "primary"

- name: Start repmgrd
  become_user: postgres
  shell: repmgrd
  ignore_errors: yes


5. Set Group Variables

Create a file group_vars/all.yaml to set your VMs' IP addresses and the PostgreSQL version you would like to use. Like the host_vars set in the Vagrantfile, these variables will fill the placeholders in the templates.

YAML

client_ip: "172.16.1.1"
node1_ip: "172.16.1.11"
node2_ip: "172.16.1.12"
node3_ip: "172.16.1.13"
pg_version: "12"


6. Put All Pieces Together With a Playbook

The only thing missing is the playbook itself. Create a file named playbook.yaml and invoke the roles we have been developing. gather_facts is an Ansible property that fetches operating system data, such as the distribution (ansible_distribution_release), among other useful variables. You can also read these variables with the Ansible setup module, as sketched after the playbook below.

YAML

- hosts: all
  gather_facts: yes
  become: yes
  roles:
    - postgres_12
    - ssh
    - repmgr
    - registration
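As mentioned above, you can call the setup module yourself to see which facts are available. A minimal sketch, assuming the inventory file Vagrant generates for the ansible provisioner at its default path:

Shell

# query the facts Ansible gathers from node1, filtered to distribution variables
ansible node1 -m setup -a "filter=ansible_distribution*" \
  -i .vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory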


7. Start Cluster

It's finished. You can now start your cluster with vagrant up and then perform your connection and failover tests.
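Before breaking anything, it's worth checking that all three nodes registered correctly. A quick sanity check, from any node, is to ask repmgr for its view of the cluster:

Shell

# log into the primary and show the cluster from repmgr's point of view
vagrant ssh node1
sudo -iu postgres repmgr cluster show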

Testing Cluster Failover

Now that our cluster is up and configured, you can start by shutting down your standby node:

Shell

# save standby state and shut it down ungracefully
vagrant suspend node2


You will see that the cluster is operating normally. Bring the standby node back and it will stay that way.

Shell

# bring the standby back online after suspension
vagrant resume node2


How about taking down the primary node?

Shell

# save primary state and shut it down ungracefully
vagrant suspend node1


At this point, as repmgrd is enabled, the standby node will retry connecting to the primary node the configured number of times (reconnect_attempts = 5) and, if it obtains no response, will promote itself to primary and take over write operations on the PostgreSQL cluster. Success!
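If you want to watch the failover as it happens, you can follow the repmgrd log on the standby while the primary is suspended; the path below is the log_file we set in repmgr.conf.j2.

Shell

# on the standby, watch repmgrd detect the failure and promote the node
vagrant ssh node2
sudo -iu postgres tail -f /var/log/postgresql/repmgr.log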

To join the cluster again, the old primary node will have to lose its current data, clone the new primary data, and register as a new standby.

Shell

# rejoin the old primary as a standby
vagrant resume node1
vagrant ssh node1
service postgresql stop
rm -r /var/lib/postgresql/12/main
repmgr -h 172.16.1.12 -U repmgr -d repmgr standby clone
service postgresql start
repmgr standby register -F
repmgrd
repmgr service status


This last command shows us that the cluster is working properly, but with inverted roles.

Plain Text

postgres@node1:~$ repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | standby |   running | node2    | running | 22490 | no      | n/a
 2  | node2 | primary | * running |          | running | 22548 | no      | 0 second(s) ago
 3  | node3 | witness | * running | node2    | running | 22535 | no      | 0 second(s) ago


Nothing wrong with this, but let's make these nodes switch their roles.

Shell

# ssh in and out just to add the host key to the known_hosts file
ssh <current_primary_ip_address> -o StrictHostKeyChecking=no
exit
# trigger the switchover from the current standby
repmgr standby switchover --siblings-follow


And we're back to the initial state.

Plain Text

postgres@node1:~$ repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 22490 | no      | n/a
 2  | node2 | standby |   running | node1    | running | 22548 | no      | 0 second(s) ago
 3  | node3 | witness | * running | node1    | running | 22535 | no      | 0 second(s) ago


Conclusion

We managed to build a fault-tolerant PostgreSQL cluster using Vagrant and Ansible.

High availability is a big challenge. Much like in other matters, we are only truly prepared for the biggest challenges when we have already practiced under their conditions.

Problems unique to production environments are natural and hard to predict. Bridging the gap between development and production is one way to prevent deployment and production issues. We can make some efforts toward that objective, and that is precisely what we just achieved with this high-availability database setup.

You can find the source code of this tutorial here.


Published at DZone with permission of Rui Trigo. See the original article here.
