High Availability Using DRBD

Let’s take a look at the first option, which creates a high availability environment between two VitalPBX instances.

In this High Availability environment, we will be using DRBD (Distributed Replicated Block Device). High availability means keeping critical systems and services available with minimal downtime in case of a failure; DRBD enables real-time data replication between nodes to ensure data availability and integrity.

First, let’s look at the requirements for this type of High Availability.

  • Physical Infrastructure and Networking – Two or more identical nodes (servers) to
    implement redundancy. These need to have the same hardware specifications and
    VitalPBX licensing. This means that if you are using a Carrier Plus license, each server
    will need its own license. This will ensure that both servers have the same permissions
    when the configurations are being replicated. We also need a reliable and low-latency
    network connection between the nodes. This can be a dedicated replication network
    (preferred) or a shared network if it’s of high quality.
  • Operating System / VitalPBX version – Nodes should run the same operating
    system using the same version and the same version of VitalPBX.
  • Disk Partitioning – The storage device to be replicated should be partitioned and
    accessible on both nodes. Each node should have sufficient storage space to
    accommodate replication and data.
  • Network Configuration – Each node should have static IP addresses and resolve
    correctly in the local DNS system or in the /etc/hosts file of the other node. Host
    names should be consistent across nodes.
  • DRBD – Install and configure DRBD on both nodes. Configure DRBD resources that
    define which devices will be replicated and how replication will be established. During
    installation, leave as much unpartitioned space as possible on the hard drive of both
    servers; this space will later hold the replicated (variable) data. Define the node roles:
    primary and secondary.

With these requirements met and understood, we can start by installing Debian and VitalPBX on two servers. You can follow the Installation Section of this guide. When you reach the partitioning step of the installation, you must select Guided – use the entire disk.

Next, select the option All files in one partition (recommended for new users)

On the next screen select the #1 Primary partition to delete it.

Delete the #1 Primary partition to create free space.

With the partition deleted, we select the pri/log FREE SPACE option.

You will now select how to use the free space. Select the Create a new partition option.

Now, change the capacity of this partition to 20GB. This partition is solely for the OS and its applications, so we make sure it has enough space for the future. As a rule of thumb, this partition should be at least 20GB or 10% of your total storage space, whichever is larger. So if you have a 1TB drive, you would allocate 100GB, for example.

We then define this partition as a Primary Partition. Afterward, we will select the location for this partition to be the Beginning.

With this set, we will be shown a summary of the changes to this partition. Select the option Done setting up the partition.

Next, we are shown the partitions to be set on the drive. Select the option Finish partitioning and write changes to disk.

Later we will be using the rest of the FREE SPACE that is available.

Finally, we are shown a summary of the changes to be made on the drive. Select Yes to the question: Write the changes to disks.

You can then proceed with the installation as normal, following the steps in the Installation Section of this guide. This includes the installation of VitalPBX using the VPS installation script.

Remember, the installation process with the partitioning needs to be done twice, once for each server in our high-availability environment.

With the installation done, we can start configuring our servers. It is a good idea to write down the networking information beforehand so we can work more orderly in our high-availability environment. For this guide, we will be using the following information.

Name          | Primary Server           | Secondary Server
Hostname      | vitalpbx-primary.local   | vitalpbx-secondary.local
IP Address    | 192.168.10.31            | 192.168.10.32
Netmask       | 255.255.255.0            | 255.255.255.0
Gateway       | 192.168.10.1             | 192.168.10.1
Primary DNS   | 8.8.8.8                  | 8.8.8.8
Secondary DNS | 8.8.4.4                  | 8.8.4.4

Next, we will allow remote access for the root user on both servers, so we can log in over SSH as root.
From the CLI, we will use nano to edit the sshd_config file.
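root@debian:~# nano /etc/ssh/sshd_config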

Change the following line.
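On a stock Debian installation, this line usually reads as follows.

#PermitRootLogin prohibit-password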

With the following.
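PermitRootLogin yes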

Save the changes and exit nano. Then, restart the sshd service.

root@debian:~# systemctl restart sshd

With this, you can now SSH login with the root user and password. This will make it easier to copy and paste the commands from this guide. Remember, this has to be done on both servers.

Once you are logged in with an SSH connection, we will set the static IP addresses for both servers. For this, we will use nano to modify the interfaces configuration file.

root@debian:~# nano /etc/network/interfaces

Here, change the following lines.
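The default DHCP configuration typically looks something like this (the interface name on your system may differ):

allow-hotplug eth0
iface eth0 inet dhcp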

For the Primary Server, enter the following.
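A static configuration based on the table above might look like this (eth0 is an assumption; see the note below):

auto eth0
iface eth0 inet static
    address 192.168.10.31
    netmask 255.255.255.0
    gateway 192.168.10.1
    # dns-nameservers requires the resolvconf package; otherwise set the
    # DNS servers (8.8.8.8 and 8.8.4.4) in /etc/resolv.conf
    dns-nameservers 8.8.8.8 8.8.4.4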

For the Secondary Server, enter the following.
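And for the Secondary Server, under the same assumptions:

auto eth0
iface eth0 inet static
    address 192.168.10.32
    netmask 255.255.255.0
    gateway 192.168.10.1
    dns-nameservers 8.8.8.8 8.8.4.4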

Note: Your installation may have a different name for the primary network interface. Make sure that you are using the correct name for your interface.

Next, we will install dependencies on both servers.
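The exact package list may vary, but at a minimum we need the DRBD utilities and the Corosync/Pacemaker/pcs cluster stack. On Debian they can be installed with apt, for example:

root@debian:~# apt -y install drbd-utils pacemaker corosync pcs xfsprogs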

With the dependencies installed, we must set the hostnames for both VitalPBX servers. For this, we go to Admin > System Settings > Network Settings in the VitalPBX Web UI.

After setting the hostname, click the green Save button. With the hostname set from the Web UI, we will now configure the hostnames in the hosts file for each server.

Set the hostname on the Primary Server with the following command.
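root@debian:~# hostnamectl set-hostname vitalpbx-primary.local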

And in the Secondary Server as follows.
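root@debian:~# hostnamectl set-hostname vitalpbx-secondary.local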

Afterward, on both servers modify the hosts file using nano.
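root@debian:~# nano /etc/hosts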

Add the following lines.
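192.168.10.31 vitalpbx-primary.local
192.168.10.32 vitalpbx-secondary.local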

This way, both servers will be able to see each other using their hostnames.
Now, we will create a new partition to allocate the rest of the available space for both servers.
For this, we will use the fdisk command.
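Assuming the system disk is /dev/sda (check with lsblk if unsure):

root@debian:~# fdisk /dev/sda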

Answer as follows on the presented prompts.
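A typical dialogue looks like this; accepting the defaults allocates all of the remaining free space:

n           (create a new partition)
p           (make it a primary partition)
<Enter>     (accept the default partition number)
<Enter>     (accept the default first sector)
<Enter>     (accept the default last sector, using all remaining space)
w           (write the new partition table and exit)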

Then restart both servers so that the new table is available.
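root@debian:~# reboot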

With the servers rebooted, we will proceed with the HA (High Availability) cluster configuration.

Now, we will create an authorization key for the access between both servers. This way, we can access both servers without entering credentials every time.

Create an authorization key in the Primary Server.
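For example, generate a key without a passphrase and copy it to the other node (the key type and file name are just examples):

root@debian:~# ssh-keygen -f /root/.ssh/id_rsa -t rsa -N ''
root@debian:~# ssh-copy-id root@192.168.10.32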

Next, create an authorization key in the Secondary Server.
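Again, as an example:

root@debian:~# ssh-keygen -f /root/.ssh/id_rsa -t rsa -N ''
root@debian:~# ssh-copy-id root@192.168.10.31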

Now, we can proceed in two ways: using a script we made, or following the manual steps. If you proceed with the script, you can skip all the steps until you reach the add-on installation in the next lesson.

Afterward, you can download and run the following script from the Primary Server, using these commands.

You will then be prompted to enter the information for the servers in the cluster.

Note: The hacluster password can be anything of your liking. It does not have to be an existing password for any user in any node.

Note: Before doing any high-availability testing, make sure that the data has finished synchronizing. To do this, use the cat /proc/drbd command.

The script will start configuring the HA cluster for you. CONGRATULATIONS! You now have a high availability environment with VitalPBX 4!

The following steps are if you want to proceed with the cluster configuration manually, rather than using the provided script. You can skip these steps if you decide to use the script and proceed to the add-on installation in the next lesson.

To configure the HA cluster manually, first, we need to configure the Firewall. This can be done by adding the services and rules from the VitalPBX Web UI. Here is the list of services we will configure. This needs to be configured in both servers.

Protocol | Port | Description
TCP | 2224 | This port is needed by the pcsd Web UI and is required for node-to-node communication. It is crucial to open port 2224 in such a way that pcs from any node can talk to all nodes in the cluster, including itself. When using the Booth cluster ticket manager or a quorum device, you must open port 2224 on all related hosts, such as Booth arbiters or the quorum device host.
TCP | 3121 | Pacemaker’s crmd daemon on the full cluster nodes will contact the pacemaker_remoted daemon on Pacemaker Remote nodes at port 3121. If a separate interface is used for cluster communication, the port only needs to be open on that interface. At a minimum, the port should be open on Pacemaker Remote nodes to full cluster nodes. Because users may convert a host between a full node and a remote node, or run a remote node inside a container using the host’s network, it can be useful to open the port to all nodes. It is not necessary to open the port to any hosts other than nodes.
TCP | 5403 | Required on the quorum device host when using a quorum device with corosync-qnetd. The default value can be changed with the -p option of the corosync-qnetd command.
UDP | 5404 | Required on corosync nodes if corosync is configured for multicast UDP.
UDP | 5405 | Required on all corosync nodes (needed by corosync).
TCP | 21064 | Required on all nodes if the cluster contains any resources requiring DLM (such as clvm or GFS2).
TCP, UDP | 9929 | Required to be open on all cluster nodes and Booth arbitrator nodes for connections from any of those same nodes when the Booth ticket manager is used to establish a multi-site cluster.
TCP | 7789 | Required by DRBD to synchronize information.

In the VitalPBX Web UI for both servers go to Admin > Firewall > Services. Add the services from the table above by clicking the Add Service button.

With all the services added, Apply Changes.

Next, we go to Admin > Firewall > Rules to add the rules to ACCEPT all the services we just created.

With all the rules added, Apply Changes. Remember, you need to add the services and rules to both servers’ firewalls.

Now, let’s create a directory where we are going to mount the volume with all the information to be replicated on both servers.
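root@debian:~# mkdir /vpbx_data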

Afterward, we will format the new partition we made in both servers using the following commands.
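Assuming the new partition ended up as /dev/sda3 (verify with fdisk -l or lsblk before running this):

root@debian:~# mkfs.xfs -f /dev/sda3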

With all of this done, we can proceed to configure DRBD on both servers. Start by loading the module and enabling the service in both nodes using the following command.
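root@debian:~# modprobe drbd
root@debian:~# systemctl enable drbd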

Then, create a new global_common.conf file in both servers.
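root@debian:~# nano /etc/drbd.d/global_common.conf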

Add the following content.
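A minimal example (protocol C is DRBD’s synchronous replication mode, which is what a cluster like this needs; adjust to your requirements):

global {
    usage-count no;
}
common {
    net {
        protocol C;
    }
}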

Save and Exit nano. Next, create a new configuration file called drbd0.res for the new resource named drbd0 in both servers using nano.
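root@debian:~# nano /etc/drbd.d/drbd0.res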

Add the following content.
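A minimal sketch using the hostnames, addresses, and port 7789 from this guide; the backing disk (/dev/sda3 here) and the shared secret are assumptions you must adapt:

resource drbd0 {
    net {
        cram-hmac-alg sha1;
        shared-secret "ChangeMe";     # assumption: choose your own secret
    }
    on vitalpbx-primary.local {
        device /dev/drbd0;
        disk /dev/sda3;               # assumption: the partition created earlier
        address 192.168.10.31:7789;
        meta-disk internal;
    }
    on vitalpbx-secondary.local {
        device /dev/drbd0;
        disk /dev/sda3;               # assumption: the partition created earlier
        address 192.168.10.32:7789;
        meta-disk internal;
    }
}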

Save and Exit nano.

Note: Although the access interface (eth0 in this case) can be used, it is recommended to use a dedicated interface (e.g., eth1) for synchronization; that interface should be directly connected between both servers.

Now, initialize the metadata storage in each node by executing the following command in both servers.
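root@debian:~# drbdadm create-md drbd0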

Afterward, define the Primary Server as the DRBD primary node first.
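Bring the resource up and force it into the primary role; the initial synchronization starts once the peer connects:

root@debian:~# drbdadm up drbd0
root@debian:~# drbdadm primary --force drbd0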

Then, on the Secondary Server, run the following command to start the drbd0.
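root@debian:~# drbdadm up drbd0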

You can check the current status of the synchronization while it is being performed, using the following command.
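root@debian:~# cat /proc/drbd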

Here is an example of the output of this command.

In order to test the DRBD functionality, we must create a file system, mount the volume, write some data in the Primary Server, and finally switch the primary node to the Secondary Server.

Run the following commands in the Primary Server to create an XFS file system on the /dev/drbd0 device and mount it to the /vpbx_data directory.
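root@debian:~# mkfs.xfs /dev/drbd0
root@debian:~# mount /dev/drbd0 /vpbx_data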

Create some data using the following command in the Primary Server.
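Any test files will do; for example:

root@debian:~# touch /vpbx_data/test{1..5}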

Run the following command to list the content of the /vpbx_data directory.
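root@debian:~# ls -l /vpbx_data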

The command will return the following list.

Now, let’s switch the primary node “Primary Server” to the secondary node “Secondary Server” to check if the data replication works.

We will need to unmount the volume drbd0 in the Primary Server and change it from the primary node to the secondary node, and we will turn the Secondary Server into the primary node.

In the Primary Server, run the following commands.
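root@debian:~# umount /vpbx_data
root@debian:~# drbdadm secondary drbd0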

Change the secondary node to the primary node, by running this command on the Secondary Server.
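root@debian:~# drbdadm primary drbd0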

In the Secondary Server, mount the volume and check if the data is available with the following command.
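root@debian:~# mount /dev/drbd0 /vpbx_data
root@debian:~# ls -l /vpbx_data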

The command should return something like this.

As you can see, the data is being replicated: these files were created on the Primary Server, and we can now see them on the Secondary Server.

Now, let’s normalize the Secondary Server. Unmount the volume drbd0 and set it as the secondary node. In the Secondary Server, run the following commands.
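root@debian:~# umount /vpbx_data
root@debian:~# drbdadm secondary drbd0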

Then, normalize the Primary Server. Turn it into the primary node, and mount the drbd0 volume to the /vpbx_data directory. In the Primary Server, run the following commands.
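root@debian:~# drbdadm primary drbd0
root@debian:~# mount /dev/drbd0 /vpbx_data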

With the replication working, let’s configure the cluster for high availability. Create a password for the hacluster user on both servers.
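root@debian:~# passwd hacluster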

Note: The hacluster password can be anything of your liking. This does not have to be a password for the root or any other user.

Then, start the PCS service on both servers, using the following command.
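root@debian:~# systemctl start pcsd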

We must enable the PCS, Corosync, and Pacemaker services so that they start at boot on both servers, with the following commands.
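root@debian:~# systemctl enable pcsd corosync pacemaker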

Now, let’s authenticate as the hacluster user using PCS Auth in the Primary Server. Enter the following commands.
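With pcs 0.10 and later this is done with pcs host auth; replace the password with the hacluster password you set earlier:

root@debian:~# pcs host auth vitalpbx-primary.local vitalpbx-secondary.local -u hacluster -p your_hacluster_password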

The command should return the following.

Next, use the PCS cluster setup command in the Primary Server to generate and synchronize the corosync configuration.
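For example, naming the cluster cluster_vitalpbx (the name is an assumption; any name works):

root@debian:~# pcs cluster setup cluster_vitalpbx vitalpbx-primary.local vitalpbx-secondary.local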

Start the cluster in the Primary Server, with the following commands.
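root@debian:~# pcs cluster start --all
root@debian:~# pcs cluster enable --all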

Note: It’s recommended to prevent resources from moving after recovery. In most circumstances, it is highly desirable to prevent healthy resources from being moved around the cluster. Moving resources always requires a period of downtime. For complex services such as databases, this period can be quite long.

To prevent resources from moving after recovery, run this command in the Primary Server.
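One way to do this is to set a default resource stickiness (the value here is an example):

root@debian:~# pcs resource defaults update resource-stickiness=INFINITY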

Now, create the resource for the use of a Floating IP Address, with the following commands in the Primary Server.
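The floating (virtual) IP is an unused address of your choosing on the same subnet; 192.168.10.30 below is only an example, and so is the resource name virtual_ip:

root@debian:~# pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.168.10.30 cidr_netmask=24 op monitor interval=30s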

Then, create the resource to use DRBD, using the following commands in the Primary Server.
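A sketch using the ocf:linbit:drbd agent for the drbd0 resource defined earlier (the resource name DrbdData is an assumption):

root@debian:~# pcs resource create DrbdData ocf:linbit:drbd drbd_resource=drbd0 op monitor interval=30s
root@debian:~# pcs resource promotable DrbdData notify=true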

Next, create the file system for the automated mount point, using the following commands in the Primary Server.
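A sketch that mounts /dev/drbd0 on /vpbx_data (the resource name DrbdFS is an assumption):

root@debian:~# pcs resource create DrbdFS ocf:heartbeat:Filesystem device="/dev/drbd0" directory="/vpbx_data" fstype="xfs"

In a real cluster you also need colocation and ordering constraints (pcs constraint colocation / pcs constraint order) so that DrbdFS only starts on the node where DrbdData has been promoted; the exact syntax depends on your pcs version.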

Stop and disable all services on both servers, using the following commands.
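The exact list should match the services the cluster will manage; as an assumption based on the resources created below:

root@debian:~# systemctl stop mariadb asterisk fail2ban
root@debian:~# systemctl disable mariadb asterisk fail2ban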

Next, change the MariaDB data path in the Primary Server so that it points to the replicated partition, using the following commands.
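One possible approach (the paths and file names are assumptions; adapt them to your installation) is to move the data directory onto the replicated volume and point MariaDB at it:

root@debian:~# mv /var/lib/mysql /vpbx_data/
root@debian:~# printf '[mysqld]\ndatadir = /vpbx_data/mysql\n' > /etc/mysql/mariadb.conf.d/99-datadir.cnf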

Change the MariaDB Path on the Secondary Server as well, using the following command.
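On the Secondary Server the data itself arrives through DRBD, so only the configuration change is needed (same assumptions as above):

root@debian:~# printf '[mysqld]\ndatadir = /vpbx_data/mysql\n' > /etc/mysql/mariadb.conf.d/99-datadir.cnf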

Now, run the following commands in the Primary Server to create the MariaDB resource.
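A sketch using the ocf:heartbeat:mysql agent; the resource name and all paths are assumptions that must match your MariaDB setup:

root@debian:~# pcs resource create mysql ocf:heartbeat:mysql \
    config="/etc/mysql/my.cnf" \
    datadir="/vpbx_data/mysql" \
    pid="/run/mysqld/mysqld.pid" \
    socket="/run/mysqld/mysqld.sock" \
    op monitor interval=30s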

Set the paths for the Asterisk service in both servers, using the following commands.

Next, create the resource for Asterisk in the Primary Server, using the following commands.
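A sketch that simply lets Pacemaker manage the existing systemd unit (the resource name is an assumption):

root@debian:~# pcs resource create asterisk systemd:asterisk op monitor interval=30s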

Copy the Asterisk and VitalPBX folders and files to the DRBD partition in the Primary Server using the following commands.
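The full list of directories is not reproduced here; the pattern, shown for a single directory as an illustration, is to move each directory onto the replicated volume and leave a symbolic link in its original place:

root@debian:~# mv /var/spool/asterisk /vpbx_data/asterisk-spool
root@debian:~# ln -s /vpbx_data/asterisk-spool /var/spool/asterisk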

Now, configure the symbolic links on the Secondary Server with the following commands.
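On the Secondary Server the data already lives on the DRBD volume, so only the links are created; again shown for a single directory as an illustration:

root@debian:~# rm -rf /var/spool/asterisk
root@debian:~# ln -s /vpbx_data/asterisk-spool /var/spool/asterisk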

Then, create the VitalPBX Service in the Primary Server, using the following commands.

Create the Fail2Ban Service in the Primary Server, using the following commands.
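For example, managing the existing systemd unit (the resource name is an assumption):

root@debian:~# pcs resource create fail2ban systemd:fail2ban op monitor interval=30s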

Initialize the Corosync and Pacemaker services in the Secondary Server with the following commands.
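root@debian:~# systemctl start corosync
root@debian:~# systemctl start pacemaker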

Note: All configurations are stored in the /var/lib/pacemaker/cib/cib.xml file.

Now let’s see the cluster status by running the following command in the Primary Server.
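root@debian:~# pcs status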

This command will return the following.

Note: Before doing any high availability testing, make sure that the data has finished synchronizing. To do this, use the cat /proc/drbd command.

With our cluster configured, we now must configure the bind address. Managing the bind address is critical when using multiple IP addresses on the same NIC (Network Interface Card), which is our case when using a Floating IP address in this HA cluster. In this situation, Asterisk tends to listen for SIP/IAX on the virtual IP address but reply from the base address of the NIC, causing phones and trunks to fail to register.

In the Primary Server, go to Settings > Technology Settings > PJSIP Settings, and configure the Floating IP address in the Bind and TLS Bind fields.

Now that the bind address is set, we will create the bascul command on both servers. This command lets us easily move the services between nodes, essentially switching the active role between the Primary and Secondary Servers.

To start creating the bascul command, we can begin by downloading the following file using wget on both servers.

Or we can create it from scratch using nano on both servers.
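Assuming the file is simply named bascul:

root@debian:~# nano bascul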

And add the following content.
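The official script is not reproduced here; as a minimal sketch of the idea, a failover can be forced by putting the currently active node into standby and then bringing it back online:

#!/bin/bash
# Minimal sketch, not the official VitalPBX bascul script.
# Putting this node into standby forces Pacemaker to move every resource
# (floating IP, DRBD, file system, services) to the other node.
pcs node standby "$(hostname)"
# Give the cluster time to migrate the resources.
sleep 20
# Bring this node back online so it stays available as a failover target.
pcs node unstandby "$(hostname)"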

Save and Exit nano. Next, add permissions and move to the /usr/local/bin directory using the following commands in both servers.
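Assuming the file is named bascul as above:

root@debian:~# chmod +x bascul
root@debian:~# mv bascul /usr/local/bin/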

Now, create the Role command in both servers. You can download the following file using wget.

Or you can create the file using nano on both servers.
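Assuming the file is simply named role:

root@debian:~# nano role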

Add the following content.
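Again, the official script is not reproduced here; a minimal sketch that reports which role this node currently holds could look like this:

#!/bin/bash
# Minimal sketch, not the official VitalPBX role script.
# Report whether this node currently holds the DRBD primary (active) role.
if drbdadm role drbd0 2>/dev/null | grep -q '^Primary'; then
    echo "This node is currently the ACTIVE (primary) node."
else
    echo "This node is currently the STANDBY (secondary) node."
fi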

Save and Exit nano. Next, we copy it to the /etc/profile.d/ directory and set its permissions, using the following commands on both servers.
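Assuming the file was saved as role:

root@debian:~# cp role /etc/profile.d/role.sh
root@debian:~# chmod +x /etc/profile.d/role.sh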

Now, add execution permissions and move to the /usr/local/bin directory using the following commands on both servers.
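root@debian:~# chmod +x role
root@debian:~# mv role /usr/local/bin/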

Afterward, we will create the drbdsplit command on both servers. Split-brain occurs when both high availability nodes switch into the primary role while disconnected, for example due to intervention by cluster management software or human error during a failure of the network links between cluster nodes. This behavior can allow data to be modified on either node without being replicated on the peer, leading to two diverging sets of data that can be difficult to merge. The drbdsplit command allows us to recover from split-brain in case it ever happens to us. To create the drbdsplit command, we can download the following file using the wget command on both servers.

Or we can create it from scratch using nano on both servers.
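Assuming the file is simply named drbdsplit:

root@debian:~# nano drbdsplit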

Add the following content.
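The official script is not reproduced here; the underlying recovery procedure, shown as a minimal sketch for the drbd0 resource, discards the local data on the node where it is run and resynchronizes from the peer:

#!/bin/bash
# Minimal sketch, not the official VitalPBX drbdsplit script.
# Run on the node whose data you are willing to discard after a split-brain.
drbdadm secondary drbd0
drbdadm disconnect drbd0
drbdadm connect --discard-my-data drbd0
# If the surviving node shows a StandAlone connection state, run
# "drbdadm connect drbd0" on that node to re-establish replication.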

Save and Exit nano. Now, add permissions and move it to the /usr/local/bin directory using the following commands on both servers.
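root@debian:~# chmod +x drbdsplit
root@debian:~# mv drbdsplit /usr/local/bin/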

With this, you have a full high availability environment! CONGRATULATIONS, you now have high availability with VitalPBX 4.
