Predictable Pitfalls of Scaling VoIP to a Cluster
Various architecture options and how it all works under the hood, with a few real-life configuration examples included for good measure.
VoIP technologies have a reputation for being rather complex, and not without good reason. This is not to say, however, that a mid-level developer will have much difficulty finding and handling one of the many available open-source servers. The real problem is the lack of information and real-world examples: VoIP server homepages rarely show configurations beyond the very simplest case, and even when they do, the examples may not be up to date.
Having delivered over a dozen complex VoIP projects, I'll try to summarize the key principles and components behind a fault-tolerant, scalable VoIP system. The focus of this article is on various architecture options and how it all works under the hood, with a few real-life configuration examples included for good measure.
VoIP Server Introduction
If names like FreeSWITCH, Asterisk, SIP, RTP, WebRTC are not just gobbledygook for you, feel free to safely skip this part.
The world of VoIP stands on two main pillars: SIP and RTP. These two protocols were developed at the end of the last century and come to us from the world of telephony. SIP (Session Initiation Protocol) is a command (signaling) protocol responsible for codec negotiation, starting and ending a call, and call control (hold, transfer, etc.). It also has a huge number of extensions, including text messaging, voicemail notifications, etc.
IP phones from various manufacturers can support their own set of extensions. For example, BLF (Busy Lamp Field) is very common — a field of LEDs on a desk phone, which can be configured so that they show another employee’s phone status.
RTP (Real-time Transport Protocol) is a media protocol responsible for the transmission of audio and video data. It uses various codecs for media data encoding.
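Both protocols run over UDP by default. Almost every implementation offers TCP as an option, but UDP is usually the better fit for real-time traffic, not least because a lost media packet is better skipped than retransmitted late. Encryption is available for both protocols.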
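To make the command protocol less abstract: a SIP message is plain text, with a start line followed by headers, much like HTTP. The sketch below parses a hand-written INVITE. The addresses, tags, and the parse_sip_request helper are made up for illustration, and the parser deliberately ignores details a real SIP stack must handle (repeated headers such as Via, compact header names, line folding).

```python
def parse_sip_request(raw: str) -> dict:
    """Split a SIP request into start line, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Simplification: a repeated header (e.g. several Via lines)
        # would overwrite the previous value here.
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "uri": uri, "version": version,
            "headers": headers, "body": body}


# A hand-written example request; all values are illustrative.
EXAMPLE_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710@192.0.2.10\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

message = parse_sip_request(EXAMPLE_INVITE)
print(message["method"], message["headers"]["call-id"])
```

The Call-ID header is what ties all messages of one call together, which becomes important once a load balancer has to keep a call on the same node.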
The VoIP architecture is based on the idea that SIP and RTP servers are two different servers, so there are server implementations that support only SIP or only RTP. However, FreeSWITCH and Asterisk are open source servers that support both of these protocols, as well as several others. The choice between them largely depends on personal preference, requirements and integration tasks, but both of them allow you to get an office PBX out of the box.
Last but not least is WebRTC (Web Real-Time Communication). It is the de facto standard for voice calls on the web. WebRTC uses SRTP (secure RTP) and leaves the implementation of the command layer to the JS code. For a p2p call, all you need to do is give your address and call parameters to the other side, which can be done over any protocol. If you require integration with a SIP server, the 'SIP over WebSocket' protocol is normally used; one implementation is sipjs.com.
FreeSWITCH supports both SIP over WebSocket and an alternative protocol implemented by the mod_verto module. That module was designed specifically for integration with WebRTC and avoids the unpleasantness caused by WebRTC and SIP incompatibility, which would otherwise lead to a 1-5 second delay before a call. There is more than one way to solve these problems on the JS side, but that is a topic for a separate article.
Just as with regular analog calls, VoIP requires a PBX so that users can find each other by phone number and so that calls still work when a client has no public ("white") IP address. After the clients have found each other and agreed on the codecs, a connection for the media stream has to be established. Depending on the requirements of the project and the network configuration, the data stream can go either directly from client to client or through a server.
The VoIP world usually tends to forward traffic through a server. This solves the problem of clients lacking a public IP and allows any conversation to be recorded centrally.
WebRTC offers a direct connection between clients by default, but this does not work behind restrictive NATs or firewalls when neither client has a public IP. In these cases, a STUN/TURN media proxy server is required. For commercial projects, you will have to install and pay for your own proxy server for your WebRTC-based application to work. Only STUN service (discovering a client's external IP) is commonly available for free; traffic proxying (TURN) is a paid feature.
Note: There is a technique that, at least on paper, lets two clients without public IPs connect with a little help from a server, but in our experience it has not worked in practice. If you're interested, try googling 'symmetric NAT'.
SIP Load Balancing With OpenSIPs
FreeSWITCH and Asterisk can process a large number of simultaneous calls on a single machine, roughly 500 to 1000. But what if we need more? Or what if we need cloud-based load balancing?
That's where the SIP load balancer comes to the rescue, and here's how it works. One machine is designated as the balancer; at the SIP level it selects an end node and hands all media processing over to that node. RTP media traffic does not pass through the balancer, so the balancer can withstand a significantly larger number of simultaneous calls. As soon as the call is set up, the load on the balancer effectively disappears.
The choice for a load balancer usually falls between OpenSIPs and Kamailio. Both projects have a common ancestor and a similar module structure. We will leave the comparison and choice of a specific solution for a separate article and for now focus on OpenSIPs.
Many readers already have some experience with FreeSWITCH and/or Asterisk, and there are enough how-to examples on the internet. Setting up a SIP balancer in OpenSIPs or Kamailio is a completely different matter.
FreeSWITCH and Asterisk only require a user list and fairly simple call routing rules. Moreover, there are free admin panels that allow you to configure everything in a web interface.
In contrast, OpenSIPs and Kamailio require the user to understand the SIP protocol at least at the beginner level and manually prescribe what to do with each SIP message.
The OpenSIPs configuration is a program in a high-level pseudo-language, which has to link 10+ different modules into a single system. The full list of modules is much longer, but some of them duplicate each other, and you will not need them all.
In the program, you specify how to process each incoming SIP message and what to send in response. The main workload is handled by the modules, however, and your job is to write one large switch that decides which module to call and when.
If you want to add a lot of custom logic, it may be easier at some point to move the business logic to FreeSWITCH (Asterisk) and/or a separate server, which will manage the routing of calls, queues, etc.
The simplest load balancing script is as follows:
- If the incoming SIP message is not the first, and it is already clear where to route it, you can transfer control to the dialog module.
- Otherwise, call the balancer module to search for a node with a lighter load, put its address in the target, and transfer control to the dialog module.
- Error processing:
  - The selected node can return an error code (similar to HTTP codes), and you need to specify what to do in each case: drop the call or try to choose another node.
  - The balancer module may return an error and say that there are no free nodes.
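The decision logic in the list above can be modelled in a few lines of Python. This is not OpenSIPs script and does not use any OpenSIPs module; it is a minimal sketch of the same flow, with hypothetical names (Node, Balancer, NoFreeNodes) and the assumption that the least loaded node is the one with the most free call slots.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    address: str
    max_calls: int
    active_calls: int = 0
    healthy: bool = True

    def free_slots(self) -> int:
        return self.max_calls - self.active_calls


class NoFreeNodes(Exception):
    """Raised when every node is down or full."""


@dataclass
class Balancer:
    nodes: list
    dialogs: dict = field(default_factory=dict)  # Call-ID -> Node

    def route(self, call_id: str) -> Node:
        # Not the first message of the call: the destination is known.
        if call_id in self.dialogs:
            return self.dialogs[call_id]
        # First message: pick the healthy node with the most free capacity.
        candidates = [n for n in self.nodes
                      if n.healthy and n.free_slots() > 0]
        if not candidates:
            raise NoFreeNodes("no free nodes")
        node = max(candidates, key=Node.free_slots)
        node.active_calls += 1
        self.dialogs[call_id] = node
        return node

    def on_node_error(self, call_id: str) -> Node:
        # The selected node returned an error: mark it as unhealthy
        # and try to choose another node for the same call.
        failed = self.dialogs.pop(call_id)
        failed.active_calls -= 1
        failed.healthy = False
        return self.route(call_id)


balancer = Balancer([Node("10.0.0.1", 2), Node("10.0.0.2", 5)])
print(balancer.route("call-1").address)          # initial INVITE
print(balancer.route("call-1").address)          # in-dialog message
print(balancer.on_node_error("call-1").address)  # failover to another node
```

Whether a node error should trigger a retry or drop the call depends on the error code; the sketch always retries to keep the example short.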
Note: When you write processing programs for OpenSIPs, remember that you must process both messages coming from outside into your cloud and messages going from your cloud to the outside. That is why the program is often divided into 2 or 3 huge 'if' blocks, depending on the direction of the SIP packet.
For reasons of brevity, we cut out all error handling. The full version can be found here.
Features of OpenSIPs:
- Header values do not change during program execution. If you assign something and then try to output to the log, the initial value will be displayed.
- There are 2 different types of user variables and they are incompatible. Some of the modules use the first type, others use the second.
- Although many functions do not return anything, they often do something useful as a side effect, and you will only find out what exactly by reading the documentation in its entirety and sometimes the source code.
  - For example, load_balance() sets the $du variable. record_route() saves $du in the message header, yet it is called before load_balance(). A clear violation of causality. We assume that the headers are assembled at the very end, and that record_route() more likely sets a flag or stores a reference.
- Stateless paradigm. SIP allows you to store all the data necessary for routing directly in the message headers. The client, however, is obliged to copy them when responding, which allows the server to be stateless. Nevertheless, if necessary, you can save a routing table in the database without notifying all and sundry about the internal structure of your cluster, but this will add to the load on the database and cause delays. The choice is yours.
VoIP Failover and Scaling
Fault tolerance and scaling are closely related. It is impossible to design a fault-tolerant architecture without considering the scaling scheme; without one, all you can do is implement fault tolerance in its simplest form. For starters, a bit of theory.
At the protocol level, VoIP supports call recovery even when a server crashes.
- SIP failure. By default, SIP uses UDP, which, unlike TCP, does not establish a permanent connection. The client re-registers approximately every minute (this is configurable), so a new server can respond to any packet from the client. The only limitations are that the IP address of the response packet must not change, and that the state of the user's calls (if any are in progress) must be stored somewhere so that any server can work with this data.
- RTP failure. If your RTP server, which also uses UDP, fails for some reason, you cannot easily restore the media stream, but you can send a reinvite command through the SIP protocol. Ideally, the client will only notice a few seconds of audio loss. How to detect the failure remains an open question; there is no out-of-the-box solution at the time of writing.
Let's consider different architecture options.
Implementation details vary greatly depending on the server used, only one thing remains unchanged: the need for a Virtual (floating) IP, which we can dynamically reassign to another server.
The problem is solved at the level of shell scripts, which check whether the server is working and, if not, reassign the IP to another one. The main difficulty is striking a balance between false positives and a long wait before switching.
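The trade-off between false positives and switching delay can be made concrete with a small sketch. It assumes a periodic health probe and a hypothetical rule: fail over only after a given number of consecutive failed probes. The class and function names are illustrative, and the actual probing and IP reassignment are left out.

```python
from dataclasses import dataclass


@dataclass
class FailureDetector:
    threshold: int = 3        # consecutive failed probes before failover
    failures: int = 0
    failed_over: bool = False

    def observe(self, probe_ok: bool) -> bool:
        """Record one probe result; return True when failover should fire."""
        if probe_ok:
            # A single success resets the counter, which filters out
            # short glitches (the false positives).
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.threshold and not self.failed_over:
            self.failed_over = True
            return True
        return False


def approx_detection_delay(probe_interval: float, threshold: int) -> float:
    """Rough upper bound on detection time, ignoring probe timeouts."""
    return probe_interval * threshold


detector = FailureDetector(threshold=3)
for ok in [False, False, True, False, False, False]:
    if detector.observe(ok):
        print("reassign the floating IP to the standby server")
```

A higher threshold tolerates more glitches but lengthens the outage: with a 2-second probe interval and a threshold of 3, clients wait roughly 6 seconds before the IP moves.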
But that's not all. The SIP server stores information about connected clients, and in order not to lose it, you must configure your server to store this information in a distributed database. Then, in case of a restart or a switch to the standby server, we will not lose any data. Almost all popular SIP servers support database storage out of the box.
Properly implemented scaling inevitably solves the problem of fault tolerance. If one server dies, just bring in another.
The main problem is that the SIP client ignores packets sent from another IP. Here's an example of what WON'T work:
Client B will simply ignore the incoming call (invite) because its source IP differs from that of the server on which the client is registered.
What WILL work?
Depending on the specific project, available hardware, and capabilities, there are three possible solutions to the problem.
Override Source IP
The easiest option is to install an even simpler (and faster) load balancer, which replaces the source IP of all outgoing packets with its own. All the SIP servers are hidden behind it and must be configured to use a common database.
The obvious benefit here is the ease of setup. The drawbacks:
- The load balancer is a single point of failure.
- The load balancer's bandwidth may not be sufficient.
Another option is to configure a SIP server farm with different IP addresses and DNS in round-robin mode. Clients connect to one of the servers at random.
Program each SIP server to forward an invite to the desired server (the one on which the other client is registered), so that this server sends the command to the client on its own behalf.
The upside: everything scales remarkably well.
A small downside: SIP register scales well, whereas invite and other call-related commands scale a little worse, because in all likelihood they will pass through two nodes. That being said, 'register' happens far more often than 'invite'.
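The two-node path can be sketched as a lookup in a shared registration table. The table layout and the plan_invite helper are assumptions made for illustration; a real deployment would read registrations from the SIP server's location storage.

```python
def plan_invite(receiving_node: str, callee: str, location: dict) -> list:
    """Return the chain of nodes an invite passes through.

    `location` maps a user to the node that holds the registration.
    """
    owner = location.get(callee)
    if owner is None:
        return []                     # callee is not registered anywhere
    if owner == receiving_node:
        return [receiving_node]       # lucky case: one hop
    # Typical case: forward to the node the callee is registered on,
    # so that the callee sees the invite coming from a familiar IP.
    return [receiving_node, owner]


location = {"alice": "sip-1", "bob": "sip-2"}
print(plan_invite("sip-1", "bob", location))    # passes through two nodes
print(plan_invite("sip-2", "bob", location))    # handled by a single node
```

With N servers and clients spread evenly, only about one invite in N takes the single-hop path, which is why invites scale slightly worse than registrations.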
If your application is SaaS for business, then it is very likely that clients of different companies do not need to call each other. Using a well-chosen hash function, we can scatter our customers across different SIP servers.
You don’t even need to have a common database of SIP registrations. But you do need to somehow inform the SIP client of the address of the desired SIP server and figure out what to do if it dies (probably switch to another server). If we are talking about a UC solution, then, as a rule, the SIP client is built into the business application and receives all the settings from the server — so there is no problem.
The situation is worse if we use a pure SIP client. Then it makes sense to solve the problem at the DNS and/or routing level. For example, give each company its own VoIP server address and dynamically change the DNS records.
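A minimal sketch of the hash-based placement, including the "switch to another server if it dies" step. The server names and the server_for_company function are hypothetical; SHA-256 is used only because it is stable across processes, unlike Python's built-in hash().

```python
import hashlib


def server_for_company(company_id, servers, dead=frozenset()):
    """Pick a SIP server for a company, skipping servers known to be dead."""
    if not servers:
        raise ValueError("server list is empty")
    digest = hashlib.sha256(company_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(servers)
    # Walk the list from the preferred server until a live one is found.
    for offset in range(len(servers)):
        candidate = servers[(start + offset) % len(servers)]
        if candidate not in dead:
            return candidate
    raise RuntimeError("all SIP servers are down")


servers = ["sip-1.example.com", "sip-2.example.com", "sip-3.example.com"]
primary = server_for_company("acme", servers)
fallback = server_for_company("acme", servers, dead={primary})
print(primary, fallback)
```

One caveat of plain modulo hashing: adding or removing a server moves most companies to a different node, so a cluster that changes size often may prefer a consistent hashing scheme.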
RTP Failover
In general, you need to detect the RTP server crash and send a reinvite to both clients.
This task is divided into several sub-items:
- Save the SIP dialog to the database.
- Track server death.
- Send reinvite for every conversation.
Implementation details vary greatly depending on the cluster configuration.
As an example: FreeSWITCH has a built-in 'sofia recover' command to restore calls after a server restart. But judging by the reviews, this command will not work in the case of a cluster of FreeSWITCH servers. Here you will probably need custom development.
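As a rough illustration of what such custom development has to do, the sketch below takes the saved dialogs, finds the ones that were using the dead media node, and produces a list of reinvites to send. The Dialog record and plan_reinvites function are hypothetical, and actually sending the SIP messages is out of scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dialog:
    call_id: str
    caller: str
    callee: str
    media_node: str


def plan_reinvites(dialogs, dead_node, healthy_nodes):
    """Return (call_id, party, new_media_node) for every reinvite to send."""
    if not healthy_nodes:
        raise RuntimeError("no healthy media node left")
    affected = [d for d in dialogs if d.media_node == dead_node]
    plan = []
    for i, dialog in enumerate(affected):
        # Spread the recovered calls over the surviving nodes.
        target = healthy_nodes[i % len(healthy_nodes)]
        # Both parties must be told about the new media address.
        plan.append((dialog.call_id, dialog.caller, target))
        plan.append((dialog.call_id, dialog.callee, target))
    return plan


saved = [
    Dialog("call-1", "alice", "bob", "rtp-1"),
    Dialog("call-2", "carol", "dave", "rtp-2"),
    Dialog("call-3", "erin", "frank", "rtp-1"),
]
for step in plan_reinvites(saved, "rtp-1", ["rtp-2", "rtp-3"]):
    print(step)
```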
An obvious way to scale media streams is to place them on a server separate from SIP traffic. Below we will discuss architecture options of varying complexity, from the simplest to the more complex ones.
OpenSIPs | Kamailio + RTP proxy
The SIP load balancer, represented by OpenSIPs or Kamailio, manages the RTP proxy or its equivalent. The two communicate over a separate control protocol (not SIP), through which the load balancer requests the allocation of RTP ports and then passes the resulting address to the clients. The client opens a new connection to the received IP and port, connecting to the RTP proxy.
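What "allocating ports" means on the proxy side can be sketched as follows. This is not the control protocol of any real RTP proxy; it is a hypothetical allocator that follows the common convention of giving RTP an even port and leaving the next odd port for RTCP.

```python
class RtpPortAllocator:
    """Hands out even RTP ports; the next odd port is left for RTCP."""

    def __init__(self, low: int, high: int):
        first_even = low if low % 2 == 0 else low + 1
        # range() stops before `high`, so rtp_port + 1 never exceeds `high`.
        self._free = list(range(first_even, high, 2))
        self._by_call = {}

    def allocate(self, call_id: str) -> int:
        # Repeated requests for the same call get the same port back.
        if call_id in self._by_call:
            return self._by_call[call_id]
        if not self._free:
            raise RuntimeError("RTP port range exhausted")
        port = self._free.pop(0)
        self._by_call[call_id] = port
        return port

    def release(self, call_id: str) -> None:
        port = self._by_call.pop(call_id, None)
        if port is not None:
            self._free.append(port)


allocator = RtpPortAllocator(10000, 10010)
rtp_port = allocator.allocate("call-1")
print(rtp_port, rtp_port + 1)   # RTP port and its RTCP neighbour
allocator.release("call-1")
```

The size of the port range puts a hard limit on the number of concurrent media streams per proxy, which is one more reason to scale media servers separately from SIP.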
All business logic is concentrated within the SIP load balancer. The solution is suitable if you have very little business logic, and all you need is to switch calls.
It is not even necessary to use a SIP server. One of our switching projects for 911 was implemented in Java: Call routing system
OpenSIPs | Kamailio + FreeSWITCH | Asterisk
The SIP load balancer, represented by OpenSIPs or Kamailio, forwards SIP packets through itself to the selected FreeSWITCH | Asterisk server. The closest analogy is an HTTP load balancer with sticky sessions. OpenSIPs itself only handles registration.
When initializing a new dialog (call), it selects the least loaded FreeSWITCH node according to some strategy and redirects all SIP packets associated with this dialog to it. FreeSWITCH itself, while processing client SIP packets, allocates an RTP port and sends the port number and its public IP through OpenSIPs to the client.
It makes sense to bring business logic to FreeSWITCH | Asterisk, which is much more suitable for this and has a large set of different modules out of the box.
This solution is better suited for systems with IVR, Voicemail, Call Center and other features of advanced PBX systems. (An example from my experience: Unified communications system)
OpenSIPs | Kamailio + FreeSWITCH + App server
The solution is in many respects similar to the previous one, but all the business logic is placed on the Application server. This approach eliminates the need to use a Lua script but adds a potential point of failure.
On the other hand, if you already have an Application server, and it is already a vital element of your Unified Communication solution, then why not use it? FreeSWITCH has an xml_curl module that allows you to download the entire configuration via HTTP from a third-party server. For each request, whether it is an authorization or an incoming call, FreeSWITCH first requests XML instructions over HTTP and then executes them. The rest of the scheme runs along similar lines.
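To give an idea of what the Application server returns, here is a sketch of a function that builds a dialplan answer for such a request. The XML layout follows the FreeSWITCH XML document format as I understand it; the request field names (section, Caller-Destination-Number, Caller-Context) and the routes table are assumptions to verify against your FreeSWITCH version, and the HTTP layer is omitted.

```python
import re
import xml.etree.ElementTree as ET


def not_found() -> str:
    """Answer that tells FreeSWITCH to fall back to its local config."""
    doc = ET.Element("document", {"type": "freeswitch/xml"})
    section = ET.SubElement(doc, "section", {"name": "result"})
    ET.SubElement(section, "result", {"status": "not found"})
    return ET.tostring(doc, encoding="unicode")


def dialplan_response(params: dict, routes: dict) -> str:
    """Build dialplan XML for one request.

    params: form fields posted by FreeSWITCH (names are assumptions).
    routes: hypothetical business table, dialed number -> bridge target.
    """
    number = params.get("Caller-Destination-Number", "")
    target = routes.get(number)
    if params.get("section") != "dialplan" or target is None:
        return not_found()
    doc = ET.Element("document", {"type": "freeswitch/xml"})
    section = ET.SubElement(doc, "section", {"name": "dialplan"})
    context = ET.SubElement(
        section, "context", {"name": params.get("Caller-Context", "default")})
    extension = ET.SubElement(context, "extension", {"name": "route_" + number})
    condition = ET.SubElement(extension, "condition", {
        "field": "destination_number",
        "expression": "^" + re.escape(number) + "$",
    })
    ET.SubElement(condition, "action", {"application": "bridge", "data": target})
    return ET.tostring(doc, encoding="unicode")


request = {"section": "dialplan", "Caller-Destination-Number": "1001"}
print(dialplan_response(request, {"1001": "user/1001"}))
```

Because every call triggers an HTTP round trip, the Application server's latency and availability directly affect call setup, which is the potential point of failure mentioned above.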