The rise of cloud communication platforms that use WebRTC to provide voice and video calling (including our own) has given developers a variety of ways to handle call processing. It’s a testament to the power of WebRTC that it has solved the complex problems of client-side media processing while also providing the open-source technology and standard on which so many players in this space base their products.
While each of these products offers a different experience for developers, most are quite similar in their approach: they handle call processing with one form or another of XML. For example, with Twilio you might write XML in that platform’s own TwiML, with Plivo you write Plivo XML, and so on, or you write code that generates the specific flavor of XML you need. Then, whenever a call needs to be processed, your chosen solution’s backend communicates with your backend to get those instructions.
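To make the XML approach concrete, here is a minimal sketch of the kind of code a web service might run to hand call instructions back to a platform. It assumes TwiML’s `<Response>`, `<Say>`, and `<Dial>` elements; the helper names `escapeXml` and `buildForwardingResponse` and the phone number are hypothetical, and the HTTP layer that would actually return the document is left out.

```typescript
// Escape the five XML special characters so caller-supplied text cannot break the markup.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a TwiML-style document that speaks a greeting and then forwards the call.
// (Hypothetical helper; a real service would return this string as the HTTP response body.)
function buildForwardingResponse(greeting: string, forwardTo: string): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Say>" + escapeXml(greeting) + "</Say>",
    "  <Dial>" + escapeXml(forwardTo) + "</Dial>",
    "</Response>",
  ].join("\n");
}

// Example: the number below is a placeholder, not a real destination.
console.log(buildForwardingResponse("Thanks for calling. Connecting you now.", "+15550100"));
```

Even in this small case the service has to decide everything up front and serialize it as markup. Anything that should happen after these instructions run typically means another request to the web service for a fresh document.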
These XML-based methods do work, and many popular solutions require them. However, in my opinion they come with significant downsides, especially from a developer’s perspective. They generate a great many unnecessary interactions between the developer’s web service and the platform’s backend. If an issue does occur, it’s hard to go on processing the call, or even to shut it down gracefully. Creating the required XML is itself inelegant (although it’s fair to say that wrappers exist to make this easier). Debugging is a hassle, since developers have to dig through request logs to uncover issues. The methods are also pretty darn inflexible, which makes it hard to adapt a solution to complex scenarios, and XML-based call processing is noticeably resource-intensive at the web service level. Taken together, these constraints mean that products relying on XML for call processing take longer to get to market.
The fact that call processing occurs in real time on the media server and is controlled from the app engine creates some interesting possibilities.
VoxEngine is also structured so that each call object is a connection between the platform and a phone, a SIP endpoint, or an SDK client, and each of those connections can be controlled separately and independently. This makes it straightforward to connect any configuration of incoming and outgoing calls a situation requires, with each separately controlled connection joining the call session as appropriate. Any number of outbound calls is allowed, and a call object’s inbound and outbound audio streams can also be controlled separately.
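The connection model described above can be pictured with a small toy model. This is not the VoxEngine API: `CallLeg`, `routeAudioTo`, `stopAudioTo`, and `bridge` are hypothetical names invented for illustration, and no real media is involved. The sketch only shows what it means for each connection, and each direction of audio, to be controlled on its own.

```typescript
type Endpoint = "phone" | "sip" | "sdk";

// One leg: a single connection between the platform and an endpoint (toy model only).
class CallLeg {
  // Legs that currently receive this leg's audio.
  private listeners: CallLeg[] = [];

  constructor(public readonly id: string, public readonly endpoint: Endpoint) {}

  // Start sending this leg's audio to another leg (one direction only).
  routeAudioTo(target: CallLeg): void {
    if (this.listeners.indexOf(target) === -1) {
      this.listeners.push(target);
    }
  }

  // Stop sending this leg's audio to another leg; the reverse path is untouched.
  stopAudioTo(target: CallLeg): void {
    this.listeners = this.listeners.filter((leg) => leg !== target);
  }

  // IDs of the legs that hear this leg, sorted for stable output.
  hearers(): string[] {
    return this.listeners.map((leg) => leg.id).sort();
  }
}

// Connect two legs in both directions.
function bridge(a: CallLeg, b: CallLeg): void {
  a.routeAudioTo(b);
  b.routeAudioTo(a);
}

const caller = new CallLeg("caller", "phone");
const agent = new CallLeg("agent", "sdk");
const supervisor = new CallLeg("supervisor", "sip");

bridge(caller, agent);
// The supervisor listens to both parties but is heard by neither.
caller.routeAudioTo(supervisor);
agent.routeAudioTo(supervisor);
// Silence the agent toward the caller only; the supervisor still hears the agent.
agent.stopAudioTo(caller);

console.log("caller is heard by: " + caller.hearers().join(", "));
console.log("agent is heard by: " + agent.hearers().join(", "));
console.log("supervisor is heard by: " + supervisor.hearers().join(", "));
```

In this arrangement the caller is heard by the agent and the supervisor, the agent only by the supervisor, and the supervisor by no one, which is the kind of asymmetric setup that becomes possible when every leg and every audio direction is a separate control point.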