Why We Use JavaScript to Handle Call Processing Rather Than XML

For developers with complex scenarios, JavaScript-based call handling provides the more powerful and flexible platform they need to build the solutions they envision.

The rise of cloud communication platforms that use WebRTC for voice and video calling (including our own) has produced a variety of methods that developers can use to handle call processing. It is a testament to the power of WebRTC that it has solved the complex problems of client-side media processing while also providing the open source technology and standard on which so many players in this space base their products.

While each of these products offers a different experience for developers, most take a very similar approach: they handle call processing with one form or another of XML. If you're using Twilio, you write XML in that platform's own TwiML; if you're using Plivo, you write Plivo XML; and so on. Alternatively, you write code that generates the specific flavor of XML you need. Then, whenever a call needs to be processed, the platform's backend communicates with your backend to get its instructions.
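To make the "code that creates XML" option concrete, here is a minimal sketch of a helper a web service might use to answer the platform's request. The `Response`, `Say`, and `Dial` elements are TwiML verbs; the helper names, greeting, and phone number are made up for illustration, and a real service would also need an HTTP endpoint around this.

```javascript
// Escape the characters that are special in XML text content.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: build the XML document returned to the platform
// when it asks how an incoming call should be handled.
function buildForwardResponse(greeting, forwardTo) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Say>" + escapeXml(greeting) + "</Say>",
    "  <Dial>" + escapeXml(forwardTo) + "</Dial>",
    "</Response>",
  ].join("\n");
}

console.log(buildForwardResponse("Connecting you to sales & support", "+15550100"));
```

Note that the call logic lives in a static document: any decision that depends on what happens during the call requires another round trip to the web service.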

These XML-based methods do work, and many popular solutions require them. In my opinion, however, they carry significant downsides, especially from a developer's perspective. They generate a large number of unnecessary interactions between the developer's web service and the platform's backend. If something goes wrong, it is difficult to finish processing the call or even to shut it down gracefully. Creating the required XML is itself inelegant (although, to be fair, wrappers exist to make this easier). Debugging is a hassle, because developers have to dig through request logs to uncover issues. The approach is also inflexible, which makes it hard to adapt solutions to complex scenarios, and it is noticeably resource-intensive at the web service level. Taken together, these constraints mean that products built on XML-based call processing take longer to reach the market.

While creating Voximplant, we intentionally took a different approach to call processing: a cloud application engine that gives developers a greater level of flexibility. Call control scenarios for our engine, named VoxEngine, are written in JavaScript, which we chose because it is popular and already familiar to developers.

Development for VoxEngine is reminiscent of Node.js, with one distinction: a session begins when a call reaches the platform. Within that session, the developer controls what happens to the call, such as forwarding it, enabling call recording, or requesting data from an external web service via HTTP. Otherwise, VoxEngine scenarios look like standard JavaScript apps. They have event handlers (with events fired asynchronously), ECMAScript 5 functions, and VoxEngine-specific classes and functions that help developers control calls and use the platform's features, such as recording, conferencing, and many others.
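The shape of such a scenario can be sketched in plain JavaScript. This is not the VoxEngine API: `Emitter`, `runScenario`, the `incomingCall` event, and the routing data are all hypothetical stand-ins, and the web service lookup is stubbed with a callback so the sketch runs anywhere.

```javascript
// Tiny event emitter so the sketch is self-contained; a real platform
// supplies its own session and call objects.
class Emitter {
  constructor() {
    this.handlers = {};
  }
  on(event, handler) {
    (this.handlers[event] = this.handlers[event] || []).push(handler);
  }
  emit(event, payload) {
    (this.handlers[event] || []).forEach((handler) => handler(payload));
  }
}

// Hypothetical scenario: registers handlers once, then reacts to session events.
function runScenario(session, lookupRoute, actions) {
  session.on("incomingCall", (call) => {
    // Ask an external web service (stubbed below) how to route this caller.
    lookupRoute(call.from, (route) => {
      if (!route) {
        actions.push("reject " + call.from);
        return;
      }
      if (route.record) actions.push("record " + call.from);
      actions.push("forward " + call.from + " to " + route.forwardTo);
    });
  });
}

// Usage with a stubbed routing service standing in for an HTTP request.
const routes = { "+15550100": { forwardTo: "+15550199", record: true } };
const lookupRoute = (from, done) => done(routes[from] || null);

const session = new Emitter();
const actions = [];
runScenario(session, lookupRoute, actions);
session.emit("incomingCall", { from: "+15550100" });
session.emit("incomingCall", { from: "+15550123" });
console.log(actions);
```

The point of the sketch is that the routing decision, the recording choice, and the fallback all sit in ordinary code inside the session, rather than being spread across XML documents fetched one request at a time.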

This JavaScript-based method of handling call processing gives developers the flexibility to introduce any business logic they need into their call control scenarios. Each VoxEngine session does, of course, have memory and resource limits (just like any solution), but the ease of implementing new call capabilities can significantly reduce the time and effort required during development.

The fact that call processing occurs in real-time on the media server and is controlled from the app engine creates some interesting possibilities.

For example, it's possible to debug VoxEngine scenarios in real time with a debugger that works much like a standard JavaScript debugger and runs inside any modern web browser.

VoxEngine is also structured so that each call object is a connection between the platform and a phone, a SIP endpoint, or an SDK client, and each of those connections can be controlled separately and independently. This makes it straightforward to connect any configuration of incoming and outgoing calls that a given situation requires, with each separately controlled connection joining the call session as appropriate. Any number of outbound calls is allowed, and a call object's inbound and outbound audio streams can also be controlled separately.
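One way to picture independently controlled audio streams is as a directed graph between call legs. The sketch below is a simplified model under that assumption, not platform code: the `Leg` class, its methods, and `pushToTalk` are hypothetical names chosen for illustration.

```javascript
// Hypothetical model of a call leg whose outgoing audio is routed per listener.
class Leg {
  constructor(name) {
    this.name = name;
    this.sendsTo = new Set(); // legs that currently receive this leg's audio
  }
  sendMediaTo(other) {
    this.sendsTo.add(other);
  }
  stopMediaTo(other) {
    this.sendsTo.delete(other);
  }
  // Names of the legs whose audio this leg currently hears.
  hears(allLegs) {
    return allLegs.filter((leg) => leg.sendsTo.has(this)).map((leg) => leg.name);
  }
}

// One-way, push-to-talk style routing: only the speaker's audio is delivered.
function pushToTalk(speaker, legs) {
  for (const leg of legs) {
    for (const other of legs) leg.stopMediaTo(other);
  }
  for (const listener of legs) {
    if (listener !== speaker) speaker.sendMediaTo(listener);
  }
}

const inbound = new Leg("caller");
const agentA = new Leg("agent-a");
const agentB = new Leg("agent-b");
const legs = [inbound, agentA, agentB];

pushToTalk(inbound, legs);
console.log(agentA.hears(legs)); // ["caller"]
console.log(inbound.hears(legs)); // []
```

Because each direction is a separate edge, switching the speaker is a matter of rewiring the graph while every leg stays connected to the session.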

This level of flexibility caters to the needs developers have when implementing web and app-based voice or video call solutions, especially those dealing with more interesting scenarios that just aren't possible on standard communications platforms with XML-based call processing. For developers with complex scenarios (say, chained audio conferences for walkie-talkie type functionality), JavaScript-based handling provides the more powerful and flexible platform they need to build the solutions they envision.
