One of the historic challenges of video conferencing that we have conquered at Vidyo is that IP networks were never designed to carry real-time audio and video between people. They were originally designed to transmit data between computer systems.
Even modern IP networks, especially mobile ones, introduce packet loss and jitter. For typical data-driven applications, such as email or web browsing, these impairments are rarely noticeable to users. In real-time interactive communication, however, they are very apparent: network hiccups often show up as broken pictures and broken audio.
Video Compression Basics
Video is highly compressed to conserve bandwidth and, as a result, is not resilient to additional loss of information. More specifically, all modern video compression techniques are predictive, meaning that most frames are compressed, or encoded, using information from previously encoded frames. A loss therefore affects not only the current frame but also the frames that follow it, which is why packet loss has such a noticeable impact on video quality.
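The way a single loss spreads through a predictive stream can be sketched in a few lines of Python. This is a simplified model, not a real codec: it assumes every frame is predicted from the one immediately before it, and that a self-contained keyframe arrives at a fixed interval. The function name and the `keyframe_interval` parameter are illustrative choices of ours.

```python
def damaged_frames(num_frames, keyframe_interval, lost):
    """Return indices of frames that cannot be displayed cleanly.

    Assumed model: frame i is predicted from frame i - 1, except that
    every `keyframe_interval`-th frame is a self-contained keyframe.
    `lost` is the set of frame indices that never arrived.
    """
    damaged = []
    broken = False  # True while the prediction chain is corrupted
    for i in range(num_frames):
        is_keyframe = (i % keyframe_interval == 0)
        if i in lost:
            broken = True       # this frame is missing
        elif is_keyframe:
            broken = False      # a received keyframe repairs the chain
        if broken:
            damaged.append(i)
    return damaged


# Losing only frame 2 also breaks frames 3 and 4, until the keyframe at 5.
print(damaged_frames(10, 5, {2}))  # [2, 3, 4]
```

In this model, the further apart the keyframes are, the longer a single lost packet stays visible on screen.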
Forward Error Correction and Retransmission
Mechanisms have been introduced to decrease the amount of data that is lost. Forward Error Correction (FEC) spreads redundant information across several packets, increasing the probability that the original data can be recovered. One downside of FEC is that it increases the size of the already large video stream by padding it with information that might never be required. To make matters worse, packet loss is in many cases caused by congestion, so sending additional data unnecessarily increases congestion and, in turn, increases packet loss even further. To overcome this shortcoming, lost information can instead be retransmitted only on demand. This ensures that only packets lost in transmission are transmitted more than once. However, retransmission incurs a round-trip delay to recover the lost packet. Such delays may impact the real-time behavior of the video stream in interactive video applications.
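To make the FEC trade-off concrete, here is a minimal sketch of one simple form of FEC: a single XOR parity packet protecting a group of packets. It is an illustration only, not a description of the scheme any particular product uses. The function names are ours, and the sketch assumes all packets in a group have the same length.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """Build one parity packet as the XOR of every packet in the group."""
    return reduce(xor_bytes, packets)


def recover(received, parity):
    """Rebuild a group in which at most one packet is missing (None)."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity packet can repair only one loss")
    if not missing:
        return list(received)
    present = [p for p in received if p is not None]
    repaired = list(received)
    # XOR of the parity with all surviving packets yields the missing one.
    repaired[missing[0]] = reduce(xor_bytes, present, parity)
    return repaired


group = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(group)                 # sent as a 4th packet
print(recover([b"abcd", None, b"ijkl"], parity))
```

With one parity packet per group of three, the stream grows by a third whether or not anything is lost, and a second loss in the same group still cannot be repaired. That is the overhead the paragraph above describes.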
Streaming and High Delay Applications
Some video applications, such as one-way streaming, can accommodate delay. When a viewer is watching a stream, there is no interaction with a remote person, so additional delay is not perceived by the viewer. In these cases, error resiliency can be achieved efficiently by buffering the video stream prior to playing it back. This buffer smooths out playback that would otherwise be affected by varying network conditions. Because of this playback buffer, when lost packets are encountered, a retransmission request can be sent and the lost information can be retrieved before playback. Therefore, given a one-way streaming application’s delay tolerance, retransmission can be an effective means of handling packet loss. The forgiving nature of this use case makes problematic networks much easier to handle in one-way streaming.
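The condition under which retransmission works can be written as a simple timing check. The sketch below assumes a deliberately simplified model: a repaired packet is usable only if the time to notice the loss plus one network round trip fits inside the playback buffer. The function name and all of the millisecond figures are hypothetical.

```python
def retransmission_arrives_in_time(buffer_ms, rtt_ms, detect_ms=0.0):
    """True if a re-sent packet can arrive before it is due for playback.

    Assumed model: the receiver notices the gap `detect_ms` after the
    packet should have arrived, and the request plus the re-sent packet
    together take one round trip (`rtt_ms`).
    """
    return detect_ms + rtt_ms <= buffer_ms


# A streaming player holding several seconds of video has ample slack.
print(retransmission_arrives_in_time(buffer_ms=5000, rtt_ms=150, detect_ms=20))  # True

# A very shallow buffer leaves no room for the same round trip.
print(retransmission_arrives_in_time(buffer_ms=40, rtt_ms=150, detect_ms=20))    # False
```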
Interactive and Low Delay Applications
High delay cannot be tolerated in interactive video applications. When interactive video sessions encounter high delay, the participants experience long periods of time between the end of a spoken statement and the response. This makes the conversation seem very unnatural.
To preserve interactivity, playback must begin immediately after the video stream is received and decoded. When the decoder encounters a lost packet, it requests a retransmission and is then faced with two equally disruptive options – it can either halt playback, or display severely degraded (or “broken”) images until the missing information is received. Traditional video conferencing applications exhibit these distracting behaviors in the presence of packet loss, or otherwise fall back to introducing delay.
Using Scalable Video Coding
Vidyo pioneered the use of Scalable Video Coding (SVC) in interactive video applications and holds many patents on its use. Vidyo leverages SVC’s unique properties to provide powerful error resiliency that overcomes packet loss while minimizing the impact on both interactivity and video quality. Scalable Video Coding works by encoding the video stream as a series of layers: a base layer and one or more enhancement layers. Each enhancement layer adds spatial resolution, frame rate, or both to the base layer, resulting in higher-quality video. This layering, combined with an intelligent server in the middle, makes it possible to dynamically adapt the transmitted video’s resolution and frame rate into an optimal stream for the present network conditions.
Another important difference between SVC and traditional video coding lies in the dependencies within the encoded video stream. In traditional coding, a frame is predicted from the frame directly preceding it in time, whereas in Scalable Video Coding a more sophisticated predictive structure can be used: frames can be predicted from frames at different points in time or at different resolutions. This results in a video stream that is much less susceptible to broken pictures when packet loss is encountered. These characteristics of SVC are utilized by Vidyo’s patented technology to solve the problem of packet loss.
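As a rough illustration of how a server in the middle might use the layer structure, the sketch below forwards the base layer plus as many enhancement layers as fit in a receiver's available bandwidth. It is a hypothetical simplification, not Vidyo's actual algorithm: the `Layer` type, the `select_layers` function, and every resolution and bitrate figure are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    resolution: str
    fps: int
    kbps: int  # bitrate this layer adds on top of the layers below it


def select_layers(layers, available_kbps):
    """Pick layers in order (base first) until the next one would not fit.

    An enhancement layer is useless without the layers beneath it, so
    selection stops at the first layer that exceeds the budget.
    """
    chosen, total = [], 0
    for layer in layers:
        if total + layer.kbps > available_kbps:
            break
        chosen.append(layer)
        total += layer.kbps
    return chosen


# Hypothetical layer ladder; real streams will differ.
LAYERS = [
    Layer("base", "320x180", 15, 150),
    Layer("enhancement-1", "640x360", 30, 350),
    Layer("enhancement-2", "1280x720", 30, 900),
]

print([l.name for l in select_layers(LAYERS, 600)])  # ['base', 'enhancement-1']
```

Because the choice only involves dropping or forwarding already-encoded layers, it can be revisited whenever network conditions change, without re-encoding the video.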
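The effect of the prediction structure on error propagation can be sketched by comparing two reference patterns. The example is a generic illustration, not Vidyo's patented structure. It compares a simple chain, where each frame references the previous one, with an assumed three-level temporal hierarchy in which every fourth frame forms the lowest layer. Both reference functions are hypothetical.

```python
def chain_reference(i):
    """Traditional model: each frame is predicted from the previous frame."""
    return i - 1 if i > 0 else None


def hierarchical_reference(i):
    """Assumed 3-level temporal hierarchy with a period of 4 frames."""
    if i == 0:
        return None
    if i % 4 == 0:
        return i - 4   # lowest layer: predicted from the previous lowest-layer frame
    if i % 2 == 0:
        return i - 2   # middle layer: predicted from the lowest layer
    return i - 1       # top layer: predicted from the frame just before it


def affected(num_frames, lost, reference):
    """Frames that are missing or depend, directly or not, on the lost frame."""
    bad = set()
    for i in range(num_frames):
        ref = reference(i)
        if i == lost or (ref is not None and ref in bad):
            bad.add(i)
    return sorted(bad)


print(affected(8, 1, chain_reference))         # [1, 2, 3, 4, 5, 6, 7]
print(affected(8, 1, hierarchical_reference))  # [1]
print(affected(8, 2, hierarchical_reference))  # [2, 3]
```

In this model a loss in an upper layer stays contained, while a loss in the lowest layer still propagates. Only the frames in that lowest layer therefore need the strongest protection.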
Testing Error Resiliency
So how does one compare different platforms that claim to be error resilient? The key is to perform a qualitative assessment of the video experience in various network conditions. You have to be sure that users are still able to comfortably communicate even when experiencing bad network conditions that are likely to be encountered in real-world networks.
When testing an interactive video communication system for error resiliency, it is important to pay attention to all the factors that affect the system’s usability. Freezes and broken images are relatively straightforward to observe and typically attract the most attention. Delay, which is just as disruptive, is sometimes overlooked, which can lead to inaccurate conclusions. In other words, simply playing a video loop into a call will not provide an accurate assessment of the user experience. The best way to make sure you are performing a thorough and complete test is to have users conduct an interactive discussion while poor network conditions are introduced. This will quickly highlight any delay introduced by the error correction mechanisms. When the delay reaches high levels, the user experience rapidly degrades and users will no longer want to use the video chat solution.
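For readers who want a feel for what "poor network conditions" means in such a test, the sketch below models random packet loss and jitter on a stream of packets. It is a toy model with hypothetical parameter names, useful for reasoning about test scenarios. It does not replace a live conversation over a deliberately impaired network.

```python
import random


def impair(send_times_ms, loss_rate, base_delay_ms, jitter_ms, seed=0):
    """Simulate a lossy, jittery path.

    Returns (sequence_number, arrival_time_ms) for every packet that
    survives. Each packet is dropped independently with probability
    `loss_rate`; survivors are delayed by `base_delay_ms` plus a random
    extra delay of up to `jitter_ms`. A fixed seed keeps runs repeatable.
    """
    rng = random.Random(seed)
    arrivals = []
    for seq, sent in enumerate(send_times_ms):
        if rng.random() < loss_rate:
            continue  # packet lost in transit
        delay = base_delay_ms + rng.uniform(0, jitter_ms)
        arrivals.append((seq, sent + delay))
    return arrivals


# 50 packets sent 20 ms apart over a path with 5% loss and up to 30 ms jitter.
sent = [i * 20 for i in range(50)]
received = impair(sent, loss_rate=0.05, base_delay_ms=50, jitter_ms=30)
print(f"{len(received)} of {len(sent)} packets delivered")
```

Varying the loss rate and jitter between test calls, while the participants keep talking, makes it easier to see whether a platform hides impairments by adding delay.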
Maintaining a consistently high-quality picture while keeping overall latency low is a difficult balance. But we feel the best user experience exists when people don’t notice they are using technology to communicate. It is extremely important that users can engage in conversational video and forget that the person they are speaking with isn’t sitting next to them. So, before you choose a platform for your next project, be sure to thoroughly assess video quality and interactivity in bad network scenarios.