WebRTC 102: Understanding SDP Internals
In this article, we'll explore SDP, its meaning, and how it works in WebRTC as well as offer tips and best practices for working with it.
As a WebRTC developer, you've probably heard the term "SDP" thrown around quite a bit, but what exactly is SDP and why is it important in WebRTC? In this article, we'll explore SDP, its meaning, and how it works in WebRTC, and offer tips and best practices for working with it.
Let’s dive in!
What Is SDP and Why Is It Important in WebRTC?
The communications protocol known as SDP, or Session Description Protocol, is used to negotiate the specifics of a real-time communication session between two devices or endpoints. SDP is used in WebRTC to negotiate the session's media parameters and to describe each device's media capabilities. To put it another way, SDP is the language that WebRTC devices speak to one another.
It lets devices that have different capabilities, or that sit behind firewalls or NATs, agree on how to communicate in real time, which makes it an essential part of WebRTC. Real-time communication would not be possible if WebRTC devices could not negotiate the specifics of a communication session.
How Does SDP Work in WebRTC?
SDP messages are structured as a series of key-value pairs, one per line in the form <key>=<value>, with each pair representing a specific aspect of the session. The SDP message is sent as part of the WebRTC signaling process, which is used to establish a connection between two devices. The SDP negotiation process involves two steps: an offer and an answer.
During the offer phase, one WebRTC client sends an SDP message to the other WebRTC client, describing its media capabilities and further session details. The other WebRTC client then responds with its own SDP message as an answer, describing its capabilities and session details. The two WebRTC clients then compare the SDP messages and agree on a set of acceptable media parameters for both clients.
Once the SDP negotiation process is complete, the two devices can begin to stream media between them using the agreed-upon parameters.
This process can be complex, especially when dealing with multiple devices or networks. However, it is essential for establishing a successful WebRTC communication session.
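The offer and answer steps described above can be sketched roughly as follows. This is a minimal illustration, assuming both peers are RTCPeerConnection objects (or objects exposing the same methods) that live in the same page, so the descriptions are handed over directly. The helper name negotiate is hypothetical; in a real application the offer and answer would travel through your signaling channel.

```javascript
// Hypothetical helper: runs one offer/answer round between two
// RTCPeerConnection-like objects. In a real app, the two descriptions
// would be sent through a signaling channel instead of passed directly.
async function negotiate(caller, callee) {
  // Offer phase: the caller describes its capabilities.
  const offer = await caller.createOffer();
  await caller.setLocalDescription(offer);
  await callee.setRemoteDescription(offer);

  // Answer phase: the callee replies with what it accepts.
  const answer = await callee.createAnswer();
  await callee.setLocalDescription(answer);
  await caller.setRemoteDescription(answer);

  return { offer, answer };
}
```

Once the returned promise resolves, both peers hold a local and a remote description, which is the point at which media can start flowing.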
Common SDP Attributes
SDP messages contain a variety of attributes that describe the media capabilities and other session details of a WebRTC device. Some of the most common SDP attributes include:
- Version: The version of the SDP protocol being used
- Origin: The originator of the SDP message, including the username, session ID, and network address
- Session name: A human-readable name for the session
- Media descriptions: Descriptions of the media streams being offered or answered, including the media type, codecs, and transport protocols
- Connection data: Information about the network addresses and ports being used for communication
- Timing: Information about the timing of the session, including start and end times
- Encryption: Information about any encryption mechanisms being used to secure the session
Session Description
The session description provides an overall description of the multimedia session. It includes information such as the session name, the session timing, and the connection information; for example:
v=0
o=- 0 0 IN IP4 127.0.0.1
s=-
c=IN IP4 127.0.0.1
t=0 0
Here, the keys mean the following:
- v=: Protocol version
- o=: Originator and session identifier
- s=: Session name
- c=: Connection information (optional; not required if included in all media descriptions)
- t=: Time the session is active
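To make the line structure concrete, here is a small sketch that splits an SDP string into its keys and values. The function name parseSdpLines is made up for this example, and the input is the session description shown above.

```javascript
// Splits an SDP string into { type, value } pairs, one per line.
// Every SDP line has the form <type>=<value>, where <type> is one character.
function parseSdpLines(sdp) {
  return sdp
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length >= 2 && line[1] === "=")
    .map((line) => ({ type: line[0], value: line.slice(2) }));
}

const sessionSdp = [
  "v=0",
  "o=- 0 0 IN IP4 127.0.0.1",
  "s=-",
  "c=IN IP4 127.0.0.1",
  "t=0 0",
].join("\r\n");

const sessionLines = parseSdpLines(sessionSdp);
console.log(sessionLines.map((l) => l.type).join("")); // prints: vosct
```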
Media Description
The media description provides specific information about the media that will be exchanged during the session. It describes the media type, the codecs used, and the transport protocol used; for example:
m=audio 4000 RTP/AVP 111
a=rtpmap:111 OPUS/48000/2
m=video 4000 RTP/AVP 96
a=rtpmap:96 VP8/90000
Here, the keys mean the following:
- m=: Media name and transport address
- a=: Zero or more media attribute lines (optional)
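As a rough sketch of how the m= and a=rtpmap lines relate, the following groups an SDP string into media sections and records the codec declared for each payload type. The function name parseMediaSections and the shape of the returned objects are assumptions made for this example.

```javascript
// Groups an SDP string into media sections and collects the codec
// declared for each payload type via a=rtpmap.
function parseMediaSections(sdp) {
  const sections = [];
  for (const raw of sdp.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.startsWith("m=")) {
      // m=<media> <port> <proto> <fmt> ...
      const [kind, port, protocol, ...payloadTypes] = line.slice(2).split(" ");
      sections.push({ kind, port: Number(port), protocol, payloadTypes, codecs: {} });
    } else if (line.startsWith("a=rtpmap:") && sections.length > 0) {
      // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
      const [payloadType, encoding] = line.slice("a=rtpmap:".length).split(" ");
      sections[sections.length - 1].codecs[payloadType] = encoding;
    }
  }
  return sections;
}

const mediaSections = parseMediaSections([
  "m=audio 4000 RTP/AVP 111",
  "a=rtpmap:111 OPUS/48000/2",
  "m=video 4000 RTP/AVP 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n"));

console.log(mediaSections.map((m) => m.kind)); // [ 'audio', 'video' ]
```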
Attributes
Attributes provide additional information about the multimedia session. They can include information about the media bandwidth, the network addresses and ports used, and the media encryption.
Here is a summary of some typical attributes that you will see in a WebRTC agent's session description. Many of these attributes control subsystems that this article has not covered yet.
group:BUNDLE
This line is followed by the mids of the media sections in the SDP that should share a single transport, which allows multiple media streams to be sent over one UDP/TCP connection. Using bundling is generally recommended in WebRTC.
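As a small illustration, the mids in a bundle group can be pulled out of an SDP string like this. The function name getBundleMids and the sample SDP fragment are made up for the example.

```javascript
// Returns the mids listed in a session-level "a=group:BUNDLE" line,
// or an empty array when the SDP does not use bundling.
function getBundleMids(sdp) {
  const match = sdp.match(/^a=group:BUNDLE (.+)$/m);
  return match ? match[1].trim().split(/\s+/) : [];
}

// Hypothetical fragment: two media sections bundled on one transport.
const bundledSdp = "v=0\r\na=group:BUNDLE 0 1\r\n";
console.log(getBundleMids(bundledSdp)); // [ '0', '1' ]
```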
fingerprint:sha-256
This line contains the hash of the certificate exchanged during the DTLS handshake.
a=setup
This controls the behavior of the DTLS agent. After ICE is connected, this value determines whether DTLS runs as a client or as a server. There are three possible values:
- setup:active: The DTLS agent will run as a client.
- setup:passive: The DTLS agent will run as a server.
- setup:actpass: The DTLS agent will let the other WebRTC peer decide which role to use.
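The mapping from a=setup values to DTLS roles can be expressed in a few lines. This is only a sketch; the function name dtlsRoleFromSetup and the returned labels ("client", "server", "either") are choices made for this example, not part of any API.

```javascript
// Maps an a=setup value to the DTLS role it implies for the peer that sent it.
function dtlsRoleFromSetup(sdp) {
  const match = sdp.match(/^a=setup:(\w+)/m);
  if (!match) return null;
  switch (match[1]) {
    case "active":
      return "client";
    case "passive":
      return "server";
    case "actpass":
      return "either"; // the other peer decides
    default:
      return null;
  }
}

console.log(dtlsRoleFromSetup("a=setup:actpass")); // prints: either
```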
ice-ufrag, ice-pwd, and ice-options
These are ICE-related configurations. ice-ufrag defines the username fragment and ice-pwd holds the password for ICE authentication, whereas ice-options lists the ICE options the agent supports, such as trickle or renomination.
extmap
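This declares an RTP header extension that the peer can send or receive, mapping a numeric ID to the extension's URI. The offer lists the extensions that are available, and the answer confirms the ones that will be used for the peer connection.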
msid
This only tells the other party which stream ID and track ID one is sending. The format is ${streamid} ${trackid}.
rtpmap
A particular codec is mapped to an RTP Payload Type using this value. Because payload types are not fixed, the offerer chooses the payload types for each codec for each call.
rtcp-fb
This appears in the media section of the SDP and should not be included in the session section. rtcp-fb declares which RTCP feedback messages may be used for a given payload type in that media section.
ssrc
This stands for Synchronization Source. It is a randomly chosen 32-bit value that identifies a specific source of media within an RTP session. The format is a=ssrc:<ssrc-id> cname:<cname-id>.
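To show how ssrc lines are laid out, here is a short sketch that collects them into a map from SSRC to its attributes. The function name parseSsrcLines and the sample values are invented for the example.

```javascript
// Collects "a=ssrc:<ssrc-id> <attribute>:<value>" lines into a map
// from SSRC to its attributes (for example, cname).
function parseSsrcLines(sdp) {
  const sources = new Map();
  for (const raw of sdp.split(/\r?\n/)) {
    const match = raw.trim().match(/^a=ssrc:(\d+) ([^:]+)(?::(.*))?$/);
    if (!match) continue;
    const [, ssrc, attribute, value = ""] = match;
    if (!sources.has(ssrc)) sources.set(ssrc, {});
    sources.get(ssrc)[attribute] = value;
  }
  return sources;
}

// Hypothetical input with a made-up SSRC and cname.
const ssrcMap = parseSsrcLines("a=ssrc:12345 cname:example-cname");
console.log(ssrcMap.get("12345").cname); // prints: example-cname
```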
These are the important attributes that tell us a lot about the media being negotiated and used for a session. I hope you have understood how to read SDP and its components.
Now, we will discuss practical usages of SDP that improve the WebRTC experience such as Simulcast, Perfect Negotiation, and Renegotiation.
Simulcast
Simulcast is an advanced concept in WebRTC that can considerably improve the media experience. It lets a sender transmit the same video stream at multiple resolutions and bitrates, negotiated through SDP, so that the most suitable stream can be selected for each receiver based on its available bandwidth and device capabilities.
To support simulcast, the specs introduce a few additional SDP attributes. These are a=simulcast, a=rid, and an additional header extension map attribute, a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id.
An example of an SDP offer using simulcast looks like this:
m=video 49300 RTP/AVP 97 98 99
a=rtpmap:97 H264/90000
a=rtpmap:98 H264/90000
a=rtpmap:99 VP8/90000
a=fmtp:97 profile-level-id=42c01f;max-fs=3600;max-mbps=108000
a=fmtp:98 profile-level-id=42c00b;max-fs=240;max-mbps=3600
a=fmtp:99 max-fs=240; max-fr=30
a=rid:1 send pt=97;max-width=1280;max-height=720
a=rid:2 send pt=98;max-width=320;max-height=180
a=rid:3 send pt=99;max-width=320;max-height=180
a=rid:4 recv pt=97
a=simulcast:send 1;2,3 recv 4
a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=simulcast
This attribute describes, independently for "send" and "receive" directions, the number of simulcast RTP streams as well as potential alternative formats for each simulcast RTP stream. Each simulcast RTP stream, including alternatives, is identified using the RID identifier (rid-id), defined in [RFC8851].
a=simulcast:send 1;2,3 recv 4
The "send" element of this line, if it is present in an SDP offer, denotes the offerer's capacity and proposal to send two simulcast RTP streams. Each simulcast stream has one or more RTP stream IDs (rid-ids), with a semicolon between each group of rid-ids for the stream (";"). Several rid-ids separated by commas (",") in a simulcast stream indicate different representations for that same simulcast RTP stream. As a result, the "send" portion of the above code is taken to mean that two simulcast RTP streams are intended to be sent. Rid-id 1 is used to identify and limit the first simulcast RTP stream. Two possibilities for the second simulcast RTP stream can be delivered, identified, and limited by rid-ids 2 and 3. The offerer wishes to receive a single RTP stream (no simulcast) in accordance with rid-id 4 as indicated by the "recv" portion of the line displayed above.
This SDP offer's recipient can produce an SDP answer indicating what it accepts. It indicates simulcast capabilities and specifies which simulcast RTP streams and alternatives to receive and/or send using the "a=simulcast" element. For the above offer, an example of such a responding "a=simulcast" attribute is a=simulcast:recv 1;2 send 4.
With this SDP response, the answerer expresses their desire to receive the two simulcast RTP streams in the "recv" section, having eliminated a substitute that it does not support (rid-id 3). According to rid-id 4, the "send" component assures the offerer that they will receive one stream for this media source.
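The separator rules for a=simulcast (";" between streams, "," between alternatives) can be checked with a small parser. This is a sketch only: the function name parseSimulcast is made up, and it does not handle the "~" prefix that the simulcast syntax uses to mark initially paused streams.

```javascript
// Parses an a=simulcast line into send/recv stream lists.
// Each stream is an array of alternative rid-ids.
function parseSimulcast(line) {
  const value = line.replace(/^a=simulcast:/, "").trim();
  const tokens = value.split(/\s+/);
  const result = { send: [], recv: [] };
  // Tokens come in pairs: a direction followed by its stream list.
  for (let i = 0; i + 1 < tokens.length; i += 2) {
    const direction = tokens[i];
    if (direction !== "send" && direction !== "recv") continue;
    result[direction] = tokens[i + 1]
      .split(";") // ";" separates simulcast streams
      .map((stream) => stream.split(",")); // "," separates alternatives
  }
  return result;
}

console.log(JSON.stringify(parseSimulcast("a=simulcast:send 1;2,3 recv 4")));
// {"send":[["1"],["2","3"]],"recv":[["4"]]}
```

Applied to the answer, a=simulcast:recv 1;2 send 4, the same function yields two received streams with one rid-id each and a single sent stream with rid-id 4.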
Legacy Simulcast
Legacy simulcast is simply the older way of doing simulcast, which Firefox uses. It relies on explicitly defined ssrc and ssrc-group attributes in the SDP along with rid attributes.
An example of an SDP offer generated in the Firefox browser with simulcast enabled is as follows:
a=simulcast:send r1;r0
a=ssrc:4264196019 cname:{816fd64c-ca90-417c-a2b7-72c7c36a6500}
a=ssrc:2642934809 cname:{816fd64c-ca90-417c-a2b7-72c7c36a6500}
a=ssrc:764299737 cname:{816fd64c-ca90-417c-a2b7-72c7c36a6500}
a=ssrc:3939469720 cname:{816fd64c-ca90-417c-a2b7-72c7c36a6500}
a=ssrc-group:FID 4264196019 2642934809
a=ssrc-group:FID 764299737 3939469720
You already understand what ssrc implies in SDP. So let me tell you what ssrc-group means.
The ssrc-group attribute defines a relationship among several ssrcs of an RTP session. It starts with a semantics value, followed by a list of one or more ssrc-ids. Each ssrc-id listed in an ssrc-group should also have its own ssrc line in the SDP message. The semantics values defined for ssrc-group are FID, which stands for Flow Identification, and FEC, which stands for Forward Error Correction.
What this basically means for you is that if you get multiple ssrcs and ssrc-groups in an offer, the answering peer connection must understand that the sender will send RTP packets only on the ssrcs defined there.
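To make the grouping concrete, the sketch below extracts the ssrc-group lines from the Firefox example above. The function name parseSsrcGroups is hypothetical, and only the two group lines from that offer are used as input.

```javascript
// Parses "a=ssrc-group:<semantics> <ssrc-id> ..." lines into
// { semantics, ssrcs } objects.
function parseSsrcGroups(sdp) {
  const groups = [];
  for (const raw of sdp.split(/\r?\n/)) {
    const match = raw.trim().match(/^a=ssrc-group:(\S+) (.+)$/);
    if (!match) continue;
    groups.push({ semantics: match[1], ssrcs: match[2].trim().split(/\s+/) });
  }
  return groups;
}

const ssrcGroups = parseSsrcGroups([
  "a=ssrc-group:FID 4264196019 2642934809",
  "a=ssrc-group:FID 764299737 3939469720",
].join("\r\n"));

console.log(ssrcGroups.length); // prints: 2
console.log(ssrcGroups[0].ssrcs); // [ '4264196019', '2642934809' ]
```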
Perfect Negotiation
As the MDN docs describe it, perfect negotiation is a pattern for handling the case where both sides send an SDP offer at the same time, so that the collision does not break the connection.
Each of the two peers in a perfect negotiation is given a role to play in the negotiation process that is fully independent of the state of the WebRTC connection:
- A polite peer is one that uses rollback to avoid collisions with incoming offers. In essence, a polite peer may send out offers, but when an offer arrives from the other peer, it responds with, "Well, never mind, drop my offer and I'll consider yours instead."
- An impolite peer is one that always ignores incoming offers that collide with its own. It never apologizes or gives anything up to the polite peer. Whenever a collision occurs, the impolite peer wins.
This way, if transmitted offers collide, both peers know what exactly should happen. Error-related reactions become much more predictable.
We won’t go too deep into the implementation of Perfect Negotiation in this post but will discuss the important components that help us to achieve perfect negotiation.
In the snippets below, pc refers to a peer connection, obtained with const pc = new RTCPeerConnection(config), where config is a valid RTCConfiguration object.
First, we need to add a handler on pc.onnegotiationneeded. The handler calls pc.setLocalDescription() without generating an offer first, because pc.setLocalDescription() with no argument looks at the current signaling state and generates the offer if one is required. This avoids the problem of unnecessarily generating multiple SDP offers for a peer connection. Then we can send pc.localDescription to the remote peer. The whole handler looks something like this:
let makingOffer = false;

pc.onnegotiationneeded = async () => {
  try {
    makingOffer = true;
    // With no argument, setLocalDescription() creates the description itself.
    await pc.setLocalDescription();
    signaler.send({ description: pc.localDescription });
  } catch (err) {
    console.error(err);
  } finally {
    makingOffer = false;
  }
};
Second, we need to add a handler on pc.onicecandidate. This event starts firing once you call pc.setLocalDescription(). Each event carries a single ICE candidate that the ICE agent gathered for this pc. As each candidate arrives, you need to send it to the remote peer.
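A minimal sketch of this step, assuming the standard onicecandidate event handler of RTCPeerConnection (each event carries one candidate) and the same signaler object used in the other snippets. The wrapper name forwardIceCandidates is hypothetical.

```javascript
// Forwards each locally gathered ICE candidate to the remote peer.
// `signaler` is assumed to be an object with a send() method.
function forwardIceCandidates(pc, signaler) {
  pc.onicecandidate = ({ candidate }) => {
    // A null candidate signals the end of gathering; nothing to send.
    if (candidate) {
      signaler.send({ candidate });
    }
  };
}
```

Skipping the null candidate is a design choice in this sketch; the receiving handler only acts on messages that carry a candidate, so the end-of-gathering event does not need to be forwarded.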
Third, we have to handle incoming SDP descriptions and ICE candidates from the remote peer. We need to check whether an incoming offer collides, which happens when the local peer is in the middle of making its own offer or its signaling state is not stable. If the offer collides and the local peer is the impolite one, the handler simply returns, because the impolite peer does not honor an incoming offer during a collision. Otherwise, call pc.setRemoteDescription(description). If the incoming description is an offer, you then just need to call pc.setLocalDescription() without a parameter so that it automatically generates an answer and sets it as the local description. Then you send pc.localDescription to the remote peer and voilà, your perfect negotiation is done. In code, with polite being a boolean that holds the local peer's role, you can write this as:
let ignoreOffer = false;

signaler.onmessage = async ({ data: { description, candidate } }) => {
  try {
    if (description) {
      const offerCollision =
        description.type === "offer" &&
        (makingOffer || pc.signalingState !== "stable");

      // The impolite peer ignores an offer that collides with its own.
      ignoreOffer = !polite && offerCollision;
      if (ignoreOffer) {
        return;
      }

      await pc.setRemoteDescription(description);
      if (description.type === "offer") {
        await pc.setLocalDescription();
        signaler.send({ description: pc.localDescription });
      }
    } else if (candidate) {
      try {
        await pc.addIceCandidate(candidate);
      } catch (err) {
        // Candidates that belong to an ignored offer are expected to fail.
        if (!ignoreOffer) {
          throw err;
        }
      }
    }
  } catch (err) {
    console.error(err);
  }
};
We understand the whole process sounds super complex, but once you implement this in your application, you won’t have to worry about SDP collision and can focus on other parts.
Debugging SDP Issues
When you get your hands dirty with SDP, it helps to have some tools ready so you can debug issues more efficiently. Not many tools exist for SDP, but there are a few SDP parsers that you can use to make an SDP string readable.
There are also a bunch of SDP parser libraries for a few languages like JavaScript and Go.
Best Practices for Working With SDP
Working with SDP in WebRTC can be complex, but following best practices can help you optimize your implementation and achieve better performance. Some tips for working with SDP in WebRTC include:
- Keep SDP messages as small as possible: Large SDP messages can slow down the negotiation process and reduce overall performance. Keep your SDP messages as small as possible by only including the necessary attributes.
- Use a signaling server: A signaling server can help mediate the SDP negotiation process and ensure that both devices agree on the same media parameters. A signaling server can also help ensure your WebRTC implementation is secure.
- Test your implementation across multiple devices and networks: Testing your WebRTC implementation across numerous devices and networks can help ensure that it is interoperable and can work in various environments.
- Use a library or framework: A WebRTC library or framework can help simplify the SDP negotiation process and reduce the risk of errors.
These best practices can help you build a more reliable and performant WebRTC implementation.
Conclusion
SDP is a crucial component of WebRTC, enabling real-time communication between devices that may have different capabilities or are located behind firewalls or NATs. Understanding how SDP works in WebRTC and following best practices can help you build a more reliable and performant WebRTC implementation.
By following the tips and best practices outlined in this article, you can optimize your WebRTC implementation and ensure that it works across a variety of browsers and platforms.
I hope you found this post informative and engaging. If you have any thoughts or feedback, please get in touch with me on LinkedIn. Stay tuned for more related blog posts in the future!
Published at DZone with permission of Manish ... See the original article here.
Opinions expressed by DZone contributors are their own.