WebRTC Stream Limits Investigation

Posted by Adam Rehn on 5 February 2021

This report investigates the limitations of the WebRTC protocol stack, specifically the maximum number of concurrent media streams and data channels that can be established between WebRTC peers for bidirectional communication. The first section of the report discusses the theoretical limits dictated by the underlying WebRTC protocol stack, whilst the second section presents the methodology and results of empirical testing to determine the practical limits that exist in modern web browsers.

1. Theoretical limits
2. Empirical limit testing
References

1. Theoretical limits

1.1. WebRTC protocol stack

As illustrated in the WebRTC chapter of the excellent High Performance Browser Networking textbook ¹, the underlying network protocols used as the transports for WebRTC media streams and data channels are the Secure Real-Time Transport Protocol (SRTP) ² (which extends the Real-time Transport Protocol (RTP) ³) and the Stream Control Transmission Protocol (SCTP) ⁴, respectively. WebRTC also uses the Session Description Protocol (SDP) offer/answer mechanism for the SCTP protocol ⁵ to describe the requested media streams and data channels when negotiating connections with new peers. As a result, these protocols (and any subsequent extensions to these protocols) are what dictate the theoretical limits which exist for WebRTC media streams and data channels.

1.2. Maximum concurrent media streams

Both the WebRTC 1.0 specification ⁶ and the spec for RTP ³ state that a session may be subject to an aggregate bandwidth limit, but there is no specific mention of an upper bound on the number of concurrent media streams at the protocol level. Section 3.2.1 of the spec for using SDP with large numbers of media flows ⁷ suggests that multiple media streams within a single WebRTC connection can be transmitted over a single RTP transport so long as they are differentiated by unique RTP source identifiers. As per section 8 of the spec for RTP, each source in an RTP session is assigned an SSRC identifier represented by a random 32-bit integer ³ and collisions between these identifiers are resolved when encountered. Assuming all possible SSRC identifiers could meaningfully be used in a single session (which is unlikely in practice due to the overheads associated with repeated collision resolution), this would yield a theoretical limit of 4,294,967,296 identifiers (the number of distinct 32-bit values).
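To give a feel for why exhausting the SSRC space is unrealistic, the following Python sketch estimates how quickly collisions become likely as the number of randomly chosen 32-bit identifiers grows. It uses the standard birthday-problem approximation rather than anything taken from the RTP spec itself, and the names `SSRC_SPACE` and `collision_probability` are purely illustrative.

```python
import math

SSRC_BITS = 32
SSRC_SPACE = 2 ** SSRC_BITS  # number of distinct 32-bit values


def collision_probability(sources: int, space: int = SSRC_SPACE) -> float:
    """Approximate probability that at least two of `sources` randomly
    chosen identifiers are equal (birthday-problem approximation)."""
    if sources < 2:
        return 0.0
    if sources > space:
        return 1.0  # pigeonhole principle: a collision is unavoidable
    # 1 - exp(-n(n-1) / 2N), computed with expm1 for numerical accuracy
    return -math.expm1(-sources * (sources - 1) / (2 * space))


if __name__ == "__main__":
    for n in (100, 1_000, 100_000):
        print(f"{n} sources: {collision_probability(n):.6f}")
```

Under this approximation a collision is more likely than not at around 100,000 sources, long before the identifier space itself runs out.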

Higher in the protocol stack, the WebRTC specification provides web browsers with a mechanism to specify their own limits on concurrent media streams. As per sections 4.4.1.5 and 5.1 of the WebRTC 1.0 specification ⁶, a browser may specify a limit on the maximum number of total simultaneous encodings for a given codec. A hard limit is specified when answering an SDP request from a remote peer (described in section 4.4.1.5), and an optimistic estimate is also provided prior to the codec being known when creating an SDP offer that includes simulcast encodings (described in section 5.1). These limits are far more likely to come into play than the theoretical maximums in the underlying transport protocols, although our results in the Empirical limit testing section demonstrate that real-world resource constraints often bottleneck browser implementations before any specified limits are reached.

1.3. Maximum concurrent data channels

WebRTC data channels are built on top of the SCTP protocol, and so a brief discussion of SCTP fundamentals is helpful when considering the theoretical limits of data channels. The SCTP protocol allows the establishment of an “association” between two endpoints, which is conceptually similar to a TCP connection but more flexible since both endpoints can use multiple network addresses. ⁴ Within an SCTP association, messages are transmitted across logical communication channels known as “streams”, which are unidirectional in nature and facilitate both ordered and out-of-order delivery. Each stream is identified by a numerical identifier, which must be unique with respect to the stream’s direction but not across directions (i.e. there cannot be two outbound streams with stream identifier 0, but there can be both an inbound stream with stream identifier 0 and an outbound stream with stream identifier 0, and these two streams are completely independent of one another). ⁴ Stream identifiers are represented by an unsigned 16-bit integer, but values are restricted to the range 0 through 65534, inclusive. This is because unsigned 16-bit integers are also used to represent the requested number of inbound and outbound streams during initial negotiation of an SCTP association between two endpoints, and so there can only exist a maximum of 65535 inbound streams and 65535 outbound streams within a single association. ⁴ (Due to their zero-indexed nature, a stream identifier of 65535 would require the existence of 65536 total streams, and so that identifier value is reserved.)
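The relationship between the 16-bit stream count and the usable identifier range can be written out as a few lines of Python. This is only a sketch of the arithmetic described above; the constant and function names are hypothetical and do not come from any SCTP library.

```python
UINT16_MAX = 0xFFFF  # largest value an unsigned 16-bit field can hold

# The requested stream counts are carried in 16-bit fields, so at most
# 65535 streams can exist in each direction of an association.
MAX_STREAMS_PER_DIRECTION = UINT16_MAX

# Identifiers are zero-indexed, so the highest usable one is count - 1.
MAX_STREAM_ID = MAX_STREAMS_PER_DIRECTION - 1


def is_valid_stream_id(stream_id: int) -> bool:
    """Return True if `stream_id` falls in the usable range 0..65534."""
    return 0 <= stream_id <= MAX_STREAM_ID


if __name__ == "__main__":
    print(MAX_STREAMS_PER_DIRECTION, MAX_STREAM_ID)
    print(is_valid_stream_id(65534), is_valid_stream_id(65535))
```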

WebRTC data channels are bidirectional communication constructs, and so each data channel is implemented as a pair of SCTP streams (one inbound stream and one outbound stream) that both have the same SCTP stream identifier. ⁸ As such, the theoretical maximum number of WebRTC data channels within a given peer connection is 65535, directly reflecting the maximum number of SCTP streams in each direction within the underlying SCTP association. However, this limit is actually influenced by the manner in which stream identifiers are negotiated between the peers. Data channels support either “in-band” negotiation using the SCTP association or “out-of-band” negotiation using an arbitrary process defined by the application. ⁸ When in-band negotiation is used, the negotiation protocol dictates the set of valid stream identifier values that can be used. To prevent peers from inadvertently attempting to open data channels which use the same stream identifiers, each peer must restrict itself to only odd or even numbers based on its DTLS role. The peer acting as the DTLS server must use only odd-numbered stream identifiers and the peer acting as the DTLS client must use only even-numbered stream identifiers. ⁹ This halves the number of data channels that a single peer can request (32767 for the peer using odd numbers and 32768 for the peer using even numbers), and so the full maximum of 65535 channels can be negotiated only if both peers coordinate to explicitly request their half of the available stream identifiers. Note that this odd/even stream identifier selection behaviour is also the recommended default for out-of-band negotiation if the chosen negotiation protocol does not define an alternative. ⁸
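The odd/even split can be checked with a short Python sketch. The role strings `"client"` and `"server"` and the function name `stream_ids_for_role` are assumptions made for illustration and are not part of any WebRTC API.

```python
MAX_STREAM_ID = 65534  # highest usable SCTP stream identifier


def stream_ids_for_role(dtls_role: str) -> range:
    """Return the stream identifiers a peer may request, given its DTLS role.

    The DTLS client uses even identifiers and the DTLS server uses odd ones.
    """
    if dtls_role == "client":
        return range(0, MAX_STREAM_ID + 1, 2)
    if dtls_role == "server":
        return range(1, MAX_STREAM_ID + 1, 2)
    raise ValueError(f"unknown DTLS role: {dtls_role!r}")


if __name__ == "__main__":
    client_ids = stream_ids_for_role("client")
    server_ids = stream_ids_for_role("server")
    print(len(client_ids), len(server_ids), len(client_ids) + len(server_ids))
```

The two ranges are disjoint by construction, which is what lets both peers open channels concurrently without first agreeing on identifiers.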

Section 6.1 of the WebRTC 1.0 specification corroborates the theoretical maximum of 65535. ⁶ However, the MDN documentation for the RTCDataChannel interface ¹⁰ and this issue on the GitHub repo for the WebRTC API spec ¹¹ both point out that many real-world implementations of WebRTC actually support significantly fewer data channels in practice, precluding the use of the theoretical maximum by applications. Our results in the Empirical limit testing section confirm that this is indeed the case.

1.4. Maximum data channel message size

The base version of the underlying SCTP protocol used to implement data channels will start to suffer from congestion issues if individual messages exceed the Maximum Transmission Unit (MTU) size of the underlying network transport ¹², which will result in head-of-line blocking when sending large messages ¹³. Section 6.6 of the spec for WebRTC data channels recommends a maximum message size of 16KiB to safely avoid this problem. ⁸ In addition, large payloads which are split into multiple SCTP messages are (in the base protocol specification) indistinguishable from multiple smaller messages unless they are transmitted with contiguous sequence numbers, making it impossible to identify and reconstruct them when other messages are transmitted with interleaving sequence numbers. ¹² (Mozilla Firefox has traditionally worked around this by using a now-deprecated PPID fragmentation/reassembly mechanism ¹³, but this was a stop-gap measure which only works when using reliable ordered data channels ¹⁴ and was never adopted by any other web browsers.) There are two key efforts to address these limitations:

The SCTP Sockets API Extensions RFC ¹⁵ introduces an explicit end-of-record (EOR) flag to signal the end of a payload that has been split across multiple SCTP messages, allowing them to be reconstructed correctly without relying on contiguous sequence numbers. Mozilla Firefox implemented support for the EOR flag in Firefox 57 ¹²¹³ and Google Chrome in Chrome/Chromium 70 ¹⁶¹⁷.
The SCTP ndata specification ¹⁸ introduces support for interleaving SCTP messages across multiple streams, preventing head-of-line blocking due to congestion and allowing for payloads which are theoretically unlimited in size according to MDN. ¹² Unfortunately, support has not yet been implemented for the ndata specification in any of the major web browsers. As of February 2021, Mozilla Firefox and Google Chrome are both blocked waiting for the addition of full ndata support in the usrsctp library that provides their underlying SCTP implementation. ¹⁹²⁰²¹

Once all popular WebRTC implementations support both the EOR and ndata extensions, message size limits will no longer be an issue. Until then, most documentation still recommends a maximum message size of 16KiB in order to ensure interoperability between different implementations. ⁸¹² Since this limitation is well documented, we do not test maximum data channel message sizes in the Empirical limit testing section below.
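In the meantime, applications that need to send larger payloads typically split them at the application level. The Python sketch below shows one minimal way to do this under the 16KiB recommendation. It assumes the chunks are sent over a reliable, ordered data channel so that simple concatenation restores the payload, and the helper names are hypothetical.

```python
from typing import Iterable, List

MAX_MESSAGE_SIZE = 16 * 1024  # 16 KiB, the commonly recommended ceiling


def split_payload(payload: bytes, chunk_size: int = MAX_MESSAGE_SIZE) -> List[bytes]:
    """Split `payload` into consecutive chunks of at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]


def join_chunks(chunks: Iterable[bytes]) -> bytes:
    """Reassemble chunks that arrived in their original order."""
    return b"".join(chunks)


if __name__ == "__main__":
    data = bytes(40_000)
    parts = split_payload(data)
    print([len(p) for p in parts])
    print(join_chunks(parts) == data)
```

A real application would also need to signal where one payload ends and the next begins (for example with a length prefix), which this sketch leaves out.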

2. Empirical limit testing

2.1. Testing methodology

To test the practical upper bounds of concurrent WebRTC media streams and data channels in modern web browsers, we created a simple automated test harness which can be found here. The test harness runs a series of tests which steadily increase the number of data channels and then media streams until failure is detected. Each test performs the following steps:

Establish a WebRTC peer connection with the local browser over the network loopback interface
Negotiate the requested number of data channels and media streams
Transmit messages over the data channels and echo them back to the sender
Stream a local video file over the media streams (a 10-second clip from Big Buck Bunny ²², encoded at 1280x720 resolution with both the H.264 and VP9 video codecs)
Tear down the peer connection to ensure a clean slate for the next test

Although every test includes both data channels and media streams, the test suite is designed to isolate their effects on one another, using only a single media stream when testing for data channel limits and a single data channel when testing for media stream limits. Due to the possibility of browser crashes or freezes when testing large numbers of concurrent media streams, data channel limits are tested first and all progress messages are also transmitted to a local webserver so they can be stored on the filesystem and inspected upon test completion. The webserver also serves the test harness over TLS using self-signed certificates. This ensures that the browser triggers the same cryptographic code paths as it would when using WebRTC in a real application, since encryption is not mandatory when running over the loopback interface in Google Chrome but is required by Mozilla Firefox and is mandated for all remote origins by the WebRTC specification. ⁶
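The ramp-up strategy described above can be sketched in Python as a simple loop that keeps increasing a count until a probe fails. This is not the actual test harness: `probe` is a hypothetical callback standing in for a full browser test run (establish the connection, negotiate the requested count, exchange data, tear down), and the step size and ceiling are illustrative defaults.

```python
from typing import Callable


def find_limit(
    probe: Callable[[int], bool],
    start: int = 1,
    step: int = 1,
    ceiling: int = 65535,
) -> int:
    """Increase the count until `probe` reports failure.

    Returns the largest count for which `probe` succeeded, or 0 if even
    the first attempt failed.
    """
    last_ok = 0
    count = start
    while count <= ceiling:
        if not probe(count):
            break  # failure detected: stop ramping up
        last_ok = count
        count += step
    return last_ok


if __name__ == "__main__":
    # Fake probe that pretends the browser supports at most 128 channels.
    print(find_limit(lambda n: n <= 128))
```

Because a failed media stream test may crash or freeze the browser, a real harness would also persist `last_ok` externally after each step, as the progress logging described above does.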

We performed tests on a machine running an AMD Ryzen 9 3950X CPU with 64GiB of system memory and an NVIDIA GeForce GTX 1080 graphics card, across multiple browsers and bare metal operating systems. The latest stable version of the proprietary NVIDIA GPU driver was used under each operating system. Each test was run in serial with no other applications running to ensure the full system resources were available to the web browser. Our results are presented in the section below.

2.2. Test results

When running our tests, we used the following configurations:

The latest version of Google Chrome under both Windows and Ubuntu Linux, using default settings.
The latest version of Google Chrome under both Windows and Ubuntu Linux, using the --no-sandbox command-line argument to disable the Chromium sandbox ²³.
The latest version of Mozilla Firefox under both Windows and Ubuntu Linux, using default settings.

All tests were run using both the H.264 and VP9 versions of the source video clip, but no differences were observed between the two codecs. The results of our tests are summarised below:

Operating system	Web browser	Configuration	Maximum data channels	Maximum media streams
Windows 10 version 20H2	Google Chrome version 87.0	Default	512	56
Windows 10 version 20H2	Google Chrome version 87.0	Sandbox disabled	512	122
Windows 10 version 20H2	Mozilla Firefox version 83.0	Default	128	65
Ubuntu Linux version 20.04.1	Google Chrome version 87.0	Default	512	16
Ubuntu Linux version 20.04.1	Google Chrome version 87.0	Sandbox disabled	512	17
Ubuntu Linux version 20.04.1	Mozilla Firefox version 83.0	Default	128	48

Table 1: Collected results from our empirical testing.

Under all configurations, the browsers failed gracefully when the maximum number of data channels was exceeded, allowing the test harness to identify the limit and continue on to test the maximum number of media streams. However, failure characteristics when testing large numbers of media streams varied depending on the configuration, and even across multiple runs of the test suite with the same configuration. Mozilla Firefox always failed gracefully when the maximum number of media streams was exceeded, simply closing the WebRTC peer connection. Google Chrome would sometimes crash with a sandbox out of memory error when the Chromium sandbox was enabled, but on other runs the browser tab containing the test harness would simply become unresponsive irrespective of whether the sandbox was enabled.

The largest disparity in observed results between different configurations specifically relates to the number of concurrent media streams that the browsers could handle prior to failing or becoming unresponsive. This number was slightly lower under Linux than under Windows for Mozilla Firefox, but significantly lower for Google Chrome, irrespective of whether the sandbox was enabled. It is likely that this large difference stems from a lack of support for hardware-accelerated video decoding under Linux, resulting in higher system resource usage. It is worth noting that although Mozilla Firefox introduced support for hardware-accelerated video decoding functionality in version 80 ²⁴, this feature is still under development at the time of writing and is not enabled by default. Google Chrome does not support hardware-accelerated video decoding under Linux without the application of a community-maintained patch that the Chromium developers have repeatedly declined to merge into the upstream codebase. ²⁵
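To quantify this disparity, the following Python sketch computes the ratio of the Windows result to the Linux result for each browser configuration, using the media stream figures from Table 1. The dictionary layout and function name are just one convenient way to arrange the data.

```python
from typing import Dict, Tuple

# Maximum media streams from Table 1, keyed by (OS, browser, configuration).
MAX_MEDIA_STREAMS: Dict[Tuple[str, str, str], int] = {
    ("Windows", "Chrome", "Default"): 56,
    ("Windows", "Chrome", "Sandbox disabled"): 122,
    ("Windows", "Firefox", "Default"): 65,
    ("Linux", "Chrome", "Default"): 16,
    ("Linux", "Chrome", "Sandbox disabled"): 17,
    ("Linux", "Firefox", "Default"): 48,
}


def windows_to_linux_ratios(
    results: Dict[Tuple[str, str, str], int]
) -> Dict[Tuple[str, str], float]:
    """Return the Windows/Linux ratio for each (browser, configuration)."""
    ratios = {}
    for (os_name, browser, config), windows_value in results.items():
        if os_name != "Windows":
            continue
        linux_value = results[("Linux", browser, config)]
        ratios[(browser, config)] = windows_value / linux_value
    return ratios


if __name__ == "__main__":
    for key, ratio in windows_to_linux_ratios(MAX_MEDIA_STREAMS).items():
        print(key, round(ratio, 2))
```

Firefox handled roughly a third more streams under Windows than under Linux, whereas Chrome handled between 3.5 and about 7 times as many.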

2.3. Limitations and implications

It is worth noting that the methodology used for our empirical testing is largely synthetic in nature and does not directly reflect real-world WebRTC application configurations. In particular:

Since the web browser acts as both the local and remote peer whilst communicating over the loopback interface, we eliminate the influence of network bottlenecks and instead focus entirely on the capabilities of the browser. This stands in stark contrast to the methodologies used to test real-world WebRTC applications (which typically focus on being robust to varying network conditions), and directly reflects the specific focus of this investigation on technological limitations that exist within individual WebRTC peers.
Transmitting large numbers of concurrent media streams sourced from local video files is convenient for testing purposes but results in resource usage patterns that diverge from typical WebRTC use cases, particularly with regard to disk I/O. Additional resource usage is also incurred by the fact that the browser is acting as both the local and remote peer in each test, whilst also playing both the local and remote versions of each video. Unsurprisingly, we observed large spikes in CPU, GPU and memory use when video playback initiated during tests with large numbers of media streams.

Despite these limitations, our test harness provides interesting insights into the practical upper bounds of WebRTC in modern web browsers when presented with worst-case local workloads that thrash system resources. The results may be directly instructive for some non-traditional WebRTC use cases, and are still applicable in part to more traditional WebRTC applications as well.

In real-world use, the upper bounds that we observed for both concurrent media streams and concurrent data channels should be sufficient for most typical WebRTC applications. The maximum number of concurrent media streams, averaged across the tested configurations, should be sufficient for most video-conferencing scenarios, particularly given that these numbers represent a worst-case local workload and real-world usage patterns should have far less impact on system resources. Even at Mozilla Firefox’s limit of 128 data channels, the ability to transmit 16KiB of data at a time across these channels provides plenty of flexibility, and widespread support for both the EOR and ndata extensions to SCTP will further increase this flexibility in the future.
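For concreteness, the two figures referred to here can be derived from Table 1 with a few lines of Python. The arithmetic is a sketch that assumes one 16KiB message in flight per channel, and the variable names are illustrative.

```python
# Maximum media streams for the six configurations listed in Table 1.
MEDIA_STREAM_LIMITS = [56, 122, 65, 16, 17, 48]

FIREFOX_MAX_DATA_CHANNELS = 128
MAX_MESSAGE_SIZE_KIB = 16

# Mean of the observed media stream limits across all configurations.
mean_media_streams = sum(MEDIA_STREAM_LIMITS) / len(MEDIA_STREAM_LIMITS)

# Data that can be in flight if every channel carries one 16 KiB message.
aggregate_kib = FIREFOX_MAX_DATA_CHANNELS * MAX_MESSAGE_SIZE_KIB
aggregate_mib = aggregate_kib / 1024

if __name__ == "__main__":
    print(mean_media_streams)
    print(aggregate_kib, aggregate_mib)
```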