Enhancing Video Stream Quality Measurement: Active Test Clients vs. Network Layer Analysis
In today’s hyperconnected world, where streaming video has become an integral part of our daily lives, Internet Service Providers (ISPs) and content providers are engaged in a relentless battle to deliver top-notch video experiences to their customers. However, traditional methods of monitoring video stream quality, dating back to the 2000s, have been rendered largely useless by HTTPS encryption.
As ISPs strive to ensure seamless streaming, they’ve come to realize that merely measuring encrypted network traffic falls short of providing a true assessment of the viewer’s experience. So, how can ISPs and content providers truly gauge the quality of the videos their customers watch? The answer lies in a measurement setup that puts itself in the position of the viewer. In this post, we explain the background, discuss encrypted stream monitoring as one possible solution, and finally unravel our own technology. We explore why we believe this to be the best possible video quality assessment method and provide some real-life examples. Are you sitting comfortably?
Let’s dive in.
Understanding the Challenges
It’s an old (well, at least by Internet standards) question that involves network neutrality, “Quality of Service”-like bandwidth shaping, peering contracts, and, in the end, possibly frustrated end users. What data can ISPs use to ensure that their customers can stream high-quality video? What data will Over the Top (OTT) or CDN providers share with ISPs to make that possible, if at all?
(Author’s note: The answers seem to be: 1) very little, and 2) very little, depending on whom you ask.)
It used to be possible to passively monitor YouTube streams as they passed over the network, down to the level of individual video segments. With such data, even at a large overall data volume, peeking into individual users’ streams was easy enough. Want to understand the video resolutions being played? You could get that easily just from the URL signatures. Stalling? Just mimic the video player buffer.
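To make that last point concrete, here is a minimal sketch of how one might have replayed passively observed segment downloads through a simulated player buffer to detect stalling. The class, field names, and the startup threshold are purely illustrative assumptions, not any provider’s actual player logic.

```python
# Illustrative sketch: replay passively observed segment downloads through a
# simulated client buffer to detect stalling. All names and values are hypothetical.
from dataclasses import dataclass

@dataclass
class SegmentDownload:
    finished_at: float   # wall-clock time the segment finished downloading (s)
    duration: float      # media duration contained in the segment (s)

def find_stalls(downloads: list[SegmentDownload], startup_buffer: float = 2.0):
    """Replay segment arrivals and report periods where the buffer ran dry."""
    buffer_level = 0.0    # seconds of media currently buffered
    playhead_time = None  # wall-clock time playback last (re)started
    stalls = []

    for seg in sorted(downloads, key=lambda s: s.finished_at):
        if playhead_time is not None:
            # Drain the buffer by the wall-clock time that has passed.
            elapsed = seg.finished_at - playhead_time
            if elapsed > buffer_level:
                # Simplification: assume playback resumes when the next segment arrives.
                stalls.append((playhead_time + buffer_level, seg.finished_at))
                buffer_level = 0.0
            else:
                buffer_level -= elapsed
            playhead_time = seg.finished_at
        buffer_level += seg.duration
        if playhead_time is None and buffer_level >= startup_buffer:
            playhead_time = seg.finished_at  # playback starts once enough is buffered

    return stalls
```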
Alas, encryption is now pervasive (and that’s a good thing™!). But from the perspective of a communication service provider, this means that you are left with only basic information about your users’ streaming experience:
- The rough duration of their video sessions
- The throughput and total amount of data transferred
- The CDNs involved (through DNS requests, but that may change)
You could use this data to get a general view of streaming usage, and possibly to detect outages for your users (e.g., if traffic volume suddenly drops). But without more detailed information, ISPs face challenges in accurately assessing the quality of video streams and identifying any issues or bottlenecks that may impact the viewer’s experience. Ideally, you’d like to know what level of quality customers receive (e.g., in terms of video resolution), and whether they might face severe issues like stalling.
For ISPs, it’s also important to guarantee levels of service to their customers. This comes in the form of bandwidth capabilities and service availability. But users don’t think in terms of bandwidth (at least the non-techy ones don’t). They want to stream 4K video. They want to play cloud games and find them usable. How do you know these application KPIs when you only look at raw network metrics? Some countries go as far as mandating minimum application-level criteria for services that customers must be able to use under normal conditions. For that, you need to know these KPIs.
One solution that has become viable due to advances in Machine Learning (ML) — particularly Deep Learning (DL) — is analyzing the encrypted traffic on a lower layer of the ISO/OSI stack, and inferring/predicting the application-level KPIs from that. In the next section we’ll focus on this approach. But we’ll also cover an alternative — active test clients.
Encrypted Network Analysis
In recent years, there have been significant advancements in analyzing encrypted network traffic to gain insights into the quality of video streams by leveraging big data and machine learning techniques.
This approach involves training models (algorithms) to identify patterns and extract relevant information from the encrypted packets at the TCP layer. By analyzing features such as packet size, inter-arrival time, and payload characteristics, these models attempt to infer performance KPIs and the quality of video streams without the need for decryption.

Some examples of work in this area come from academic researchers and the R&D departments of larger ISPs. Irena Orsolic et al. have presented a framework for evaluating the quality of video streams based on encrypted traffic. Sarah Wassermann et al. have also presented a method to estimate YouTube quality from encrypted streams. Another approach by Tarun Mangla et al. also uses ML to predict quality-relevant features. Steve Göring et al. (with whom we have previously authored papers) also classified streaming quality based on the ITU-T Rec. P.1203 QoE model. These approaches have all been published online for the public to reproduce.
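To illustrate the general shape of such approaches (this is not the method of any specific paper mentioned above), a classifier can be trained on simple per-flow statistics such as packet sizes and inter-arrival times to predict a coarse quality class. The features, the synthetic data, and the labels below are assumptions for demonstration only.

```python
# Illustrative sketch, not any specific paper's method: summarize an encrypted
# flow into simple statistics and train a classifier that predicts a coarse
# quality class. The data and labels here are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def flow_features(packet_sizes, inter_arrival_times):
    """Turn one encrypted flow into a fixed-length feature vector."""
    return [
        np.mean(packet_sizes), np.std(packet_sizes), np.percentile(packet_sizes, 90),
        np.mean(inter_arrival_times), np.std(inter_arrival_times),
        np.sum(packet_sizes),   # total bytes, a rough throughput proxy
        len(packet_sizes),      # packet count
    ]

# Synthetic stand-in for labeled sessions: "HD" flows tend to have larger packets
# arriving more densely. Real labels would come from instrumented clients.
rng = np.random.default_rng(0)
X, y = [], []
for _ in range(200):
    hd = rng.random() < 0.5
    sizes = rng.normal(1200 if hd else 700, 150, size=500).clip(60, 1500)
    iats = rng.exponential(0.01 if hd else 0.03, size=500)
    X.append(flow_features(sizes, iats))
    y.append("HD" if hd else "SD")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print("Held-out accuracy:", clf.score(X_test, y_test))
```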
Where Encrypted Monitoring Falls Short
There are some drawbacks though. All these solutions have in common that they predict video quality mostly in terms of classes of resolution (high/low), stalling (no stalling, stalling), or QoE (e.g., low/high QoE). A more fine-grained output cannot be achieved, as the accuracy of the prediction is not high enough. Furthermore, the aforementioned models are not ready to be used in practice — we only have academic descriptions of the algorithms.
Also, a large dataset is always needed to train and test such models. This requires dedicated software to collect measurement results over a period of time at the right level of granularity, capacity for training the algorithms, and the hardware/infrastructure to handle the massive load of data when operationalizing the outcome.
An issue with such models is that they’re frozen once trained. But what happens if an OTT provider changes their video codec to one that’s more efficient, like YouTube did when they started rolling out AV1 a few years ago? All the previously trained models would become less accurate, if not useless due to the different relationships between bitrate, resolution, and achievable quality. The same goes for playout algorithm changes. And did we mention QUIC? Different transport means having to learn different features from a network perspective. Finally, you would have to perform this exercise for many different streaming providers, since they all work a bit differently.
There are commercial solutions out there that claim to predict QoE for various OTT applications based on encrypted traffic alone. These go as far as saying that you can “measure YouTube QoE” with this approach, but to our knowledge, none of them have been externally validated in terms of prediction accuracy. So you might get a general solution that predicts some form of quality score, but when it comes to troubleshooting the reasons for bad QoE, you might not know what causes it. Or you may be led down the wrong path due to incorrect assumptions about how the different services work on an application level.

So, while these methods allow ISPs to capture quality-relevant data at large scale, which may certainly be useful for detecting outages or traffic pattern changes, they cannot give the detailed results that one gets from actually streaming a video. Or maybe you would like to obtain the ground-truth data for creating an ML-based model for encrypted stream monitoring? This is where active measurement clients come into play.
Active Monitoring Clients
It all started with an idea in the 2010s. What if we could measure video quality from a client, just like the customer perceives it? What if we could do it in a lightweight manner that does not involve a complex recording and synchronization setup? This is how we developed the first prototypes of what has become Surfmeter. Measurements are taken directly in the context that a user would see, in an active monitoring environment. We trigger the measurements on a predefined schedule, and can decide beforehand what we want to measure.
The benefit of this approach is clear:
- We measure the real thing. We don’t make any assumptions about what is happening, or how different services operate.
- Our solution works with any service that runs in a web browser. (We have something for mobile too, but that’s for another post!)
- We can understand exact timings of the service and its performance — from website loading events to video load timelines, quality switches, or advertisements.
- You can define your measurement scenario, from display size to chosen video content, to simulated bandwidth constraints (a sketch of what such a scenario definition could look like follows after this list)
- Additional debug data can be extracted from the client applications, such as YouTube’s famous Stats for Nerds metrics, or an entire PCAP dump of the streaming session, along with a screen recording.
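To give an idea of what such a scenario definition could look like, here is a hypothetical sketch. It is not the actual Surfmeter configuration format or API; all fields, placeholder URLs, and helper functions are invented for illustration.

```python
# Hypothetical sketch of an active measurement schedule. This is NOT the actual
# Surfmeter configuration or API; all names and fields are placeholders.
import time

SCENARIOS = [
    {
        "service": "YouTube",
        "content": "https://www.youtube.com/watch?v=<video-id>",   # placeholder
        "display_size": (1920, 1080),
        "bandwidth_limit_kbps": None,   # unconstrained baseline run
        "collect": ["stats_for_nerds", "pcap", "screen_recording"],
    },
    {
        "service": "YouTube",
        "content": "https://www.youtube.com/watch?v=<video-id>",   # placeholder
        "display_size": (1920, 1080),
        "bandwidth_limit_kbps": 3000,   # simulate a constrained access link
        "collect": ["stats_for_nerds"],
    },
]

def run_browser_measurement(scenario):
    """Placeholder for the actual browser automation; returns dummy KPIs here."""
    return {"startup_delay_s": 1.3, "played_resolutions": ["1080p"], "stalling_events": 0}

def run_schedule(scenarios, rounds=1, interval_s=1800):
    """Run every scenario in each round, then wait until the next round."""
    for i in range(rounds):
        for scenario in scenarios:
            result = run_browser_measurement(scenario)
            print(scenario["service"], scenario["bandwidth_limit_kbps"], result)
        if i < rounds - 1:
            time.sleep(interval_s)

run_schedule(SCENARIOS, rounds=1)
```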
In addition to that, our approach to calculating a quality score is quite different. You have to understand that at the time (and still today), many video quality measurement solutions require analyzing the video signal itself (that is, the pixels) and comparing the received video against a reference that is stored on a server. This is a so-called “full reference” (FR) approach. FR-based measurements are not only very computationally intensive, they’re also not very practical. In many cases, you do not even have access to the original video, because you’re not the service provider. Of course, you could measure adaptive streaming by comparing the played-out video against a version downloaded from the server, but that, too, is often impossible. This rules out the use of video quality models like VMAF. Finally, since content providers employ DRM protection, recording the screen is out of the question in many cases. We therefore rely on metadata-based and bitstream-based models to infer the quality of streaming sessions right from the browser or the device on which the streams are received.
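As a highly simplified illustration of the kind of inputs a metadata-based estimate consumes, the sketch below maps per-segment resolution and bitrate plus stalling events to a rough score. It is neither the ITU-T P.1203 model nor our production code; the weighting is invented purely for demonstration.

```python
# Simplified illustration of a metadata-based quality estimate. This is NOT the
# ITU-T P.1203 model and not our production code; it only shows the kind of
# inputs such models consume (per-segment metadata plus stalling events).
def rough_session_score(segments, stalling_events):
    """segments: list of dicts with 'height', 'bitrate_kbps', 'duration_s'."""
    per_segment = []
    for seg in segments:
        # Crude mapping: score grows with resolution and bitrate, capped at 5.
        resolution_factor = min(seg["height"] / 1080, 1.0)
        bitrate_factor = min(seg["bitrate_kbps"] / 6000, 1.0)
        per_segment.append(1 + 4 * (0.6 * resolution_factor + 0.4 * bitrate_factor))

    score = sum(per_segment) / len(per_segment)
    # Penalize stalling: each event and each second of stalling reduces the score.
    for event in stalling_events:
        score -= 0.3 + 0.1 * event["duration_s"]
    return max(1.0, min(5.0, score))

# Example: a session that switches from 720p to 1080p and stalls once for 2 s.
example_segments = [
    {"height": 720, "bitrate_kbps": 2500, "duration_s": 5},
    {"height": 1080, "bitrate_kbps": 5000, "duration_s": 5},
]
print(rough_session_score(example_segments, [{"duration_s": 2}]))
```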
If you would like to see a demo of this (and you aren’t afraid of a bit of command-line work), you should watch this YouTube video. There, we explain the types of measurements that can be run and what the results look like.
What Active Monitoring Can and Can’t Do
Now, of course, it’s a bit of an apples-to-oranges comparison: with an active monitoring approach, you get much more fine-grained data, but you cannot quickly obtain large datasets (compared to passively ingesting all encrypted streams).
Operationalizing stream monitoring across a large network may not be feasible with active clients alone, particularly when you need to detect outages in specific customer segments that are not covered by an active monitoring instance. These active clients, however, give you a good proxy for what a customer might experience, especially when set up in a known context (e.g., at a router in your lab, somewhere in the backbone, or at your customers’ homes on a small device). This way, you can rule out other issues at the user end. (And let’s face it, customer service complaints happen even if everything’s fine on your end.)
Our solution not only works on hardware: you can deploy these Surfmeter-based measurement probes in any virtual machine or host that can run Docker containers. This allows you to scale the measurements easily across a larger set of devices and locations.
A Holistic Approach to Measuring OTT Performance
We believe that in order to fully understand your network and how various OTT services perform on it, you need to set up an active testing solution. Using that, you can:
- Schedule a series of tests for heavily used OTT services, mimicking what your customers might do when they use the Internet
- Collect network-level features and performance data in trace files, which enable you to debug what’s happening under the hood
- Obtain deep application-level insights that are otherwise unavailable
This type of data is valuable because it is undisturbed by other factors: you know exactly where it was collected. Using an active data collection pipeline, you can also combine the obtained network-level features with the application-level information (e.g., video load times, video resolutions, QoE scores). This enables you to conduct analyses similar to the ones shown in the literature. You can then correlate network-layer analyses of encrypted streams with your new insight into application KPIs and QoE, and train models to predict the latter.
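As a hedged sketch of that last step, assuming you have per-session network features and application-level scores from active clients, you could join them on a session identifier and train a simple regressor. The column names and toy values below are assumptions for illustration, not real measurement data.

```python
# Sketch: join network-level features with application-level KPIs from active
# clients, then train a model that predicts the latter from the former.
# Column names, the session key, and all values are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# One row per measured streaming session (toy values for demonstration).
network_df = pd.DataFrame({
    "session_id": [1, 2, 3, 4, 5, 6],
    "mean_throughput_kbps": [8200, 2100, 4500, 950, 6700, 3100],
    "mean_packet_size": [1350, 900, 1200, 700, 1300, 1000],
    "downlink_mbytes": [910, 230, 500, 110, 780, 340],
})
app_df = pd.DataFrame({
    "session_id": [1, 2, 3, 4, 5, 6],
    "qoe_score": [4.6, 2.8, 3.9, 1.9, 4.3, 3.2],   # e.g., from a metadata-based model
})

data = network_df.merge(app_df, on="session_id")
X = data.drop(columns=["session_id", "qoe_score"])
y = data["qoe_score"]

model = GradientBoostingRegressor()
# With a realistically sized dataset, cross-validation shows how well encrypted
# network features alone can predict the application-level QoE score.
print(cross_val_score(model, X, y, cv=2))
```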
Our customers are particularly interested in the precise details of the services that are streamed over their networks. Whether it’s about detecting possible bottlenecks in certain locations, making sure that the OTTs work as intended and mandated by regulatory authorities, or testing the capabilities of new access products — the active measurement approach guarantees unique insights into those factors.
Contact us if you are interested in seeing a demo of this approach, or if you have a specific use case in mind that you would like to talk to us about.