
What does a MOS of x mean…? Interpreting the MOS in real-world terms

Illustration: Know what your users see.

Networks can behave like humans: sometimes they have a great day and everything works like a charm – and sometimes they feel tired and are painfully slow. Usually, when humans get tired, they need a rest from the hard work they did. Networks don't have that luxury. Under stress, they greet us with loading spinners, rotating hourglasses, or just a plain "the service you are calling is not available" message in the browser. How can you, as a network operator, make the right decisions and make your networks more resilient against packet loss, jitter, inefficient routing, and transmission delays? And how do you pre-emptively address customer complaints? The answer is continuous quality monitoring – for video, web browsing, and other OTT apps like conferencing services.

But even the best data is not useful if you can't make sense of the numbers you're seeing. So in this article, we want to go deeper and explain how you can understand the monitoring data itself – in particular, the Mean Opinion Score.

In an ideal world, you'd have a single score that translates all the events happening in the network into one metric that easily explains: is the customer at the other side of the edge happy or not, and if not, why not? Luckily, at AVEQ, we can do this with our MOS rating – the Mean Opinion Score for video streaming. This rating lets us take a top-down perspective on the quality the end user experiences: we start with the user opinion and work down to the technical data. This avoids the rough and unclear estimations of bottom-up approaches that eventually lead to long nights of NOC-room meetings.

What does “MOS” mean, even?

But what does this mysterious M. O. S. actually mean? The "Mean Opinion Score" historically comes from the testing of telephony systems at the ITU-T, the standardization sector of the International Telecommunication Union. People were asked to take part in laboratory tests and rate their call quality, and their ratings were averaged into a MOS. These subjective ratings could then serve as the ground truth for MOS prediction models, which take technical parameters and estimate the user opinion. But not every MOS is the same: there are several standards related to MOS – and you should only use the right standard for the right application – and there are many companies on the market providing some kind of "MOS" too, even though these are rarely based on publicly available or validated research.
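To make the averaging concrete, here is a minimal sketch in Python (the ratings are made-up example values, not real test data): a MOS is simply the arithmetic mean of the individual opinion scores collected on the 1-to-5 scale.

```python
# A MOS is the arithmetic mean of individual opinion scores given on the
# 1 (Bad) to 5 (Excellent) scale. The ratings below are made-up example values.
from statistics import mean

ratings = [4, 5, 3, 4, 4, 5, 4, 3, 5, 4]  # one rating per test participant

mos = mean(ratings)
print(f"MOS = {mos:.2f}")  # -> MOS = 4.10
```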

The MOS we use to drive our video quality measurements in Surfmeter is based on the ITU-T Rec. P.1203 standard, extended with the AVQBits model from TU Ilmenau – a scientifically validated, peer-reviewed methodology developed by co-authors of the original P.1203 standard. The P.1203 model can accurately predict how a user would rate, e.g., a Netflix session on a scale from 1 to 5, where 1 is Bad and 5 is Excellent. While the model itself was developed through extensive subjective testing, including thousands of individual ratings, the algorithm can compute the score fully automatically for any OTT service, without the need for individual user-by-user happiness surveys.
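As a rough illustration of what such an automated computation can look like, here is a minimal sketch using the open-source reference implementation of P.1203 published by the standard's co-authors (the itu-p1203 Python package). The input fields and API shown here follow that implementation's documented mode-0 example as we understand it; treat the exact names and values as illustrative assumptions rather than a complete input description.

```python
# Sketch of an automated P.1203 (mode 0) computation using the open-source
# reference implementation ("itu-p1203"). Field names and API follow that
# package's documented example; details are illustrative assumptions.
from itu_p1203 import P1203Standalone

input_report = {
    "IGen": {"device": "pc", "displaySize": "1920x1080"},
    "I11": {  # audio segments
        "segments": [{"bitrate": 192, "codec": "aaclc", "duration": 30, "start": 0}],
        "streamId": 42,
    },
    "I13": {  # video segments
        "segments": [{"bitrate": 3000, "codec": "h264", "duration": 30,
                      "fps": 25.0, "resolution": "1920x1080", "start": 0}],
        "streamId": 42,
    },
    "I23": {"stalling": [], "streamId": 42},  # no stalling events in this example
}

results = P1203Standalone(input_report).calculate_complete()
print(results["O46"])  # final per-session MOS on the 1-5 scale
```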

In fact, with an automated MOS calculation, there is no need anymore to talk to the person in front of the computer, phone, TV, iPad, etc. just to understand whether they are happy with the service quality. (Of course, we encourage you to still talk with your customers, as humans love to feel connected – but talk with them about things that matter, not about technical issues!)

How should I interpret the MOS?

Okay, so we now understand that there is a score that can tell us how happy the average customer is with the delivered network quality for any OTT service. The actual definition of the MOS values relates to the underlying rating scale that was used in the subjective tests. The de-facto standard (from ITU-T Rec. P.910) is the following Absolute Category Rating (ACR) scale:

  • 1: Bad
  • 2: Poor
  • 3: Fair
  • 4: Good
  • 5: Excellent

At first glance, that would mean you should always strive to achieve excellent quality for your customers. In practice, a perfect score of 5 will never be reached, since humans are humans and it is hard to agree on when excellent is truly excellent. Think of online restaurant ratings: the more people rate a restaurant, the less likely it is to hold an exact score of 5.0, even if it offers the best food, atmosphere, and service in the whole region. So, realistically, the highest achievable MOS will be around 4.7. Any value above 4 can therefore already be considered very good. (ITU-T calls this GoB – "Good or Better".)

Looking at our Surfmeter demo data, we can see how the scores behave in practice: when everything runs fine, the MOS dashboard shows more or less straight lines above 4. But when widespread, noticeable issues occur, the picture changes – suddenly one line declines and falls into lower ranges, even below 3. The reasons for this vary, and we can investigate them further with the measured data in Surfmeter. For now it is enough to understand that a value between 3 and 4 indicates issues with the streaming performance that warrant your attention, while everything below 3 indicates severe performance problems.
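Based on those thresholds, you could classify incoming MOS values automatically. Here is a minimal sketch; only the numeric cut-offs come from the interpretation above, the function and labels are ours:

```python
def interpret_mos(mos: float) -> str:
    """Map a video MOS (1-5) to the rough interpretation used in this article."""
    if mos > 4:
        return "very good (GoB: Good or Better)"
    if mos >= 3:
        return "noticeable streaming issues - worth your attention"
    return "severe performance problems"

for value in (4.6, 3.4, 2.1):
    print(f"MOS {value}: {interpret_mos(value)}")
```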

How do I dig deeper into the MOS values and the KPIs?

When troubleshooting low MOS values, a closer investigation is needed. Luckily, the P.1203 model helps us diagnose the reason for MOS drops: in addition to the MOS itself, it offers further Key Quality Indicators (KQIs) for specific aspects. If the Perceptual Stalling Indication is significantly below 4, long initial loading delays or stalling are likely the cause of the low overall MOS; the service may then be optimized to offer lower-bandwidth video representations that prevent stalling, assuming a constant downlink condition. On the other hand, if the Average Audiovisual Quality is low, only low-resolution/low-bitrate representations have been played out. This can be caused by generally low bandwidths, or by very conservative player strategies that prevent stalling by loading only low-bitrate video.
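In code, such a first-pass diagnosis could look like the following sketch. The two indicator names mirror the KQIs described above, and the "significantly below 4" rule for stalling comes from the text; everything else is an illustrative assumption:

```python
def diagnose_low_mos(overall_mos: float,
                     perceptual_stalling_indication: float,
                     average_audiovisual_quality: float) -> list[str]:
    """Rough first-pass diagnosis of a low MOS using the two P.1203 KQIs
    discussed above. Thresholds other than the stalling rule are assumptions."""
    if overall_mos >= 4:
        return ["no action needed"]
    findings = []
    if perceptual_stalling_indication < 4:
        findings.append("initial loading delays or stalling are the likely cause")
    if average_audiovisual_quality < 4:
        findings.append("only low-resolution/low-bitrate representations were played")
    return findings or ["cause unclear - inspect the per-second session data"]

print(diagnose_low_mos(3.2, 3.1, 4.4))
```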

On an even more fine-grained level, AVEQ's Surfmeter allows you to reconstruct every measured streaming session in great detail. We not only provide a large number of video-streaming-related KPIs, we also let you break the session down on a per-second basis to see exactly what the player did.

So, what do I do when the MOS is low?

Once you understand how to interpret the MOS, you can think about how to use the data. We always advise looking more closely when the MOS drops below 4, even if only for a short time. Such seemingly minor differences can, when left unnoticed, lead to severe issues in the long term – problems you don't want your customers to experience. The question then is: is this quality decline a local issue in your network, is it caused by the upstream provider, or is it a player configuration error?

Having multiple Surfmeter probes in your network answers this question easily. Seeing the same issue on all probes in the network simultaneously? Then it is something widespread that all your customers will likely see, caused either by a service change at the OTT provider itself or by networking changes along the transmission path. If only some of the probes (or even just one) show issues, the problem is local, usually affecting parts of the (access) network. Multiple independent services showing quality issues across multiple probes? That is definitely a network issue! Only one service affected? Then it's clear whom to blame.
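The same reasoning can be written down as a simple decision rule over probe measurements. A minimal sketch, assuming each measurement carries a probe ID, a service name, and a flag for whether it showed an issue (all names are ours, not the Surfmeter API):

```python
from dataclasses import dataclass

@dataclass
class Measurement:
    probe_id: str
    service: str
    has_issue: bool

def triage(measurements: list[Measurement], total_probes: int) -> str:
    """Apply the probe-based reasoning from the text to a batch of measurements."""
    bad = [m for m in measurements if m.has_issue]
    if not bad:
        return "all good"
    probes = {m.probe_id for m in bad}
    services = {m.service for m in bad}
    if len(probes) == total_probes:
        return "widespread: OTT service change or transmission-path issue"
    if len(services) > 1 and len(probes) > 1:
        return "network issue in your own (access) network"
    if len(services) == 1:
        return f"issue specific to one service: {services.pop()}"
    return "local issue affecting part of the (access) network"

# Example: two independent services degraded on several probes -> network issue
data = [
    Measurement("probe-1", "netflix", True),
    Measurement("probe-2", "youtube", True),
    Measurement("probe-3", "netflix", False),
]
print(triage(data, total_probes=3))
```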

As you can see, assuring a certain level of network performance can be challenging. The Mean Opinion Score helps you quickly pinpoint when quality falls short, and with the underlying KPI data captured, Surfmeter helps you resolve these scenarios.

Wondering how this could run in your network environment? We are happy to guide you through our demo. At AVEQ, our aim is always to build the newest technology that helps network operators achieve the best customer experience. Contact us and learn more!

24. June 2025