New Research: ITU-T P.1203 Validated on Real-World Satellite Streaming Data

We’re pleased to share new research that has been published in IEEE Access. The findings validate the accuracy of QoE prediction models such as ITU-T Rec. P.1203 against real-world streaming conditions. The peer-reviewed paper, titled “Satellite Streaming Video QoE Prediction: A Real-World Subjective Database and Network-Level Prediction Models,” represents a collaboration between researchers at the University of Texas at Austin (LIVE), Viasat Inc., TU Ilmenau (Audiovisual Technology Group), and AVEQ. As the title indicates, it tackles the challenge of predicting video Quality of Experience (QoE) over satellite networks using real user data.
Why This Research Matters
Video streaming accounts for the majority of all internet traffic, and the quality of the streaming experience strongly shapes how end users perceive their ISP.
Satellite networks are increasingly important for delivering video to underserved areas, aircraft, ships, and other remote locations. But satellite connections come with unique challenges, in particular when geostationary satellites are used: while they offer high theoretical bandwidth, their latency is intrinsically high, and channel conditions may vary with factors like weather. These factors cause degradations that users notice as rebuffering events, resolution drops, and bitrate variations, all of which significantly worsen the viewing experience.
But it’s unclear how well existing QoE models perform under these specific network conditions, and whether user-perceived quality can be predicted from network-level metrics. If you are a network engineer, do your usual assumptions about the relationship between bandwidth and quality still hold when the one-way latency exceeds 250 ms?
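The >250 ms figure follows largely from geometry. The sketch below is a back-of-the-envelope calculation, assuming the shortest possible path (terminal and gateway directly below the satellite) and counting only free-space propagation, so real links will be slower. It also computes the bandwidth-delay product, i.e. how much data must be in flight to keep the link busy; the 25 Mbit/s rate is a hypothetical example, not a Viasat plan.

```python
# Back-of-the-envelope latency figures for a geostationary (GEO) satellite link.
# Assumption: terminal and gateway sit directly below the satellite and only
# free-space propagation is counted; slant range, processing and queuing add more.

SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum, km/s
GEO_ALTITUDE_KM = 35_786.0         # geostationary orbit altitude above the equator


def one_way_delay_s(altitude_km: float = GEO_ALTITUDE_KM) -> float:
    """Propagation delay terminal -> satellite -> gateway, in seconds."""
    return 2 * altitude_km / SPEED_OF_LIGHT_KM_S


def bandwidth_delay_product_bytes(rate_mbit_s: float, rtt_s: float) -> float:
    """Bytes that must be unacknowledged at once to keep a link of this rate busy."""
    return rate_mbit_s * 1e6 * rtt_s / 8


if __name__ == "__main__":
    owd = one_way_delay_s()
    rtt = 2 * owd
    print(f"one-way: {owd * 1000:.0f} ms, round trip: {rtt * 1000:.0f} ms")
    # Hypothetical 25 Mbit/s link: required in-flight data at this RTT
    bdp = bandwidth_delay_product_bytes(25, rtt)
    print(f"BDP at 25 Mbit/s: {bdp / 1e6:.2f} MB")
```

Propagation alone accounts for roughly 240 ms one way, so a transport connection needs a much larger window than on a terrestrial link to make use of the same nominal bandwidth. This is one reason why bandwidth by itself may be a weak proxy for quality here.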
For Internet Service Providers (ISPs), the ability to predict and monitor Quality of Experience (QoE) is critical for customer satisfaction. While most existing QoE research relies on artificial laboratory setups with simulated distortions and limited content, this new database contains real-world traffic captured under real service conditions.
The LIVE-Viasat Real-World Satellite QoE Database
The joint research team, led by researcher Bowen Chen, captured a diverse set of real-world YouTube videos in their laboratory, using a Viasat-supplied satellite dish. From these recordings, they created a subjective QoE database consisting of the actual videos and users’ ratings of the perceived quality.
Specifically, the LIVE-Viasat Real-World Satellite QoE Database consists of 179 videos captured from actual YouTube streaming sessions over Viasat’s operational satellite network under different configurations. These videos exhibit representative distortion patterns – real stalls, real resolution switches, and real bandwidth-induced quality variations. The laboratory study then involved 54 participants who provided both:
- continuous-time ratings (moment-by-moment quality assessments while watching) and
- endpoint scores (their overall impression after each video, i.e., a retrospective rating)
This dual approach provides novel insight into how viewers perceive quality fluctuations as they happen, and how those fluctuations affect their final judgment.
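To make the two rating types concrete, here is a minimal sketch of how one video’s ratings might be stored and reduced to Mean Opinion Scores (MOS). The `Rating` layout, the one-sample-per-second assumption, and the function names are hypothetical illustrations, not the database’s actual file format.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Rating:
    """One participant's ratings for one video (hypothetical layout, 0-100 scale)."""
    participant_id: str
    continuous: list[float]  # moment-by-moment ratings, e.g. one sample per second
    endpoint: float          # retrospective score given after the video ends


def endpoint_mos(ratings: list[Rating]) -> float:
    """Mean opinion score (MOS) of the endpoint ratings across participants."""
    return mean(r.endpoint for r in ratings)


def continuous_mos(ratings: list[Rating]) -> list[float]:
    """Per-sample mean of the continuous ratings across participants.

    zip() stops at the shortest trace, so traces are assumed to be
    time-aligned and of equal length.
    """
    return [mean(samples) for samples in zip(*(r.continuous for r in ratings))]


# Illustrative values only, not data from the study
ratings = [
    Rating("p1", [80.0, 60.0, 40.0], 50.0),
    Rating("p2", [70.0, 50.0, 30.0], 40.0),
]
print(endpoint_mos(ratings), continuous_mos(ratings))
```

Comparing the continuous MOS curve with the endpoint MOS for the same video is what reveals how in-the-moment impressions are condensed into a final judgment.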
AVEQ’s Contribution: Surfmeter and P.1203 in Action
AVEQ’s Surfmeter Automator platform played a central role in the data collection methodology. The researchers used an instance of Surfmeter to conduct automated measurements of the video streaming sessions, including:
- Actual service measurements of YouTube’s web platform that mimic what end users would be doing
- High-quality video recordings of fullscreen playback for use during subjective tests
- Precise measurement of application-level KPIs during streaming that capture what actually matters to viewers (e.g., exact stalling times)
- Capture of network packet traces using tcpdump as ground truth for modeling
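As an illustration of what "precise stalling times" means in practice, the sketch below derives stalling KPIs from a player event log. The `PlayerEvent` format and event names are hypothetical and do not reflect Surfmeter’s internal data model; the point is only how start/end events become stall count, total duration, and stall position.

```python
from dataclasses import dataclass


@dataclass
class PlayerEvent:
    """Hypothetical player event: wall-clock time, media position, and event kind."""
    wallclock_s: float
    media_pos_s: float
    kind: str  # "stall_start" or "stall_end"


def stall_intervals(events: list[PlayerEvent]) -> list[tuple[float, float]]:
    """Pair stall_start/stall_end events into (media position, duration) tuples.

    A stall that never ends (e.g. the session was aborted) is ignored here.
    """
    stalls: list[tuple[float, float]] = []
    open_stall = None
    for ev in sorted(events, key=lambda e: e.wallclock_s):
        if ev.kind == "stall_start" and open_stall is None:
            open_stall = ev
        elif ev.kind == "stall_end" and open_stall is not None:
            stalls.append((open_stall.media_pos_s, ev.wallclock_s - open_stall.wallclock_s))
            open_stall = None
    return stalls


def stalling_kpis(events: list[PlayerEvent], video_duration_s: float) -> dict:
    """Summarize stalls: count, total duration, and position in % of the video."""
    stalls = stall_intervals(events)
    return {
        "stall_count": len(stalls),
        "total_stall_s": sum(d for _, d in stalls),
        "stall_positions_pct": [100 * pos / video_duration_s for pos, _ in stalls],
    }


# Hypothetical 60 s video with one early and one late stall
events = [
    PlayerEvent(10.0, 8.0, "stall_start"), PlayerEvent(12.5, 8.0, "stall_end"),
    PlayerEvent(50.0, 45.0, "stall_start"), PlayerEvent(51.0, 45.0, "stall_end"),
]
print(stalling_kpis(events, 60.0))
```

Durations are measured on the wall clock while positions use media time, because the media position does not advance during a stall.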
Most importantly, the researchers performed the QoE analysis using the P.1203 model implemented within Surfmeter. This model predicts the final, retrospective rating of an entire streaming session based on the application-level KPIs collected during playback.
This demonstrates exactly the kind of automated, large-scale QoE measurement that Surfmeter was designed for. The platform’s ability to simultaneously record video, capture network data, and compute standardized QoE metrics made it possible to build a database of this scope easily.
P.1203 Performance: Strong Results on Real-World Data
One of the most significant outcomes of this research is the validation of the ITU-T Rec. P.1203 model under satellite streaming conditions. The model was originally developed for regular terrestrial networks such as cable or fiber. Still, under these unseen conditions and with new subjective data, P.1203 achieved a Pearson linear correlation coefficient (PLCC) of 0.94 against the subjective ground-truth scores – demonstrating that standardized QoE models can accurately predict real user experiences, even in different contexts.
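For readers less familiar with PLCC, the sketch below shows the standard Pearson computation between model predictions and subjective scores. The numbers are made up for illustration and are not the paper’s data. Because PLCC is unchanged by a positive linear rescaling, comparing P.1203’s 1–5 output with ratings on a 0–100 scale needs no conversion. Evaluation practice (e.g., ITU-T P.1401) often fits a mapping function before computing PLCC; the paper describes the exact procedure used.

```python
from math import sqrt


def plcc(x: list[float], y: list[float]) -> float:
    """Pearson linear correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Illustrative values only, not results from the study
predicted_mos = [4.5, 3.8, 2.1, 1.6, 3.0]     # model output on a 1-5 scale
subjective = [88.0, 70.0, 35.0, 18.0, 55.0]  # ratings on a 0-100 scale
print(round(plcc(predicted_mos, subjective), 3))
```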
The strong correlation confirms what we’ve seen in our own deployments: metadata-based QoE models like P.1203 provide reliable quality predictions without requiring access to video pixels – making them practical for real-world network monitoring where content is often encrypted. (If you want to find out more about the different types of video quality models and measurements, check out our overview article.)
Creating Ground Truth: The Value of Standardized Models
Our research also demonstrates an important methodology for creating QoE ground-truth datasets. By using Surfmeter to capture both the objective measurements (network packet captures, application-level KPIs, video metadata, and QoE predictions) and screen recordings suitable for subjective testing, researchers were able to build a comprehensive database with little manual effort.
This approach has broader implications:
- For ISPs and operators, it shows that P.1203 predictions correlate well with what users actually experience on satellite networks. You can trust these scores when monitoring your network performance, even on challenging links.
- For researchers, the publicly available database provides ground truth data for developing and validating new QoE models. The combination of subjective scores, objective metrics, and network-level data enables research that wasn’t previously possible.
- For QoE practitioners, it validates that standardized models remain accurate even as streaming technology evolves – provided the underlying video quality components are kept up to date for modern codecs.
In the context of the joint paper, the researchers also explored predicting the user QoE directly from network-level metrics using machine learning techniques. We encourage you to read the full paper for details on these additional modeling approaches.
Key Findings: What Affects Satellite Streaming QoE?
Beyond model validation, the research provides valuable insights into how different impairments affect user perception:
- Stalling events have a cumulative impact. While viewers are relatively forgiving of one or two brief interruptions, QoE scores degrade significantly (below 20 on a scale from 0 to 100) when the number of stalls exceeds two. Sports content shows steeper QoE declines with stalls than other categories. As the researchers noted, “this is likely because sports content contains fast-moving action which may be disrupted or lost by stalls, causing a larger perceived reduction of QoE.”
- Position matters as much as duration. Stalls occurring near the end of a video have a significantly stronger negative impact on overall QoE ratings than those at the beginning. In particular, the authors write that “A regression analysis further quantifies this effect, showing that MOS declined by 0.58 for every 1% shift toward the end of the video.” This “recency effect”, well known from psychology, means that viewers weigh recent experiences more heavily in their final judgment – something P.1203’s temporal integration module is already designed to capture.
- Resolution changes follow predictable patterns. Sudden drops in resolution cause corresponding drops in continuous QoE scores, with sharper declines having more pronounced effects. Videos that undergo gradual, stepwise resolution reductions receive better endpoint scores than those with abrupt drops – even when the final resolution is the same.
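The regression result quoted in the second finding is a slope: the change in MOS per percentage point that a stall moves toward the end. The sketch below shows a single-variable ordinary least-squares fit of that kind on hypothetical data; the paper’s actual regression setup (variables, covariates, data) may differ, so treat this only as an illustration of how such a slope is obtained.

```python
def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Fit y ≈ intercept + slope * x by ordinary least squares; return (slope, intercept)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical sessions: stall position (% of video duration) vs. MOS (0-100)
stall_position_pct = [5.0, 25.0, 50.0, 75.0, 95.0]
mos = [62.0, 55.0, 41.0, 30.0, 18.0]
slope, intercept = ols_fit(stall_position_pct, mos)
print(f"MOS change per 1% shift toward the end: {slope:.2f}")
```

A negative slope means later stalls hurt more; its magnitude expresses how many MOS points are lost per percentage point of video duration.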
These findings align with the assumptions built into P.1203’s design and confirm that the model’s approach to temporal pooling reflects actual human perception.
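To illustrate why temporal pooling matters, here is a deliberately simple recency-weighted average of per-second scores. This is not the P.1203.3 integration formula; it is a hypothetical model in which each second’s weight decays exponentially with its distance from the end of the session, with an assumed time constant `tau_s`. The same short quality dip then lowers the pooled score more when it happens late.

```python
from math import exp


def recency_pooled_score(per_second_scores: list[float], tau_s: float = 20.0) -> float:
    """Weighted mean of per-second scores with weights growing toward the end.

    Illustrative only, NOT the P.1203.3 formula. tau_s (hypothetical) sets how
    quickly the weight of older seconds decays.
    """
    n = len(per_second_scores)
    weights = [exp(-(n - 1 - t) / tau_s) for t in range(n)]
    return sum(w * s for w, s in zip(weights, per_second_scores)) / sum(weights)


# Hypothetical 60 s session scored 80 per second, with a 5 s dip to 20
early = [20.0 if 5 <= t < 10 else 80.0 for t in range(60)]
late = [20.0 if 50 <= t < 55 else 80.0 for t in range(60)]
print(recency_pooled_score(early), recency_pooled_score(late))
```

A plain average would rate both sessions identically; only a pooling scheme with a recency component reproduces the position effect the study observed.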
Implications for Network Monitoring
For operators deploying QoE monitoring solutions, this research reinforces several practical points:
- Metadata-based models work. You don’t need pixel-level analysis to get accurate QoE predictions. Especially in a monitoring or benchmarking context, you do not need to measure how well YouTube encodes its videos – you need to understand how well your network delivers that content. This is what our models excel at.
- Codec support matters. As streaming services adopt newer codecs like AV1, your QoE tools need updated models. The extensions to the standardized model that AVEQ used in this study show how the P.1203 framework can accommodate new codecs while maintaining prediction accuracy.
- Real-world validation is essential. Models trained on synthetic distortions may not perform as expected on actual network impairments. This database provides a benchmark for satellite-specific conditions, and the approach is reusable for future configurations.
Access the Database
The LIVE-Viasat Real-World Satellite QoE Database is publicly available at the LIVE Lab website; you just need to fill out a form to get access. The paper itself is published in IEEE Access (DOI: 10.1109/ACCESS.2025.3631409) under a Creative Commons license.
Learn More About Video Quality Monitoring
At AVEQ, we help ISPs, mobile operators, and streaming providers measure and optimize video Quality of Experience using standardized approaches like ITU-T P.1203. Our Surfmeter platform – the same tool used in this research – provides automated measurement capabilities for fixed networks, mobile connections, and satellite links. Contact us to learn how we can help you understand and improve your customers’ streaming experience.

