
What AT&T’s AMVOTS Tells Us About the State of Video QoE Measurement

Blog

AT&T recently published a paper on AMVOTS, their Automated Mobile Video Objective Testing System, built in collaboration with Ericsson. It is a lab-based platform for measuring video quality on mobile devices under realistic network conditions. For anyone working in video QoE — ourselves included — it is worth a closer look. We talked with Ericsson researcher David Lindero, who explained to us the background of their joint work. Read on to learn more!

What AMVOTS Does

Internet service providers like AT&T have a strong interest in measuring and improving video QoE on their networks. To that end, they are researching ways to capture and analyze video quality on mobile devices under realistic network conditions, with the goal of optimizing overall QoE on the mobile network. AT&T’s AMVOTS system captures the HDMI output of a mobile phone and compares it, frame by frame, against a reference video using VMAF (Netflix’s well-known open-source video quality metric). The researchers use this information to drive decisions about network resource allocation.

Comparing a reference clip with a mobile phone screen recording raises some obvious issues, which AMVOTS addresses with frame alignment and visual corrections (cropping out UI overlays, masking logos, color correction). Without cropping out UI elements, you would get a lower picture quality score than what the user actually perceives, which would be misleading. With these corrections in place, the scores reflect the actual picture quality rather than measurement artifacts. AMVOTS also implements the ITU-T P.1203.3 standard to assess the impact of buffering and stalling, using VMAF as the “video quality input” to the model.
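The exact AMVOTS pipeline is not public as code, but the effect of masking can be sketched with a toy example. Here we use plain mean squared error in place of VMAF (purely for brevity); the frames, overlay region, and mask are all hypothetical:

```python
import numpy as np

def masked_mse(reference, captured, mask):
    """Mean squared error between two frames, ignoring masked-out pixels.

    mask: boolean array, True where pixels count toward the score
    (False over UI overlays, logos, etc.).
    """
    diff = (reference.astype(np.float64) - captured.astype(np.float64)) ** 2
    return diff[mask].mean()

# Two identical 8x8 grayscale frames, except a "player UI" drawn in one corner
ref = np.full((8, 8), 128, dtype=np.uint8)
cap = ref.copy()
cap[:2, :2] = 255                 # simulated UI overlay on the captured frame

mask = np.ones((8, 8), dtype=bool)
naive = masked_mse(ref, cap, mask)  # overlay counted: large error, misleading
mask[:2, :2] = False
fair = masked_mse(ref, cap, mask)   # overlay excluded: frames match perfectly
```

The same principle applies with VMAF: without the mask, the overlay would drag the score down even though the video underneath is pristine.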

The system runs on a Dell server with a dedicated capture card, processing 1080p at 60fps in real time. Every 10 seconds, it produces a combined QoE score that covers both spatial quality (VMAF) and temporal factors like stalls and frame freezes.

As David says, “AMVOTS was created to be able to assess the end-to-end service quality for ‘any’ video based service that runs on a cellular network, or any network for that matter. Initially it was to test the behavior of new services where content providers were blaming the network, to be able to show that usually the problem was due to badly configured streamers. Now it has developed into a more capable tool that, in Eric Petajan’s (AT&T) vision, could become an open source alternative for running drive testing or evaluations of emulated services.”

The interesting part is the “QoE-in-the-Loop” concept: these scores could then be fed back into the Radio Access Network (RAN) in near real time, allowing the base station to allocate radio resources based on what users are actually experiencing rather than just raw throughput. AT&T’s results suggest that roughly 3x more video flows can be supported at “acceptable” quality levels when the RAN is QoE-aware — mostly by not wasting bandwidth on streams that already look fine and redirecting it to those that need it. This concept has been discussed in a VQEG work item on 5G Key Performance Indicators (5G KPI).

The key point in this feedback loop is that “acceptable” is defined in terms of actual user experience, not guesses about what particular network metrics mean. The actual quality score may well differ depending on how the videos are encoded and what type of content is being streamed. Including a proper quality metric therefore significantly improves the overall efficiency of the system.

The Bigger Picture: From Lab to Production

The paper describes a lab tool. But as Eric Petajan explained in an interview, AT&T’s longer-term goal goes further: using the ground-truth data from AMVOTS, combined with subjective testing (MOS scores collected at their Austin lab), to train prediction models that can estimate QoE from network traffic alone — without needing HDMI capture.

This is where it gets interesting for the industry. If you can predict QoE from network-side data, you can do it at scale across your entire subscriber base. But building that pipeline is a serious undertaking. You need:

  • The lab hardware
  • Good video QoE models and/or the subjective testing infrastructure to create subjective ground truth data
  • The network and ML expertise to perform the training
  • Enough training data to make the models generalize
  • A pathway to real deployment

There is also a real challenge with encrypted traffic: from TCP-level information alone, you can only estimate video quality very coarsely. And as QUIC adoption grows, the network-visible data that these models rely on becomes even thinner. Petajan was very open about the scalability question, calling it “kind of an open question” whether AMVOTS-derived insights can work beyond AT&T’s own environment.
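To see why network-side estimation is coarse, consider what an encrypted flow actually exposes: roughly, a byte count over time. The thresholds and labels below are hypothetical, chosen only to illustrate how little the delivered bitrate tells you on its own:

```python
def rough_quality_guess(bytes_in_window, window_s):
    """Coarse network-side estimate for an encrypted video flow.

    Delivered bitrate is roughly all a TCP/QUIC byte count reveals: it
    says nothing about codec efficiency, resolution, or whether the
    player is stalling, so any quality mapping is a crude guess.
    """
    kbps = bytes_in_window * 8 / 1000 / window_s
    if kbps < 1000:
        return kbps, "likely low quality (or just an efficient codec?)"
    if kbps < 4000:
        return kbps, "probably acceptable, resolution unknown"
    return kbps, "probably good, unless bytes only filled the buffer"

# A 10-second window in which 3 MB of encrypted payload were delivered
print(rough_quality_guess(3_000_000, 10))
```

Every branch carries a caveat, and that is the point: without player-side or bitstream information, two streams with identical byte counts can look completely different to the user.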

Where Standardized Bitstream Models Fit in

As our readers may be aware, at AVEQ we focus on standardized QoE models like ITU-T Rec. P.1203 and P.1204 (see our overview of video quality models). P.1203 was developed specifically for adaptive streaming (DASH, HLS), and its “integration module” already models the subjective impact of quality switches, stalling, and resolution changes. VMAF does not do that; with VMAF, you can only work with statistical aggregations of per-frame scores.

Most importantly, the P.1204.1 (metadata-based) and P.1204.3 (bitstream-based) models need neither a reference video nor HDMI capture. Unlike VMAF, they work from metadata the player already has (buffer state, bitrate, resolution, codec parameters) or from an analysis (decoding) of the transmitted video segments.
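As an illustration of the kind of player-side metadata these models consume, here is a small sketch that derives stall events from periodic buffer-level samples. The sampling format and the 0.1-second threshold are assumptions for this example, not part of any standard:

```python
def stall_events(buffer_samples, threshold_s=0.1):
    """Derive stall events from (timestamp, buffer_level_s) samples.

    A stall is assumed whenever the playout buffer drains below
    `threshold_s`. Returns (start_time, duration) pairs, the kind of
    input a buffering-aware QoE model such as P.1203 consumes.
    """
    events = []
    stall_start = None
    for t, level in buffer_samples:
        if level < threshold_s and stall_start is None:
            stall_start = t                      # buffer ran dry: stall begins
        elif level >= threshold_s and stall_start is not None:
            events.append((stall_start, t - stall_start))  # playback resumed
            stall_start = None
    if stall_start is not None:                  # still stalled at end of trace
        events.append((stall_start, buffer_samples[-1][0] - stall_start))
    return events

# One-second samples: the buffer empties at t=2 and recovers at t=4
samples = [(0, 4.0), (1, 2.5), (2, 0.0), (3, 0.0), (4, 3.0), (5, 2.0)]
print(stall_events(samples))
```

Note that nothing here touches the video pixels: everything comes from state the player exposes anyway, which is exactly why no reference video or capture hardware is needed.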

A full-reference approach like VMAF, where you need access to both the source and the output, will always be more precise for a given frame. But in many cases you cannot even obtain the source video, let alone upload your own test clip to the service (think of streams from Disney+, or live TV). P.1204.1, by contrast, is deployable wherever you have access to the player, and P.1204.3 can be embedded into active probing scenarios where man-in-the-middle decryption (on your own device!) is still possible.

Since active probes capture data directly from the player, we can run them in real-life deployments: at the mobile edge, in network-centric locations, and elsewhere, without dedicated hardware (they run as Docker containers). This is a huge advantage for operators who want to monitor real user experience across their entire subscriber base. Where the full-reference VMAF-based approach no longer works, you can deploy our SDK to hundreds of sensors and get P.1203 scores for representative video sessions. These probes then report on the current state of the network and can be used to trigger optimizations in the RAN or elsewhere.

Different Tools for Different Problems

As you can imagine from the previous explanations, such deployments are never simple. Whether and how you can measure the QoE depends on:

  • Who you are (an ISP, a CDN provider, a streaming provider)
  • Where you need to perform measurements (mobile devices, desktops)
  • How you want to use the measurements (for lab tests, or in real-life production deployments)
  • What you can control (the RAN, routing, CDN configuration, …)

AMVOTS is interesting because it covers the case where you want a detailed lab-based evaluation of possibly closed-source applications. The biggest caveat is that it requires being able to feed the input video into the system. At AVEQ we focus on another angle: measuring the quality of third-party apps like YouTube and Netflix from our mobile and desktop Surfmeter apps, where you cannot control the video input.

It would be easy to frame this as “our approach vs. theirs” — but that’s not our point. AMVOTS is a good example of how to build a lab-based ground truth setup that might extend to other use cases (e.g., conferencing). When it comes to real deployments, AVEQ’s Surfmeter solves different problems at a different stage.

What Matters for Operators

The fact that AT&T, one of the largest operators in the world, invested this effort into video QoE measurement confirms the research approaches we have long been advocating. The underlying premise: managing networks by throughput alone is not enough, because what “acceptable” means for a user varies across content types and services.

Here’s David’s view: “We need tools like this to show what telco operators and content providers are missing by not sharing data. Showing that ‘bitrate is not enough’, etc., is the first step in this story, and with QoE reports between clients and network, we could reach a much higher utilization of cellular networks with more, and happier, users. And maybe that even enables smart improvements that we haven’t even thought about yet…”

Ultimately, you need to understand what the user actually sees. For example, we know that a satellite operator can tune the bandwidth requirements for YouTube and Netflix differently, because they use different codecs and playout algorithms. Models like P.1203/P.1204 can tell you exactly how those differences translate into QoE, and how to optimize for them.

Now, operators like AT&T can make significant R&D investments to run such labs, create in-house prediction models, and validate the results subjectively. For operators who do not have that scale, a standards-based approach with active probes potentially offers a shorter path: deploy the tool, measure third-party streams, get P.1203 scores, and start making decisions based on actual QoE data.

To summarize, we’re happy to see the industry is moving toward QoE-aware network management. The key is to choose the right tool for the right problem, and to focus on what ultimately matters: delivering the best possible experience to users.

17. March 2026