What Football Analysts Actually Need from Computer Vision
Football analysts don't need the highest mAP score. They need reliable, fast, and actionable intelligence from the video already in front of them.
A professional football analyst typically spends two to four hours reviewing a single match. Not two to four hours watching football: two to four hours locating, tagging, clipping, and organising footage before any tactical interpretation can begin. The analysis happens at the end of that process, in whatever time remains before the next training session.
That is the real design constraint for sports computer vision. Not detection accuracy measured in mAP. Not tracking precision measured in HOTA. The constraint is analyst time, and the question is whether a CV pipeline reduces the hours spent on mechanical work or simply replaces one form of friction with another.
Three workflows, three different requirements
Football analysts are not a homogeneous user group. The work divides into three distinct workflows, each with different time horizons, outputs, and tolerance for error.
Match review operates on a cycle of hours. After a match, the coaching staff expects a tactical breakdown before the next training session, sometimes the following morning. Analysts need to identify pressing sequences, defensive shape in transition, set-piece execution, and individual errors from 90 minutes of footage. The value of that review decays rapidly. Feedback that arrives two days late no longer informs the session it was meant to change.
Opposition scouting operates on a cycle of days. Before a scheduled fixture, analysts build a profile of the opponent's tendencies: preferred defensive shape, attacking patterns from wide areas, set-piece routines, key individual threats. This involves watching multiple matches, cross-referencing tactical observations, and producing a concise briefing that a coaching staff can act on. The work is slower, but the output needs to be reliable enough for match preparation to depend on it.
Player development operates on a cycle of weeks. Tracking how an individual player is improving in a specific movement pattern, whether a central defender's positioning in transition or a fullback's timing of third-man runs, requires consistent positional data across multiple matches. This is where tracking quality matters most, because comparative analysis over time amplifies any systematic errors.
Each workflow has a different primary need from computer vision. Match review needs speed and coverage. Opposition scouting needs accuracy on the tactical categories that matter, even if general detection metrics are lower. Player development needs consistency: the same methodology applied across every match, so that changes in measured behaviour reflect the player rather than pipeline variance.
The mAP gap
When researchers benchmark sports CV systems, the headline metric is usually mAP (mean average precision) for detection, or HOTA (higher order tracking accuracy) for full tracking pipelines. These are reasonable measures of model quality in controlled conditions. They are a poor proxy for what analysts need in production.
A system that achieves 85 mAP on a benchmark dataset but routinely fails during high-density pressing sequences, goal-area scrambles, and set pieces is less useful than a system that achieves 75 mAP but behaves consistently across exactly those situations. The moments where analytical insight matters most, where the tactical question is what happened during a high-press sequence or who was where during a corner kick, are the same moments where detection difficulty peaks.
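One way to see how an aggregate score hides this is to break the same evaluation down by game situation. The sketch below is illustrative only: it uses plain recall as a simpler stand-in for mAP, and the frame fields (`situation`, `matched`, `total`) and all the numbers are hypothetical, not measurements from any real system.

```python
from collections import defaultdict


def recall_by_situation(frames):
    """Return (overall_recall, {situation: recall}) from per-frame counts.

    Each frame is a dict with hypothetical keys:
      situation - a label such as "open_play" or "corner"
      matched   - ground-truth players the detector found
      total     - ground-truth players visible in the frame
    """
    matched = defaultdict(int)
    total = defaultdict(int)
    for frame in frames:
        matched[frame["situation"]] += frame["matched"]
        total[frame["situation"]] += frame["total"]
    per_situation = {s: matched[s] / total[s] for s in total if total[s] > 0}
    overall = sum(matched.values()) / sum(total.values())
    return overall, per_situation


# Made-up data: 90 easy open-play frames and 10 crowded corner frames.
frames = (
    [{"situation": "open_play", "matched": 19, "total": 20}] * 90
    + [{"situation": "corner", "matched": 10, "total": 20}] * 10
)
overall, per_situation = recall_by_situation(frames)
print(f"overall recall: {overall:.3f}")
for situation, recall in per_situation.items():
    print(f"{situation}: {recall:.2f}")
```

With these made-up numbers the overall figure stays above 0.9 while recall on corners is 0.5. The headline number is dominated by the easy frames, which is why a per-situation breakdown tells an analyst more than a single score.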
SoccerNet-Tracking states this directly: tracking under fast motion and severe occlusion is "far from being solved." That is not false modesty from the research community. It is an accurate description of the gap between benchmark conditions and the moments analysts most need covered.
The implication is not that current CV systems are too immature to be useful. It is that the design question for analyst-facing products is different from the research question. Research asks: what is the highest accuracy achievable on this benchmark? Product design asks: which failure modes are acceptable, which are not, and how do we communicate uncertainty to the person relying on this output?
Reliability beats peak accuracy
A consistent 70 mAP system that flags its own uncertainty and degrades gracefully is more useful to an analyst than a system that claims 85 mAP but produces silent errors. Silent errors are analytically dangerous: they produce confident wrong answers that analysts build tactical decisions on, with no indication that the data should not be trusted.
This has specific implications for the calibration layer. As we covered in the post on the broadcast camera problem, a detection system that performs well but feeds incorrect pitch coordinates produces useless tactical output downstream, regardless of how well the individual models score. Calibration failures cascade silently through the pipeline while detection and tracking metrics look fine.
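A small numeric sketch makes the cascade concrete. It assumes calibration is modelled as a planar homography from image pixels to pitch metres, which is a common approach but not necessarily the one used in the pipeline discussed here. Both matrices and the pixel position are invented for illustration.

```python
import math


def to_pitch(h, x, y):
    """Map a pixel (x, y) to pitch coordinates with a 3x3 homography h."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w


# Hypothetical "true" calibration: 0.05 metres per pixel, no perspective.
h_true = [[0.05, 0.0, 0.0], [0.0, 0.05, 0.0], [0.0, 0.0, 1.0]]
# The same calibration with a small error in one perspective term.
h_drift = [[0.05, 0.0, 0.0], [0.0, 0.05, 0.0], [0.0001, 0.0, 1.0]]

# The detector output is identical in both cases: one player at this pixel.
pixel = (1000.0, 600.0)
true_pos = to_pitch(h_true, *pixel)
drift_pos = to_pitch(h_drift, *pixel)
offset_m = math.hypot(true_pos[0] - drift_pos[0], true_pos[1] - drift_pos[1])
print(f"pitch position error: {offset_m:.1f} m")
```

The detection is pixel-perfect in both runs, so detection metrics would not move, yet the player lands roughly five metres away on the pitch. An error of that size is enough to change the reading of a defensive line.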
The practical requirement is not just accuracy. It is calibrated confidence: systems that know what they do not know, surface it clearly, and let analysts override or discount uncertain outputs rather than presenting everything with equal confidence.
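One standard way to check whether stated confidence can be trusted is expected calibration error (ECE): group predictions by confidence, then compare the average confidence in each group with the fraction that were actually correct. The sketch below is a minimal version with equal-width bins. The input lists are hypothetical, and nothing here is taken from a specific product.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and observed accuracy.

    confidences - floats in [0, 1], one per prediction
    correct     - booleans, True where the prediction was right
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Overconfident: claims 0.95 on every output but is right 6 times in 10.
overconfident = expected_calibration_error([0.95] * 10, [True] * 6 + [False] * 4)
# Well calibrated: claims 0.75 and is right 3 times in 4.
calibrated = expected_calibration_error([0.75] * 4, [True, True, True, False])
print(f"overconfident ECE: {overconfident:.2f}, calibrated ECE: {calibrated:.2f}")
```

A low ECE is what makes a confidence threshold meaningful. Only then does flagging outputs below some cut-off for analyst review separate the trustworthy outputs from the doubtful ones.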
What actionable actually means
"Actionable insights" is the phrase every sports analytics company uses. It rarely gets unpacked. For a football analyst operating under time pressure, actionable means two things.
First, it means the output connects directly to a decision the coaching staff can make. Positional data for its own sake is not actionable. Positional data that answers a specific tactical question, whether the team's defensive line held its shape during the opponent's counter, or how far the striker presses from the front, is actionable because it resolves an uncertainty that the coach was already carrying.
Second, it means the output arrives in time to influence the decision it is relevant to. A full tactical report that arrives two hours after the coaching session has no operational value, even if it is analytically correct. Latency is not an engineering afterthought for sports CV: it is part of whether the system produces value at all.
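The timing constraint is simple arithmetic, and it is worth making explicit. The sketch below assumes processing starts at the final whistle and runs at a fixed multiple of footage length. The function name, the dates, and the processing factors are hypothetical.

```python
from datetime import datetime, timedelta


def review_time_left(match_end, session_start, footage_minutes, realtime_factor):
    """Analyst time remaining after the pipeline finishes, as a timedelta.

    realtime_factor is processing time divided by footage time, so 2.0 means
    90 minutes of footage takes 180 minutes to process. A negative result
    means the output lands after the session has already started.
    """
    processing = timedelta(minutes=footage_minutes * realtime_factor)
    return session_start - (match_end + processing)


# Hypothetical schedule: evening match, training at 10:00 the next morning.
match_end = datetime(2024, 1, 1, 22, 0)
session_start = datetime(2024, 1, 2, 10, 0)

fast = review_time_left(match_end, session_start, 90, 2.0)
slow = review_time_left(match_end, session_start, 90, 10.0)
print(f"fast pipeline leaves {fast}; slow pipeline misses the session: {slow < timedelta(0)}")
```

Under these assumptions a pipeline running at twice footage length leaves nine hours for review. One running at ten times footage length delivers three hours after the session has begun, whatever its accuracy.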
Both requirements point toward natural language interfaces as a direction the field is moving. Rather than requiring analysts to navigate data layers and construct queries manually, systems that accept questions in plain language and return structured answers reduce the translation cost between analytical intent and output. MatchGraph approaches this through natural language queries over the game state it constructs: an analyst can ask what happened positionally during the opponent's first goal rather than building a filter sequence to find the relevant clip.
Designing for the practitioner, not the benchmark
The history of sports analytics products that failed to achieve adoption is largely a history of products that optimised for the metric that was easiest to measure, not the workflow that mattered. Data quality, as measured by internal benchmarks, was high. But the output arrived in formats that required significant analytical training to interpret, and the interface added cognitive load rather than reducing it.
Research benchmarks define what is technically possible. Analyst workflows define what is commercially necessary. The two are not the same, and in the cases where they diverge, the workflow wins: a technically impressive system that does not reduce analyst workload will not be used.
The CV stack described in the previous post in this series produces detection, tracking, calibration, identity, and event outputs. This post asks what an analyst actually does with those outputs, and how that should shape what the pipeline prioritises. The answer is that speed of delivery, consistency across game situations, transparent uncertainty handling, and connection to actual tactical questions matter more than peak accuracy on controlled benchmarks.
That is a design brief as much as an engineering one. And it is the brief that Beach's Football Video Intelligence playbook is built around: not maximising a benchmark score, but producing intelligence that coaching staff trust enough to prepare with.
If your organisation is evaluating what AI can extract from match footage, the Performance Data Assessment is the entry point. It maps your current video infrastructure, identifies the gaps between what you have and what your analytical workflows require, and produces a prioritised roadmap for closing them.