Why Football Is the Proving Ground for Sports Computer Vision
Football dominates sports CV research for structural reasons. Understanding why reveals what transfers to other sports and where the gaps remain.
Ask a computer vision researcher which sport has the richest benchmark ecosystem, the most published methods, and the most direct line from research lab to commercial deployment. The answer is almost always football. Not because the research community has a particular affinity for the game, but because football presents the right combination of problem complexity, data availability, and commercial incentive to make it the default proving ground for sports CV.
Understanding why football dominates the field matters for anyone building in this space. It explains which research you can trust, what transfers to other sports, and where the gaps are.
The problem is genuinely hard
Football dominates research partly because it offers the hardest version of several core computer vision problems.
Twenty-two players move simultaneously through a space 105 metres long, in continuous play that rarely stops for more than a few seconds. Unlike basketball, which has a shot clock and regular stoppages, or American football, which pauses between every down, football generates 90 minutes of near-uninterrupted multi-agent motion. There are no natural sampling frames. Every minute of footage is live state.
The ball is a particular challenge. It is small, fast, and routinely occluded by players' bodies. Models trained on general object detection struggle because the ball's visual signature in broadcast footage is a blurred white circle that can be partly hidden, partly out of frame, or visually confused with markings on the pitch. Ball tracking in football remains harder than player tracking by most published benchmarks.
Players themselves present the canonical identity problem: twenty-two people with similar builds, wearing two sets of matching uniforms, distinguishable primarily by jersey number. When a player is facing away from camera, partially occluded, or moving quickly, jersey numbers become unreliable. Re-identification across a 90-minute match, including after occlusion or camera cuts, is an open problem. SoccerNet's re-identification benchmark uses 340,993 player thumbnails from 400 games precisely because the problem requires that scale to evaluate meaningfully.
If you can solve detection, tracking, and identity in football, you have solved the hard version of each problem.
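One common way to cope with unreliable per-frame jersey reads is to pool evidence across a whole tracklet rather than trusting any single frame. The sketch below is illustrative only: the function name, the (number, confidence) input format, and the 1.0 evidence threshold are assumptions made for this example, not part of any SoccerNet baseline.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional


def vote_jersey_number(
    frame_predictions: list[tuple[Optional[int], float]],
    min_total_confidence: float = 1.0,  # hypothetical threshold, tune per system
) -> Optional[int]:
    """Aggregate per-frame jersey-number guesses for one tracklet.

    Each item is (number_or_None, confidence). None means no number was
    readable in that frame (player facing away, occluded, or blurred).
    """
    scores: dict[int, float] = defaultdict(float)
    for number, confidence in frame_predictions:
        if number is not None:
            scores[number] += confidence
    if not scores:
        return None
    best_number, best_score = max(scores.items(), key=lambda kv: kv[1])
    # Refuse to commit to an identity when the pooled evidence is weak.
    return best_number if best_score >= min_total_confidence else None


# Two unreadable frames, two confident reads of 10, one weak misread of 16.
tracklet = [(None, 0.0), (10, 0.9), (16, 0.4), (10, 0.8), (None, 0.0)]
number = vote_jersey_number(tracklet)
```

Returning None instead of a low-evidence guess lets a downstream step fall back on other cues, such as appearance-based re-identification.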
The benchmark ecosystem is unmatched
The single biggest reason football dominates sports CV research is SoccerNet, an open benchmark initiative led by researchers at KAUST and the University of Liège, with contributions from other academic and industry groups. SoccerNet is not one benchmark. It is a coordinated suite of benchmarks covering the full sports CV pipeline, using 500+ hours of broadcast football with 300,000+ annotations.
The tasks covered span the entire stack: action spotting, player tracking, camera calibration, player re-identification, jersey number recognition, dense video captioning, and game state reconstruction. Each task has a defined metric, a public leaderboard, and an associated challenge. SoccerNet-GSR, introduced in 2024, is the most demanding: it evaluates whether a pipeline can produce a complete minimap, with tracked and identified players in pitch-coordinate space, from broadcast video. No other sport has anything close to this breadth of standardised evaluation.
This matters for research credibility. When a team publishes a new tracking method and validates it on SoccerNet-Tracking, the result is directly comparable to every other published method. When another team improves camera calibration using SoccerNet's calibration benchmark, the improvement can be measured in terms of downstream game state reconstruction quality. The coordinated structure makes incremental progress visible and compounding. Other sports have individual benchmarks, but nothing with this level of task coverage and community coordination.
The depth of the SoccerNet ecosystem has created a reinforcing cycle: good benchmarks attract research, research generates methods, methods improve benchmarks, improved benchmarks attract more research. The result is that football-domain CV has outpaced general sports CV by a wide margin, and researchers building systems for any sport typically start with football methods as baselines.
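The minimap output that game state reconstruction asks for depends on one geometric step: mapping a player's position in the image to pitch coordinates. A minimal sketch of that step follows, assuming camera calibration has already produced a 3x3 homography from pixels to metres. The helper names and the toy matrix are made up for illustration; a real matrix would come from a calibration model.

```python
from __future__ import annotations


def foot_point(bbox: tuple[float, float, float, float]) -> tuple[float, float]:
    """Bottom-centre of an (x1, y1, x2, y2) box, a rough proxy for ground contact."""
    x1, _y1, x2, y2 = bbox
    return ((x1 + x2) / 2.0, y2)


def image_to_pitch(
    point_px: tuple[float, float],
    homography: list[list[float]],
) -> tuple[float, float]:
    """Apply a 3x3 image-to-pitch homography to one pixel coordinate."""
    u, v = point_px
    # Multiply [u, v, 1] by each row of the matrix to get homogeneous (x, y, w).
    x, y, w = (row[0] * u + row[1] * v + row[2] for row in homography)
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Dividing by w is what encodes perspective foreshortening.
    return (x / w, y / w)


# Toy homography, NOT a calibrated camera: uniform scaling plus a small
# perspective term in the third row.
H_TOY = [
    [0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0],
    [0.0, 0.001, 1.0],
]

player_box = (480.0, 900.0, 520.0, 1000.0)
pitch_xy = image_to_pitch(foot_point(player_box), H_TOY)
```

Because every downstream position passes through this projection, a small calibration error shifts every player on the minimap at once.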
Commercial incentive drives production quality
Academic benchmarks explain research concentration, but commercial deployment explains why football CV methods are unusually mature compared to other sports domains.
Hawk-Eye operates across 25 sports in 100+ countries and has been running multi-camera tracking for officiating and broadcast enhancement for over two decades. The production constraints of live football (reliable ball tracking at broadcast resolution in real time, and high-precision goal-line technology for officiating) set the bar for what production-quality sports CV requires.
Second Spectrum provides optical tracking data for the Premier League, NBA, and MLS. The Premier League contract alone covers every match in the top division of the world's most commercially valuable league. Stats Perform's TRACAB system is deployed across major European leagues. These commercial deployments have created a large base of practitioners who understand what high-quality tracking data looks like and what analysts do with it, giving researchers unusually direct feedback on what matters in production.
At the grassroots end, Veo has recorded more than four million games using automated single-camera capture. Hudl serves 315,000+ teams across 40+ sports. The commercial volume is large enough that improvements in football CV methods translate to measurable product quality at scale.
No other team sport has this combination of elite commercial deployment and grassroots distribution simultaneously active and growing.
Research transfers, with caveats
The concentration of research in football is not a dead end for other sports. General-purpose multi-object tracking methods such as ByteTrack and BoT-SORT, originally developed on pedestrian-tracking benchmarks and since validated extensively on football data, are now standard baselines across sports CV. The SportsMOT benchmark covers football, basketball, and volleyball using a unified framework, which makes it a natural test of whether methods tuned on football transfer to other sports.
The answer is: mostly, with important exceptions. TeamTrack demonstrates viewpoint sensitivity that varies dramatically by sport. A detector fine-tuned on football achieves mAP of 52.7 on side-view football footage but drops to single digits on top-down views of the same sport. Different sports use different default camera positions. Ice hockey and basketball use overhead-angle coverage; football broadcasts default to wide-angle side views. Methods that work well on football broadcast footage may need retraining for different camera geometries.
The identity problem also generalises with sport-specific constraints. Basketball has five players per side rather than eleven, but players are taller and move faster. Rugby has more physical clustering than football. Tennis has two players and near-total isolation from other agents. Each sport changes the difficulty of individual detection, the density of occlusion, and the reliability of jersey-number-based identity differently. Football methods provide the starting point; each sport's production system requires sport-specific fine-tuning.
The critical insight is that the research architecture transfers even when specific models need adaptation. The framework of stacking detection, tracking, identity, calibration, and event recognition is universal. The benchmarks to validate that stack at a system level, as opposed to isolated component benchmarks, are not available for most sports. Football's system-level evaluation infrastructure, particularly SoccerNet-GSR and the GS-HOTA metric, is a template that other sports will need to develop.
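To make the tracking baselines mentioned above more concrete, here is a simplified sketch of the two-stage association idea that ByteTrack is known for: match confident detections to existing tracks first, then try to rescue the unmatched tracks using low-confidence detections that would otherwise be discarded. This is not the reference implementation. The greedy IoU matching, the thresholds, and the function names are assumptions chosen for brevity, and there is no motion model to predict where each track should be.

```python
from __future__ import annotations

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, det_indices, detections, min_iou):
    """Return {track_id: detection_index}, taking the highest-IoU pairs first."""
    pairs = sorted(
        ((iou(box, detections[d][0]), tid, d)
         for tid, box in tracks.items() for d in det_indices),
        reverse=True,
    )
    matches, used = {}, set()
    for overlap, tid, d in pairs:
        if overlap < min_iou:
            break  # pairs are sorted, so nothing later can qualify
        if tid in matches or d in used:
            continue
        matches[tid] = d
        used.add(d)
    return matches


def two_stage_associate(tracks, detections, high_score=0.6, low_score=0.1, min_iou=0.3):
    """tracks: {track_id: Box}; detections: list of (Box, score)."""
    high = [i for i, (_, s) in enumerate(detections) if s >= high_score]
    low = [i for i, (_, s) in enumerate(detections) if low_score <= s < high_score]
    # Stage 1: confident detections claim tracks first.
    matches = greedy_match(tracks, high, detections, min_iou)
    # Stage 2: leftover tracks may be kept alive by low-score detections,
    # for example a player half-hidden behind a team-mate.
    leftover = {tid: box for tid, box in tracks.items() if tid not in matches}
    matches.update(greedy_match(leftover, low, detections, min_iou))
    return matches


tracks = {1: (0.0, 0.0, 10.0, 10.0), 2: (20.0, 0.0, 30.0, 10.0)}
detections = [((1.0, 0.0, 11.0, 10.0), 0.9), ((21.0, 0.0, 31.0, 10.0), 0.3)]
assignments = two_stage_associate(tracks, detections)
```

In this toy frame, track 2 only survives because of the second stage, which is the behaviour that matters most in crowded football scenes.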
What this means for building
If you are building sports CV for any team sport, the practical implication is straightforward: start with football methods. The detection architectures, tracking pipelines, and identity frameworks developed and validated on SoccerNet represent the current state of the art, and they are well-documented with reproducible baselines.
Understand what transfers and what does not. Camera geometry is the highest-risk variable. If your target sport uses a different default camera position than broadcast football, expect to invest in retraining detection and calibration specifically. The architecture works; the model weights may not port cleanly.
Use the benchmark structure as a planning model even when the benchmarks themselves do not exist for your sport. Define your target output, build evaluation metrics that measure end-to-end pipeline quality rather than individual components, and treat calibration as a first-class concern rather than a preprocessing step. These are lessons the football CV community learned expensively through component-level benchmarks that turned out to be poor predictors of production system quality.
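As a sketch of what an end-to-end metric can look like, the example below scores a predicted player against ground truth using position and identity together, loosely modelled on the idea behind GS-HOTA. The field names, the 5-metre tolerance, and the 0.05 score at that tolerance are assumptions for this example and should not be read as the official metric definition.

```python
from __future__ import annotations

import math

IDENTITY_FIELDS = ("role", "team", "jersey")  # hypothetical attribute names


def localisation_similarity(pred_xy, true_xy, tolerance_m: float = 5.0) -> float:
    """Score 1.0 at zero error, decaying to 0.05 at `tolerance_m` metres."""
    dist_sq = (pred_xy[0] - true_xy[0]) ** 2 + (pred_xy[1] - true_xy[1]) ** 2
    return math.exp(math.log(0.05) * dist_sq / tolerance_m ** 2)


def game_state_similarity(pred: dict, truth: dict, tolerance_m: float = 5.0) -> float:
    """Combine pitch-space localisation with an all-or-nothing identity check."""
    identity_ok = all(pred[k] == truth[k] for k in IDENTITY_FIELDS)
    if not identity_ok:
        # A perfectly placed but misidentified player scores zero.
        return 0.0
    return localisation_similarity(pred["xy"], truth["xy"], tolerance_m)


truth = {"xy": (52.5, 34.0), "role": "player", "team": "left", "jersey": 10}
good = {"xy": (52.5, 34.0), "role": "player", "team": "left", "jersey": 10}
drifted = {"xy": (57.5, 34.0), "role": "player", "team": "left", "jersey": 10}
wrong_id = {"xy": (52.5, 34.0), "role": "player", "team": "left", "jersey": 16}

scores = [game_state_similarity(p, truth) for p in (good, drifted, wrong_id)]
```

The design choice worth copying is the multiplication: detection, calibration, and identity errors all pull down the same number, so strength in one component cannot hide weakness in another.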
It is not incidental that the SoccerNet benchmark frames its goal as "democratising access to game state data for all leagues". The research concentration in football exists partly because the commercial payoff of making structured match data accessible to clubs that have never had it is large. That same opportunity exists in every team sport. Football shows what is technically possible. The next wave of sports CV products applies those methods to the rest of the game.
The next post in this series looks at where this pipeline actually breaks in practice: the broadcast camera problem, and the occlusion, motion blur, and viewpoint constraints that define the real engineering challenge of sports CV. If you are evaluating what AI can extract from your sports video, our Sports Performance playbook covers the assessment frameworks and engagement models for organisations ready to build.