Spicing up Ice Hockey with AI: Player Tracking with Computer Vision

Nowadays, I don't play hockey as much as I'd like to, but it's been a part of me since I was a kid. Recently, I had the chance to help at the referee table and keep some stats during the first Ice Hockey Tournament in Lima (3-on-3). This event involved an extraordinary effort by the Peruvian Inline Hockey Association (APHL) and a kind visit from the Friendship League. To add an AI twist, I used PyTorch, computer vision techniques, and a Convolutional Neural Network (CNN) to build a model that tracks players and teams and gathers some basic performance stats.
This article aims to be a quick guide to designing and deploying the model. Although the model still needs some fine-tuning, I hope it can help anyone get started in the interesting world of computer vision applied to sports. I would like to acknowledge and thank the Peruvian Inline Hockey Association (APHL) for allowing me to use a 40-second video sample of the tournament for this project _(you can find the video input sample in the project's GitHub repository)_.
The Architecture
Before moving on with the project, I did some quick research to find a baseline to work from and avoid "reinventing the wheel". I found that, when it comes to using computer vision to track players, there is a lot of interesting work on football (not surprising, since it's the most popular team sport in the world). However, I didn't find many resources for ice hockey. Roboflow has some interesting pre-trained models and datasets for training your own, but working with a hosted model presented some latency issues that I will explain later. In the end, I leveraged the soccer material for reading the video frames and obtaining the individual track IDs, following the basic principles and tracking method approach explained in this tutorial (if you are interested in gaining a better understanding of some basic computer vision techniques, I suggest watching at least the first hour and a half of it).
With the tracking IDs covered, I then built my own path. As we walk through this article, we'll see how the project evolves from a simple object detection task into a model that detects players and teams and delivers some basic performance metrics (sample clips 01 to 08, author's own creation).

The Tracking Mechanism
The tracking mechanism is the backbone of the model. It ensures that each detected object within the video is identified and assigned a unique identifier, maintaining this identity across each frame. The main components of the tracking mechanism are:
- YOLO (You Only Look Once): A powerful real-time object detection algorithm originally introduced in 2015 in the paper "You Only Look Once: Unified, Real-Time Object Detection". It stands out for its speed and its versatility in detecting around 80 pre-trained classes (it's important to note that it can also be trained on custom datasets to detect specific objects). For our use case, we will rely on YOLOv8x, a computer vision model built by Ultralytics on top of previous YOLO versions. You can download it here.
- ByteTrack Tracker: To understand ByteTrack, we have to understand MOT (Multiple Object Tracking), which involves tracking the movements of multiple objects over time in a video sequence and linking objects detected in the current frame with the corresponding objects in previous frames. To accomplish this, we will use ByteTrack (introduced in 2021 in the paper "ByteTrack: Multi-Object Tracking by Associating Every Detection Box"). To implement the ByteTrack tracker and assign track IDs to detected objects, we will rely on Python's supervision library.
- OpenCV: A well-known library for various computer vision tasks in Python. For our use case, we will rely on OpenCV to visualize and annotate video frames with bounding boxes and text for each detected object (a minimal sketch of how these three pieces fit together follows this list).
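Before building the full class, here is a minimal single-frame sketch of how YOLO, ByteTrack, and OpenCV connect. The file names are placeholders; the weights and confidence threshold mirror the ones used later in this article:
# Minimal single-frame sketch (placeholder file names)
import cv2
import supervision as sv
from ultralytics import YOLO
model = YOLO('yolov8x.pt')
tracker = sv.ByteTrack()
cap = cv2.VideoCapture('video_input.mp4')
ret, frame = cap.read()    # grab one frame to test the pipeline
cap.release()
results = model.predict(frame, conf=0.1)[0]              # YOLO detections for that frame
detections = sv.Detections.from_ultralytics(results)     # convert to supervision's format
tracked = tracker.update_with_detections(detections)     # assign track IDs
print(tracked.tracker_id)                                # one ID per tracked object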
In order to build our tracking mechanism, we'll begin with these initial two steps:
- Deploying the YOLO model with ByteTrack to detect objects (in our case, players) and assign unique track IDs.
- Initializing a dictionary to store object tracks in a pickle (pkl) file. This is extremely useful because it lets us avoid re-running the frame-by-frame object detection every time we execute the code, saving significant time.
For the following step, these are the Python packages that we'll need:
pip install ultralytics
pip install supervision
pip install opencv-python
Next, we'll import our libraries and specify the paths for our sample video file, the output video, and the pickle file (if it exists, the code loads it; if not, it will create one and save it at that path):
#**********************************LIBRARIES*********************************#
from ultralytics import YOLO
import supervision as sv
import pickle
import os
import cv2
# INPUT: video file
video_path = 'D:/PYTHON/video_input.mp4'
# OUTPUT: video file
output_video_path = 'D:/PYTHON/output_video.mp4'
# PICKLE FILE (LOADED IF AVAILABLE; OTHERWISE CREATED AND SAVED AT THIS PATH)
pickle_path = 'D:/PYTHON/stubs/track_stubs.pkl'
Now let's go ahead and define our tracking mechanism:
#*********************************TRACKING MECHANISM**************************#
class HockeyAnalyzer:
    def __init__(self, model_path):
        self.model = YOLO(model_path)
        self.tracker = sv.ByteTrack()

    def detect_frames(self, frames):
        batch_size = 20
        detections = []
        for i in range(0, len(frames), batch_size):
            detections_batch = self.model.predict(frames[i:i+batch_size], conf=0.1)
            detections += detections_batch
        return detections
    #********LOAD TRACKS FROM FILE OR DETECT OBJECTS-SAVES PICKLE FILE************#
    def get_object_tracks(self, frames, read_from_stub=False, stub_path=None):
        # Reuse previously computed tracks if a pickle file already exists
        if read_from_stub and stub_path is not None and os.path.exists(stub_path):
            with open(stub_path, 'rb') as f:
                tracks = pickle.load(f)
            return tracks

        detections = self.detect_frames(frames)
        tracks = {"person": []}

        for frame_num, detection in enumerate(detections):
            cls_names = detection.names
            cls_names_inv = {v: k for k, v in cls_names.items()}

            # Tracking Mechanism
            detection_supervision = sv.Detections.from_ultralytics(detection)
            detection_with_tracks = self.tracker.update_with_detections(detection_supervision)

            tracks["person"].append({})
            for frame_detection in detection_with_tracks:
                bbox = frame_detection[0].tolist()
                cls_id = frame_detection[3]
                track_id = frame_detection[4]
                if cls_id == cls_names_inv.get('person', None):
                    tracks["person"][frame_num][track_id] = {"bbox": bbox}

        # Save the computed tracks so future runs can skip detection
        if stub_path is not None:
            with open(stub_path, 'wb') as f:
                pickle.dump(tracks, f)

        return tracks
    #***********************BOUNDING BOXES AND TRACK-IDs**************************#
    def draw_annotations(self, video_frames, tracks):
        output_video_frames = []
        for frame_num, frame in enumerate(video_frames):
            frame = frame.copy()
            player_dict = tracks["person"][frame_num]

            # Draw Players
            for track_id, player in player_dict.items():
                color = player.get("team_color", (0, 0, 255))
                bbox = player["bbox"]
                x1, y1, x2, y2 = map(int, bbox)

                # Bounding boxes
                cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
                # Track_id
                cv2.putText(frame, str(track_id), (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, color, 2)

            output_video_frames.append(frame)
        return output_video_frames
The HockeyAnalyzer class begins by initializing the YOLO model and the ByteTrack tracker. Each frame is then processed in batches of 20, using the YOLO model to detect and collect objects in each batch. If the pickle file is available at its path, the precomputed tracks are loaded from it. If it is not available (because you are running the code for the first time or have erased a previous pickle file), get_object_tracks converts each detection into the format required by ByteTrack, updates the tracker with these detections, and stores the tracking information in a new pickle file at the designated path. Finally, the code iterates over each frame, drawing bounding boxes and track IDs for each detected object.
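For reference, the dictionary returned (and pickled) by get_object_tracks holds one entry per frame, each mapping a track ID to its bounding box. The values below are made up purely for illustration:
# Illustrative structure of the tracks dictionary (values are invented)
tracks = {
    "person": [
        {1: {"bbox": [532.4, 210.7, 598.1, 345.2]},    # frame 0: track_id -> bounding box
         2: {"bbox": [101.0, 198.3, 166.9, 330.8]}},
        {1: {"bbox": [534.1, 212.0, 600.0, 347.5]},    # frame 1: same IDs, updated boxes
         2: {"bbox": [103.2, 199.1, 168.4, 331.6]}},
        # ...one dictionary per frame in the video
    ]
}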
To execute the tracker and save a new output video with bounding boxes and track IDs, you can use the following code:
#*************** EXECUTES TRACKING MECHANISM AND OUTPUT VIDEO****************#
# Read the video frames
video_frames = []
cap = cv2.VideoCapture(video_path)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    video_frames.append(frame)
cap.release()
#********************* EXECUTE TRACKING METHOD WITH YOLO**********************#
tracker = HockeyAnalyzer('D:/PYTHON/yolov8x.pt')
tracks = tracker.get_object_tracks(video_frames, read_from_stub=True, stub_path=pickle_path)
annotated_frames = tracker.draw_annotations(video_frames, tracks)
#*********************** SAVES VIDEO FILE ************************************#
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
height, width, _ = annotated_frames[0].shape
out = cv2.VideoWriter(output_video_path, fourcc, 30, (width, height))
for frame in annotated_frames:
    out.write(frame)
out.release()
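One small, optional refinement: the code above hardcodes 30 frames per second, so if your source video was recorded at a different rate the output will play slightly too fast or too slow. Assuming the same paths defined earlier, you can read the frame rate from the capture object instead:
# Optional: match the output frame rate to the source video
cap = cv2.VideoCapture(video_path)
fps = cap.get(cv2.CAP_PROP_FPS) or 30    # fall back to 30 if the property is unavailable
cap.release()
out = cv2.VideoWriter(output_video_path, fourcc, fps, (width, height))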
If everything in your code worked correctly, you should expect a video output similar to the one shown in sample clip 01.

TIP #01: Don't underestimate your compute power! When running the code for the first time, expect the frame processing to take some time, depending on your compute capacity. For me, it took between 45 and 50 minutes using only a CPU setup (consider CUDA as an option). The YOLOv8x tracking mechanism, while powerful, demands significant compute resources (at times, my memory hit 99%, fingers crossed it didn't crash!).
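If you have an NVIDIA GPU and a CUDA-enabled PyTorch install, a small change can cut this time considerably. This is just a sketch: Ultralytics' predict call accepts a device argument, so inside detect_frames you could forward it like this:
# Use the GPU if PyTorch can see one, otherwise stay on the CPU
import torch
device = 0 if torch.cuda.is_available() else 'cpu'
# Then, inside detect_frames, pass the device to YOLO's predict call:
# detections_batch = self.model.predict(frames[i:i+batch_size], conf=0.1, device=device)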