2025-11-06
Dataset Preparation for Push-off Movement Detection via ResNet18/34
Performance Analysis
Details
Video Footage of: Joel Bright-Davies, Jude Bright-Davies, Joshua Chunye, Simonetta Ifeji
Advisor: David Clement Johnson
Copyright: 2025 simeng.dev. All Rights Reserved.
1. Introduction
This article documents the approach currently taken to prepare a dataset for training a model for push-off movement identification in triple jump.
Before estimating the distance of triple jump phases, specific movements must be identified during the athlete’s attempt. In particular, the push-off movement should be detected. In this context, the term “push-off” is used to refer to the action taking place just prior to takeoff, where the athlete is still making ground contact.

The goal is to store the left/right foot sequence in an array as each push-off is identified. If successful, the foot sequence and coordinates of the push-off events could be used to recognise the exercise performed, as well as to determine whether the movement resembles a triple jump foot sequence.
A further goal is to use this information to estimate the distance between two push-off movements, provided the camera calibration value is known.
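As a minimal sketch of the distance estimation step, assuming the simplest possible calibration (a single metres-per-pixel scale factor along the runway plane; the function name and values below are illustrative, not part of the project code):

```python
# Hypothetical sketch: distance between two push-off events given
# pixel coordinates and a single metres-per-pixel calibration factor.
def push_off_distance(p1, p2, metres_per_pixel):
    """Euclidean distance in metres between two push-off coordinates
    (given in pixels), using one planar calibration scale factor."""
    dx = (p2[0] - p1[0]) * metres_per_pixel
    dy = (p2[1] - p1[1]) * metres_per_pixel
    return (dx ** 2 + dy ** 2) ** 0.5

# Example: two push-offs 1200 px apart horizontally at 0.005 m/px
print(push_off_distance((300, 540), (1500, 540), 0.005))  # 6.0 m
```

A real setup would need a proper homography or per-location calibration, since a single scale factor only holds when the camera axis is perpendicular to the runway.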
First, dataset preparation is discussed, including collection and video frame annotation. Subsequently, model selection is discussed as well as the options considered. Finally, the next steps are outlined.
2. Dataset Preparation
This method involves recording five bounding exercises:
- Triple: L, L, R (or R, R, L)
- Hop-Steps: L, L, R, R, L, L, R, R, etc.
- 2 Hops 2 Steps: L, L, L, R, L, L, L (or R, R, R, L, R, R, R)
- Steps: L, R, L, R, L, R, etc.
- Hops: L, L, L, etc. (or R, R, R, etc.)
where L denotes a left foot contact and R a right foot contact. The aforementioned bounding exercises are performed during plyometric winter training by the athletes whose videos are recorded for this project.
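Once push-off events are detected, the resulting contact sequence can be matched against these patterns. A small illustrative sketch (the names and structure here are my own, not project code):

```python
# Valid triple jump contact patterns: a same-foot hop followed by a
# step on the other foot, starting off either leg.
TRIPLE_PATTERNS = {("L", "L", "R"), ("R", "R", "L")}

def is_triple(sequence):
    """Return True if the last three ground contacts form a triple
    jump foot pattern (hop on one foot, then a step on the other)."""
    if len(sequence) < 3:
        return False
    return tuple(sequence[-3:]) in TRIPLE_PATTERNS

print(is_triple(["L", "L", "R"]))       # True
print(is_triple(["L", "R", "L", "R"]))  # False (steps pattern)
```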
While the primary objective is to detect triple jump sequences regardless of whether they are performed from a standing position or with a run-up, it is useful to record different foot patterns since each athlete may hop on either leg.
Videos were recorded using:
- GoPro Hero13
- Tripod with adjustable height (maximum height at least 1.20 m).
2.1 Data Collection
The videos are recorded at 2.7K resolution and 240 frames per second. High resolution and a high frame rate are required to ensure a clear view of the foot movement during analysis. The camera height was set to 50-70 cm depending on the location of the tripod.
The GoPro Hero13 enables replaying videos at standard playback speed or in slow motion. The videos presented in the following sections are displayed at 1/8 speed and capture only a portion of the 25 m exercise, since a single camera was used.
NOTE: The videos in this article are a cropped version of the original recording.
2.1.1 Hops
The hop is the first phase of triple jump. The hop exercises recorded require the athlete to jump on the same foot until reaching the sand.
Video 1: Left hops on the runway.
Video 2: Landing on the sand after right hop bounding.
2.1.2 Steps
Steps are a type of bounding where the foot pattern is L, R, L, R. The step is the second phase of the triple jump.
Video 3: Steps on the runway (i.e. L, R, L, R, etc.)
2.1.3 Hop-Step-Hop-Step
Hop-Step-Hop-Step exercises are a continuous triple jump movement with an alternating pattern of L, L, R, R, L, L, R, R until the athlete reaches the sand.
Video 4: Hop, step, hop, step pattern (L, L, R, R, L, L, R, R) on the runway.
2.2 Video Frame Annotation
After videos are recorded, frames are extracted and labelled. For this project, the open source labelling platform Label Studio is used.
Using a data labelling platform is best practice, particularly when working on a scalable project, as it ensures accuracy and flexibility. A high-quality dataset ensures a more accurate model.
The labels used are push_off and not_push_off. The first refers to video frames showing push-off movements, i.e., the action just prior to takeoff. The second refers to everything else taking place during the jumping sequence, such as takeoff, flight phase, and landing.
Video 5: Push-off to takeoff sequence showing transition from ground contact to airborne phase.
Note that in the first instance, running approach frames are not included in the labelling process. However, there is an argument to be made that the model might not be able to distinguish between push-off events and push movements the athlete performs during the approach. In that case, videos of the approach will be recorded and included in the dataset.
Below is an example of a short run-up approach and takeoff exercise over a hurdle. The difference between approach push movements and push-off events is noticeable.
Once the data are labelled they can be exported from Label Studio. For documentation on how to export from the platform, follow this link: https://labelstud.io/guide/export.
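The exported JSON can then be mapped to image/label pairs. The exact field layout depends on the labelling template used, so the keys below (which follow the common Choices-based classification export shape) should be checked against your own export:

```python
# Sketch of reading a Label Studio JSON export into an
# {image: label} mapping; field names assume a Choices template.
import json

def load_labels(export_path):
    """Map each annotated image to its label (push_off / not_push_off)."""
    with open(export_path) as f:
        tasks = json.load(f)
    labels = {}
    for task in tasks:
        image = task["data"]["image"]
        for annotation in task.get("annotations", []):
            for result in annotation.get("result", []):
                choices = result["value"].get("choices", [])
                if choices:
                    labels[image] = choices[0]
    return labels
```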
Alternative: Organise Files into Folders
Alternatively, two folders can be used to store frames: push_off and not_push_off
├── labelled_data/
│ ├── push_off/
│ └── not_push_off/
├── datasets/
│ ├── train/
│ ├── val/
│ └── test/
├── models/
└── checkpoints/
This method is not easily scalable as the dataset increases in size.
While the dataset for this project is small compared to the CIFAR-10 dataset, as more videos are recorded, having the correct procedure in place makes it easier to track progress.
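Given the labelled_data/ layout above, the train/val/test split can be produced with a short script; the 80/10/10 ratios and paths below are assumptions for illustration:

```python
# Split sketch: copy frames from labelled_data/{class}/ into
# datasets/{train,val,test}/{class}/ with an assumed 80/10/10 split.
import random
import shutil
from pathlib import Path

def split_dataset(src="labelled_data", dst="datasets", seed=42):
    random.seed(seed)  # fixed seed keeps the split reproducible
    for class_dir in Path(src).iterdir():
        files = sorted(class_dir.glob("*.jpg"))
        random.shuffle(files)
        n = len(files)
        cuts = {"train": files[: int(0.8 * n)],
                "val": files[int(0.8 * n): int(0.9 * n)],
                "test": files[int(0.9 * n):]}
        for split, items in cuts.items():
            out = Path(dst) / split / class_dir.name
            out.mkdir(parents=True, exist_ok=True)
            for f in items:
                shutil.copy2(f, out / f.name)
```

Keeping the class subfolders inside each split means the datasets/ tree can later be loaded directly by folder-based dataset loaders.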
Note: LabelImg graphical image annotation tool has been discontinued and the GitHub repository has been archived. LabelImg is now part of Label Studio.
3. Model Selection
This is a binary classification problem in which the athlete is either pushing off and subsequently taking off, or not.
Therefore, push_off data maps to 1 and not_push_off data maps to 0. Because the dataset is being collected during training sessions rather than using historical data, the datasets will be small to begin with.
Convolutional Neural Networks (ConvNets) are well suited to image classification problems. In particular, ResNet50, a convolutional neural network 50 layers deep, is very popular.
Previous work on image classification using ResNet50 and the CIFAR-10 dataset was submitted to the University of Cambridge Huawei ML Challenge in 2021.
The submission received a Wolfram Award and 1st Prize for the Machine Learning challenge. It addressed a similar classification problem, though it focused on multi-class classification rather than binary classification. The project mentioned can be found here.
Because of the size of the dataset and limited resources available, ResNet18 and ResNet34 are considered in the first instance.
4. Next Steps
The next few months will focus on data acquisition and data annotation. To ensure a robust ML workflow, cloud-based infrastructure will be set up to allow for dataset storage and retrieval. More details will be provided in the next documentation.
References
He, K., Zhang, X., Ren, S. and Sun, J. (2015) ‘Deep Residual Learning for Image Recognition’, CoRR, abs/1512.03385. Available at: http://arxiv.org/abs/1512.03385 (Accessed: 2 November 2025).
Topics
- Dataset Preparation
- Video Frame Annotation
Tech Stack
- Python
- ResNet18/34
- Label Studio