I took a deadlift video from r/formcheck, fed it to a Python script, and got back prioritized coaching cues. All running locally on my laptop. No cloud APIs, no subscriptions. Just MediaPipe for pose estimation, OpenCV for frame extraction, and Qwen 3.5-9B running on llama.cpp for natural-language feedback.
The Idea
r/formcheck is a subreddit where lifters post videos of their sets and ask for feedback. You film your deadlift, upload it, and wait for someone (hopefully qualified) to tell you what to fix. The problem is that feedback is inconsistent, slow, and often contradictory.
I wanted to see if I could automate this: take a video from the subreddit, extract the lifter’s pose frame-by-frame, compute the biomechanical metrics that matter, and feed those numbers to a local LLM that generates actionable coaching cues.
The Pipeline
The analyzer runs in five steps:
- Extract frames from the video at 10 fps using OpenCV
- Run pose estimation on each frame with MediaPipe Pose (33 3D keypoints per frame)
- Compute biomechanics joint angles, torso lean, bar path drift, lockout detection
- Save annotated stills for any flagged moments, with the pose skeleton overlaid
- Generate coaching feedback by sending the metrics to a local Qwen 3.5-9B model
python analyze.py deadlift.mp4
Pose Estimation with MediaPipe
MediaPipe Pose provides 33 3D keypoints per frame. The keypoints include shoulders, hips, knees, ankles, wrists, and more. It runs in image mode, processing each extracted frame independently:
options = PoseLandmarkerOptions(
base_options=BaseOptions(model_asset_path=MODEL_PATH),
running_mode=RunningMode.IMAGE,
num_poses=1,
min_pose_detection_confidence=0.5,
min_tracking_confidence=0.5,
)
with PoseLandmarker.create_from_options(options) as landmarker:
for frame in frames:
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb)
result = landmarker.detect(mp_image)
Each landmark gives an (x, y, z) coordinate in normalized image space, which I scale to pixel coordinates. Frames where no pose is detected are silently dropped.
I initially considered SAM 3D Body, which provides 70 keypoints per frame, but it requires detectron2, access-gated HuggingFace checkpoints, and a complex conda install. MediaPipe is good enough for a POC and runs in seconds.
Biomechanics
With the keypoints extracted, I compute five metrics that matter for deadlift form:
| Metric | What it measures |
|---|---|
| Hip angle | Shoulder → Hip → Knee angle at setup, first pull, and lockout |
| Knee angle | Hip → Knee → Ankle angle at bottom and lockout |
| Torso lean | Angle between the torso vector and vertical at first pull |
| Bar path drift | Max horizontal displacement of the wrist keypoints across frames |
| Lockout check | Whether the hip angle reaches ≥ 165° at the top |
Key Moment Detection
Rather than hard-coding frame indices, the analyzer identifies key moments from the angle curves:
- Setup - first frame
- First pull - frame with minimum hip angle (deepest flexion)
- Knee pass - first frame after first pull where the knee angle exceeds 140°
- Lockout - frame with maximum hip angle (most extended)
This makes the analysis robust to videos of different lengths and starting positions.
Flags
The system triggers flags when metrics cross specific thresholds:
| Flag | Condition |
|---|---|
excessive_torso_lean | Torso lean > 45° at first pull |
incomplete_lockout | Hip angle < 165° at top |
bar_drift | Lateral wrist drift > 50 px |
Each triggered flag generates an annotated still - the original video frame with the pose skeleton and the metric value overlaid:
| Excessive Torso Lean (first pull) | Bar Path Drift |
|---|---|
![]() | ![]() |
Coaching Feedback with a Local LLM
The final step sends the computed metrics to Qwen 3.5-9B, running locally via llama-server. I covered how to set this up in a previous post.
The prompt is structured to produce the kind of feedback you’d get from a coach:
SYSTEM_PROMPT = (
"You are an experienced powerlifting coach reviewing a deadlift. "
"You will be given biomechanical data extracted from a video analysis. "
"Provide 3-5 coaching cues, ordered from most to least critical. "
"Be specific and actionable. Use plain language a gym-goer would understand. "
"Do not mention that you are an AI or that data was extracted from a video."
)
The user prompt is built from the analysis dict - hip angles, knee angles, torso lean, bar drift, and flags - all formatted as a structured data block. The LLM then produces natural-language cues.
Example Output
Here’s what the analyzer produced from a real deadlift video:
1. Get your hips up higher before you start pulling.
Your setup shows your hips are too low (119.4°), which forces
you to slide forward excessively during the first pull.
2. Keep your chest up and pull the bar close to your shins.
You are leaning forward too much (61.5° torso lean), which
creates leverage issues.
3. Finish the lift by snapping your hips back and standing tall.
Even though you locked out, your numbers suggest you might
be losing momentum before the very top.
4. Watch your bar path stay over your mid-foot.
You are drifting laterally, which means the bar is moving
away from your center of mass.
The feedback is specific to the lifter’s actual numbers - not generic advice.
Stack Summary
| Component | Role |
|---|---|
| OpenCV | Frame extraction at 10 fps |
| MediaPipe Pose | 33 3D keypoints per frame |
| NumPy | Angle computation and signal processing |
| Qwen 3.5-9B (GGUF) | Natural-language coaching via local llama-server |
Everything runs locally. The video never leaves the machine, and the LLM inference is done on-device with a 4-bit quantized model.
What’s Next
This is a proof of concept. Some improvement possibilities:
- Temporal smoothing - MediaPipe keypoints can jitter between frames; a Kalman filter or moving average would clean up the angle curves
- SAM 3D Body - Swapping in Facebook’s 70-keypoint model would give more granular analysis, especially for shoulder and spine mechanics
- Multi-rep detection - Automatically segment the video into individual reps and analyze each one
- Visual feedback - Overlay the coaching cues directly onto the video as an annotated clip
- LLM-driven flagging - The current system uses hard-coded thresholds (e.g. torso lean > 45°) to decide what’s wrong. These don’t account for body proportions or lift style. It would be interesting to drop the flags entirely and let the LLM interpret the raw metrics, so it could weigh multiple factors together rather than triggering on arbitrary cutoffs

