I took a deadlift video from r/formcheck, fed it to a Python script, and got back prioritized coaching cues. All running locally on my laptop. No cloud APIs, no subscriptions. Just MediaPipe for pose estimation, OpenCV for frame extraction, and Qwen 3.5-9B running on llama.cpp for natural-language feedback.

The Idea

r/formcheck is a subreddit where lifters post videos of their sets and ask for feedback. You film your deadlift, upload it, and wait for someone (hopefully qualified) to tell you what to fix. The problem is that feedback is inconsistent, slow, and often contradictory.

I wanted to see if I could automate this: take a video from the subreddit, extract the lifter’s pose frame-by-frame, compute the biomechanical metrics that matter, and feed those numbers to a local LLM that generates actionable coaching cues.

The Pipeline

The analyzer runs in five steps:

  1. Extract frames from the video at 10 fps using OpenCV
  2. Run pose estimation on each frame with MediaPipe Pose (33 3D keypoints per frame)
  3. Compute biomechanics joint angles, torso lean, bar path drift, lockout detection
  4. Save annotated stills for any flagged moments, with the pose skeleton overlaid
  5. Generate coaching feedback by sending the metrics to a local Qwen 3.5-9B model
python analyze.py deadlift.mp4

Pose Estimation with MediaPipe

MediaPipe Pose provides 33 3D keypoints per frame. The keypoints include shoulders, hips, knees, ankles, wrists, and more. It runs in image mode, processing each extracted frame independently:

options = PoseLandmarkerOptions(
    base_options=BaseOptions(model_asset_path=MODEL_PATH),
    running_mode=RunningMode.IMAGE,
    num_poses=1,
    min_pose_detection_confidence=0.5,
    min_tracking_confidence=0.5,
)

with PoseLandmarker.create_from_options(options) as landmarker:
    for frame in frames:
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb)
        result = landmarker.detect(mp_image)

Each landmark gives an (x, y, z) coordinate in normalized image space, which I scale to pixel coordinates. Frames where no pose is detected are silently dropped.

I initially considered SAM 3D Body, which provides 70 keypoints per frame, but it requires detectron2, access-gated HuggingFace checkpoints, and a complex conda install. MediaPipe is good enough for a POC and runs in seconds.

Biomechanics

With the keypoints extracted, I compute five metrics that matter for deadlift form:

MetricWhat it measures
Hip angleShoulder → Hip → Knee angle at setup, first pull, and lockout
Knee angleHip → Knee → Ankle angle at bottom and lockout
Torso leanAngle between the torso vector and vertical at first pull
Bar path driftMax horizontal displacement of the wrist keypoints across frames
Lockout checkWhether the hip angle reaches ≥ 165° at the top

Key Moment Detection

Rather than hard-coding frame indices, the analyzer identifies key moments from the angle curves:

  • Setup - first frame
  • First pull - frame with minimum hip angle (deepest flexion)
  • Knee pass - first frame after first pull where the knee angle exceeds 140°
  • Lockout - frame with maximum hip angle (most extended)

This makes the analysis robust to videos of different lengths and starting positions.

Flags

The system triggers flags when metrics cross specific thresholds:

FlagCondition
excessive_torso_leanTorso lean > 45° at first pull
incomplete_lockoutHip angle < 165° at top
bar_driftLateral wrist drift > 50 px

Each triggered flag generates an annotated still - the original video frame with the pose skeleton and the metric value overlaid:

Excessive Torso Lean (first pull)Bar Path Drift
Excessive torso leanBar drift

Coaching Feedback with a Local LLM

The final step sends the computed metrics to Qwen 3.5-9B, running locally via llama-server. I covered how to set this up in a previous post.

The prompt is structured to produce the kind of feedback you’d get from a coach:

SYSTEM_PROMPT = (
    "You are an experienced powerlifting coach reviewing a deadlift. "
    "You will be given biomechanical data extracted from a video analysis. "
    "Provide 3-5 coaching cues, ordered from most to least critical. "
    "Be specific and actionable. Use plain language a gym-goer would understand. "
    "Do not mention that you are an AI or that data was extracted from a video."
)

The user prompt is built from the analysis dict - hip angles, knee angles, torso lean, bar drift, and flags - all formatted as a structured data block. The LLM then produces natural-language cues.

Example Output

Here’s what the analyzer produced from a real deadlift video:

1. Get your hips up higher before you start pulling.
   Your setup shows your hips are too low (119.4°), which forces
   you to slide forward excessively during the first pull.

2. Keep your chest up and pull the bar close to your shins.
   You are leaning forward too much (61.5° torso lean), which
   creates leverage issues.

3. Finish the lift by snapping your hips back and standing tall.
   Even though you locked out, your numbers suggest you might
   be losing momentum before the very top.

4. Watch your bar path stay over your mid-foot.
   You are drifting laterally, which means the bar is moving
   away from your center of mass.

The feedback is specific to the lifter’s actual numbers - not generic advice.

Stack Summary

ComponentRole
OpenCVFrame extraction at 10 fps
MediaPipe Pose33 3D keypoints per frame
NumPyAngle computation and signal processing
Qwen 3.5-9B (GGUF)Natural-language coaching via local llama-server

Everything runs locally. The video never leaves the machine, and the LLM inference is done on-device with a 4-bit quantized model.

What’s Next

This is a proof of concept. Some improvement possibilities:

  • Temporal smoothing - MediaPipe keypoints can jitter between frames; a Kalman filter or moving average would clean up the angle curves
  • SAM 3D Body - Swapping in Facebook’s 70-keypoint model would give more granular analysis, especially for shoulder and spine mechanics
  • Multi-rep detection - Automatically segment the video into individual reps and analyze each one
  • Visual feedback - Overlay the coaching cues directly onto the video as an annotated clip
  • LLM-driven flagging - The current system uses hard-coded thresholds (e.g. torso lean > 45°) to decide what’s wrong. These don’t account for body proportions or lift style. It would be interesting to drop the flags entirely and let the LLM interpret the raw metrics, so it could weigh multiple factors together rather than triggering on arbitrary cutoffs