Building HevyDuty AI: AI Workout Builder Case Study

Most people who lift weights seriously have experienced the same frustration. You want a training program that matches your goals, experience, schedule, and equipment, but designing one yourself requires knowledge of exercise science, volume periodization, and recovery management. Generic programs from fitness magazines ignore your individual context. Personal trainers cost serious money for something you need updated every few weeks.

The insight behind HevyDuty was that an LLM with the right constraints could generate workout programs that are genuinely useful for intermediate and advanced lifters, not just random exercise lists. The key word is constraints. An unconstrained LLM will happily prescribe Olympic lifts to a beginner or assign 40 sets of chest work in a single session. The entire challenge was building a prompt and validation pipeline that produces safe, balanced, and progressive programs.

The target user was someone already tracking workouts in the Hevy app who wanted AI-generated programs that slot directly into their existing workflow. Hevy has a robust API and a loyal user base, so deep integration with Hevy's ecosystem was a priority from day one.

The first version of HevyDuty was built in a single weekend using an AI coding assistant. The prototype was a React app with a simple form: select a goal, pick your days, and hit generate. It sent a basic prompt to the Gemini API and rendered the raw response as formatted text. No JSON parsing, no Hevy integration, no weight suggestions. Just a text dump of a workout plan.

Even this rough prototype validated the core idea. The generated programs were surprisingly reasonable for intermediate lifters. Exercise selection made sense, volume distribution was adequate, and the programs accounted for recovery between sessions. But the gap between "surprisingly reasonable text" and "a tool people would actually use to train" was enormous. Users needed the output in a format their tracking app could consume, with actual weight recommendations based on their strength levels.

The vibecoding approach was valuable for rapid validation but left behind significant technical debt. The AI assistant had generated components with duplicated state logic, inconsistent error handling, and no tests. The refactoring phase took longer than the initial build, which is a pattern we have seen repeatedly: vibecoding is fastest for prototypes and most expensive when you skip the cleanup.

Moving from raw text output to structured JSON was the first major engineering effort. The system prompt grew from a single paragraph to over 400 words, specifying the exact JSON schema, volume guidelines per muscle group, exercise safety constraints by experience level, and rules for set and rep scheme selection based on the training goal.
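To make the idea of a schema plus safety rules concrete, here is a minimal sketch of what a typed program and a post-generation validator might look like. The field names, the per-session set cap, and the list of restricted lifts are all assumptions made for illustration; the article does not publish HevyDuty's actual schema or limits.

```typescript
// Hypothetical program schema; field names are illustrative, not HevyDuty's real ones.
type Experience = "beginner" | "intermediate" | "advanced";

interface PrescribedExercise {
  name: string;
  muscleGroup: string;
  sets: number;
  reps: number;
}

interface WorkoutDay {
  label: string;
  exercises: PrescribedExercise[];
}

interface Program {
  goal: string;
  experience: Experience;
  days: WorkoutDay[];
}

// Assumed limits, chosen only to demonstrate the checks.
const MAX_SETS_PER_MUSCLE_PER_SESSION = 12;
const RESTRICTED_FOR_BEGINNERS = new Set(["snatch", "clean and jerk"]);

// Returns a list of human-readable problems; an empty list means the program passed.
function validateProgram(program: Program): string[] {
  const problems: string[] = [];
  for (const day of program.days) {
    const setsByMuscle = new Map<string, number>();
    for (const ex of day.exercises) {
      const countsValid =
        Number.isInteger(ex.sets) && ex.sets >= 1 &&
        Number.isInteger(ex.reps) && ex.reps >= 1;
      if (!countsValid) {
        problems.push(`${day.label}: ${ex.name} has an invalid set or rep count`);
        continue;
      }
      if (program.experience === "beginner" &&
          RESTRICTED_FOR_BEGINNERS.has(ex.name.toLowerCase())) {
        problems.push(`${day.label}: ${ex.name} is not allowed for beginners`);
      }
      setsByMuscle.set(ex.muscleGroup, (setsByMuscle.get(ex.muscleGroup) ?? 0) + ex.sets);
    }
    // Flag any muscle group whose total sets in this session exceed the cap.
    setsByMuscle.forEach((sets, muscle) => {
      if (sets > MAX_SETS_PER_MUSCLE_PER_SESSION) {
        problems.push(`${day.label}: ${sets} sets of ${muscle} exceeds the per-session cap`);
      }
    });
  }
  return problems;
}
```

A validator like this runs after the model responds, so a program that breaks a rule can be rejected or regenerated instead of being shown to the user.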

We use Gemini's structured output mode with response_mime_type set to application/json, which significantly reduces formatting errors. The prompt includes chain-of-thought instructions that guide the model through volume distribution before exercise selection, preventing the common failure mode where the model picks popular exercises first and then struggles to balance the overall volume.
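As a rough illustration, the request body for a JSON-mode call could be assembled as shown below. This sketch targets the Gemini REST `generateContent` endpoint, where the setting appears in camelCase as `generationConfig.responseMimeType`; the function names and the decision to return `null` on a parse failure are assumptions, not HevyDuty's actual code.

```typescript
// Shape of the JSON body sent to the Gemini REST generateContent endpoint.
interface GeminiRequestBody {
  systemInstruction: { parts: { text: string }[] };
  contents: { role: "user"; parts: { text: string }[] }[];
  generationConfig: { responseMimeType: string };
}

// Only the part of the response envelope this sketch reads.
interface GeminiResponse {
  candidates?: { content?: { parts?: { text?: string }[] } }[];
}

// Hypothetical helper: builds the body with JSON output mode enabled.
function buildGenerateRequest(systemPrompt: string, userRequest: string): GeminiRequestBody {
  return {
    systemInstruction: { parts: [{ text: systemPrompt }] },
    contents: [{ role: "user", parts: [{ text: userRequest }] }],
    generationConfig: { responseMimeType: "application/json" },
  };
}

// Hypothetical helper: joins the text parts of the first candidate and parses them.
// Returns null when the text is missing or is not valid JSON.
function parseProgramResponse(response: GeminiResponse): unknown {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  const text = parts.map((p) => p.text ?? "").join("");
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}
```

Even with JSON mode on, keeping the defensive parse is cheap insurance: the mode reduces formatting errors but a truncated response can still fail to parse.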

The free tier uses Gemini Flash for speed, generating a full multi-day program in 3 to 5 seconds. Users who bring their own API key can select Gemini Pro for more nuanced exercise programming, particularly for advanced techniques like wave loading, cluster sets, and periodized mesocycles. The quality difference is noticeable for experienced lifters who can evaluate exercise selection and volume prescription critically.

Tip: Gemini's structured output mode with the JSON MIME type reduced parsing failures by over 95%, and Gemini Flash still generates a complete multi-day workout program in 3 to 5 seconds.

Raw workout programs without weight recommendations are only half useful. A program that says "Bench Press 4x8" is useless until you know what weight to load. HevyDuty solves this by pulling the user's exercise history from Hevy and calculating suggested starting weights for every prescribed exercise.

The algorithm works in three stages. First, it fetches the user's recent workout history through the Hevy API, extracting the most recent weight used for each exercise. Second, it applies a time-decay function: if you last performed an exercise two weeks ago, the suggestion is your last weight. If it was three months ago, the suggestion drops by 10-15% to account for detraining. If you have never performed the exercise, HevyDuty estimates a starting weight based on related exercises and the user's bodyweight.
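The time-decay stage might look something like the sketch below. The article only fixes two points (no reduction at two weeks, a 10-15% reduction at three months), so the linear ramp between them, the 12.5% midpoint, the cap beyond 90 days, and rounding down to 2.5 kg plates are all assumptions made for illustration.

```typescript
// Assumed parameters: only the two-week and three-month behaviour comes from the text.
const GRACE_DAYS = 14;        // no reduction within two weeks
const FULL_DECAY_DAYS = 90;   // roughly three months
const MAX_REDUCTION = 0.125;  // midpoint of the 10-15% range

// Multiplier applied to the last-used weight, given days since the exercise was performed.
function decayFactor(daysSinceLast: number): number {
  if (daysSinceLast <= GRACE_DAYS) return 1;
  const progress = Math.min(
    1,
    (daysSinceLast - GRACE_DAYS) / (FULL_DECAY_DAYS - GRACE_DAYS),
  );
  return 1 - MAX_REDUCTION * progress;
}

// Rounds down to a loadable weight so the suggestion errs on the light side.
function roundDownToIncrement(weightKg: number, incrementKg: number = 2.5): number {
  return Math.floor(weightKg / incrementKg) * incrementKg;
}

function decayedWeight(lastWeightKg: number, daysSinceLast: number): number {
  return roundDownToIncrement(lastWeightKg * decayFactor(daysSinceLast));
}
```

Under these assumptions, a lifter who last used 100 kg ten days ago is still offered 100 kg, while the same lift last performed 90 days ago is offered 87.5 kg.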

Third, the algorithm adjusts for the prescribed rep range. If your last bench press session was 4 sets of 5 at 100kg, and the new program prescribes 4 sets of 12, the weight suggestion drops accordingly using a simplified rep-max formula. These adjustments are conservative by design. It is always better to suggest slightly too light than slightly too heavy, especially for a tool that beginners might rely on without a spotter.
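The article does not say which rep-max formula HevyDuty uses. As one possible illustration, the sketch below uses the Epley estimate, 1RM = weight × (1 + reps / 30), to convert between rep ranges, and rounds down to the nearest 2.5 kg to stay conservative. The function names and the rounding increment are assumptions.

```typescript
// Epley estimate of a one-rep max from a set of `reps` at `weightKg`.
function estimateOneRepMax(weightKg: number, reps: number): number {
  return reps === 1 ? weightKg : weightKg * (1 + reps / 30);
}

// Inverse of the Epley estimate: the weight expected for `targetReps`.
function weightForReps(oneRepMaxKg: number, targetReps: number): number {
  return targetReps === 1 ? oneRepMaxKg : oneRepMaxKg / (1 + targetReps / 30);
}

// Suggests a weight for a new rep target based on the last logged set.
function suggestWeightForReps(
  lastWeightKg: number,
  lastReps: number,
  targetReps: number,
  incrementKg: number = 2.5,
): number {
  // Same rep target: skip the round trip through the formula to avoid
  // floating-point drift pushing the result below the last weight.
  const raw = lastReps === targetReps
    ? lastWeightKg
    : weightForReps(estimateOneRepMax(lastWeightKg, lastReps), targetReps);
  // Round down so the suggestion is never heavier than the estimate.
  return Math.floor(raw / incrementKg) * incrementKg;
}
```

For the example in the text, 5 reps at 100 kg gives an estimated one-rep max of about 116.7 kg, which translates to about 83.3 kg for 12 reps and rounds down to a suggestion of 82.5 kg.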

Hevy exposes a REST API that allows reading workout history, creating routines, and managing exercises. HevyDuty uses this API bidirectionally: reading history for weight suggestions and writing generated routines back to Hevy for the user to execute during their training sessions.

The integration required mapping HevyDuty's internal exercise names to Hevy's exercise IDs. This is harder than it sounds because exercise naming is not standardized. HevyDuty might generate "Dumbbell Romanian Deadlift" while Hevy's database calls it "Romanian Deadlift (Dumbbell)." We maintain a mapping table with fuzzy matching as a fallback, and when a match cannot be found, the user can manually map the exercise through a search interface.
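One simple way to implement a mapping table with a fuzzy fallback is to compare names as sets of words, which makes word order and parentheses irrelevant. The sketch below uses Jaccard similarity over word tokens; the similarity measure, the 0.6 threshold, the `HevyExercise` shape, and the IDs are assumptions for illustration rather than HevyDuty's actual matcher.

```typescript
// Hypothetical shape of a catalog entry; real Hevy fields may differ.
interface HevyExercise {
  id: string;
  title: string;
}

// Lowercases the name and splits it into alphanumeric word tokens.
function tokenize(name: string): string[] {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim()
    .split(" ")
    .filter((t) => t.length > 0);
}

// Jaccard similarity: shared tokens divided by total distinct tokens.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  let shared = 0;
  setA.forEach((t) => {
    if (setB.has(t)) shared++;
  });
  const unionSize = setA.size + setB.size - shared;
  return unionSize === 0 ? 0 : shared / unionSize;
}

// Tries the explicit mapping table first, then the best fuzzy match above the threshold.
// Returns null when nothing is close enough, so the caller can ask the user to map it.
function matchExercise(
  generatedName: string,
  mappingTable: Map<string, string>,
  catalog: HevyExercise[],
  threshold: number = 0.6,
): string | null {
  const mapped = mappingTable.get(generatedName.toLowerCase());
  if (mapped !== undefined) return mapped;

  const tokens = tokenize(generatedName);
  let best: HevyExercise | null = null;
  let bestScore = 0;
  for (const candidate of catalog) {
    const score = jaccard(tokens, tokenize(candidate.title));
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best !== null && bestScore >= threshold ? best.id : null;
}
```

With this approach, "Dumbbell Romanian Deadlift" and "Romanian Deadlift (Dumbbell)" share all three tokens and score 1.0, while the barbell variant scores only 0.5 and falls below the threshold.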

Routine upload sends the complete program to Hevy with per-set weight targets. Users open the Hevy app and see their new routine ready to go, with every set pre-populated with the suggested weight. Getting from "AI generates a program" to "I'm in the gym training it" went from multiple manual steps to a single button click. Users cited this integration most often as the reason they kept using HevyDuty.

HevyDuty started as a React web app and PWA, but gym environments are hostile to web apps: WiFi is spotty, notifications from other apps compete for attention, and users generally expect fitness tools to be native apps. We used Capacitor to wrap the existing React app into a native Android application with minimal code changes.

Capacitor's approach of running your web app inside a native WebView meant we could share 98% of the codebase. The only platform-specific code handles native storage for API keys (using Capacitor Preferences instead of localStorage for better security), haptic feedback on button presses, and the native splash screen. The build pipeline uses a GitHub Action that runs the web build, syncs it to the Capacitor Android project, and produces a signed APK.

Performance in the WebView is indistinguishable from the browser for this type of app. HevyDuty is not rendering 60fps animations or processing real-time sensor data. It is displaying forms, making API calls, and rendering tables. The WebView handles this without any perceptible lag, and users have never reported performance issues that were attributable to the non-native rendering layer.

The biggest performance bottleneck was not AI generation time but the initial load of the user's exercise history from Hevy. Some power users have thousands of logged workouts spanning years. Fetching this data on every session was slow and wasteful. We implemented a caching layer that stores the last-fetched history in IndexedDB and only requests new workouts since the last sync. This reduced the typical history load from 4-6 seconds to under 500 milliseconds.
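The incremental sync can be sketched as a merge between the cached history and whatever has changed since the last sync. In the sketch below, `fetchSince` is a hypothetical stand-in for the Hevy API call, the record shapes are assumed, and reading from or writing to IndexedDB is left out; timestamps are assumed to be ISO 8601 strings in UTC so that string comparison orders them correctly.

```typescript
// Hypothetical record shapes; the real Hevy payload has more fields.
interface LoggedWorkout {
  id: string;
  performedAt: string; // ISO 8601, UTC
}

interface HistoryCache {
  workouts: LoggedWorkout[];
  lastSyncedAt: string | null; // null means nothing has been cached yet
}

// Merges freshly fetched workouts into the cache, replacing any with the same id.
function mergeHistory(
  cache: HistoryCache,
  fresh: LoggedWorkout[],
  syncedAt: string,
): HistoryCache {
  const byId = new Map<string, LoggedWorkout>();
  for (const w of cache.workouts) byId.set(w.id, w);
  for (const w of fresh) byId.set(w.id, w); // newer copy wins
  const workouts: LoggedWorkout[] = [];
  byId.forEach((w) => workouts.push(w));
  workouts.sort((a, b) =>
    a.performedAt < b.performedAt ? -1 : a.performedAt > b.performedAt ? 1 : 0,
  );
  return { workouts, lastSyncedAt: syncedAt };
}

// Fetches only what changed since the last sync and returns the updated cache.
async function syncHistory(
  cache: HistoryCache,
  fetchSince: (since: string | null) => Promise<LoggedWorkout[]>,
  now: () => string = () => new Date().toISOString(),
): Promise<HistoryCache> {
  // Record the time before fetching so workouts logged mid-request are not skipped next time.
  const startedAt = now();
  const fresh = await fetchSince(cache.lastSyncedAt);
  return mergeHistory(cache, fresh, startedAt);
}
```

The first sync passes `null` and pays the full cost once; every later sync transfers only the new workouts, which is where the savings come from.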

AI response streaming improved perceived performance significantly. Instead of waiting for the complete JSON response and then rendering it all at once, we stream the response and show a progress indicator with the current generation stage. The actual generation time did not change, but user satisfaction with the wait improved measurably because they could see the AI working rather than staring at a spinner.

The most requested feature after launch was exercise swapping. Users liked the overall program structure but wanted to substitute specific exercises due to equipment availability or personal preference. We added an AI-powered swap feature where the user taps an exercise and the model suggests 3-5 alternatives that target the same muscle groups with similar biomechanics, maintaining the program's balance.

Unexpected feedback came from physiotherapy patients who used HevyDuty to generate rehabilitation-oriented programs. The safety constraints we built for beginners happened to work well for recovery contexts, suggesting low-impact movements with conservative volumes. This opened a use case we had not anticipated and informed subsequent prompt improvements to better support rehabilitation goals.

Six months after launch, HevyDuty's core architecture has remained stable. The prompt has been revised over 30 times, the weight suggestion algorithm has been tuned based on user reports of recommendations being too aggressive for certain exercise categories, and the Hevy exercise mapping table grows with every new exercise the model generates. The lesson is that shipping the AI feature is 20% of the work. Tuning it based on real-world usage is the other 80%.
