How to Build an AI-Powered Web App: A Practical Architecture Guide

The term "AI-powered" gets thrown around loosely, but for web applications it has a specific technical meaning: your app sends data to a large language model (or other ML model), receives structured output, and uses that output to drive functionality that would be difficult or impossible to implement with traditional code. This is fundamentally different from traditional programming, where every behavior is explicitly coded. An AI-powered feature handles ambiguous, unstructured input — natural language, images, documents — and produces structured, actionable output. The AI is not decorative; it performs a function that your app could not perform without it.

Consider the difference between a conventional expense tracker and SimplBiz. A conventional tracker requires the user to manually type the amount, date, vendor, and category for every expense. SimplBiz lets you photograph a receipt, sends the image to Gemini's vision model, and receives back structured JSON with the extracted fields. The user experience is transformed: one tap instead of thirty seconds of typing. But behind that simplicity is an architectural decision about where the AI runs, how it is called, how errors are handled, and how costs are managed. These decisions determine whether your AI feature delights users or frustrates them.

Not every app needs AI, and adding it where it is not needed creates complexity without value. Before integrating an AI model, ask yourself: is this task fundamentally about understanding unstructured data, generating creative content, or making decisions that require contextual reasoning? If the answer is yes, AI is the right tool. If the task can be solved with conditional logic, regular expressions, or a lookup table, traditional code will be faster, cheaper, and more reliable. The best AI-powered apps use AI surgically — for the specific features that benefit from it — while keeping everything else conventional.

The first architectural decision is whether AI calls happen in the user's browser (client-side) or on your server (server-side). Client-side integration means your frontend JavaScript calls the AI API directly. This is simpler to implement — no backend required — but it means your API key must be accessible to the client, which is a security risk. Even if you restrict the key's permissions, anyone who opens the browser DevTools can extract it and run up your bill. Client-side integration works for personal tools, prototypes, and apps where the user provides their own API key (as HevyDuty does for its power-user tier), but it is generally not suitable for production apps with public users.

Server-side integration routes AI requests through your backend. The user's browser sends a request to your API endpoint, your server adds the API key and calls the AI provider, then forwards the response back to the client. This keeps your API key secret and lets you add rate limiting, request validation, input sanitization, and usage tracking between the user and the AI model. The tradeoff is that you need a backend service — a Cloud Function, an edge function, or a traditional server. For Firebase-based apps, a Cloud Function is the natural choice; for Vercel-based apps, an API route or edge function works well.

A hybrid approach uses server-side calls for features available to all users (where you absorb the AI cost) and client-side calls with user-provided keys for power users who want more control. HevyDuty AI uses this pattern: the free tier calls Gemini Flash through a Firebase Cloud Function with rate limiting, while users who provide their own Gemini Pro API key make direct client-side calls for enhanced workout generation. This balances security with flexibility and lets you offer a free tier without unlimited cost exposure.

The major AI API providers in 2026 are Google (Gemini), Anthropic (Claude), and OpenAI (GPT). Each has strengths that matter for different use cases. Gemini offers the best price-to-performance ratio for most web app integrations, especially with the Flash model, which is extremely fast and cheap. Gemini also has native multimodal capabilities — it handles images, audio, and video natively, which makes it the best choice for features like receipt scanning (SimplBiz) and image analysis. The API is straightforward, and the Google AI SDK for JavaScript is well-maintained.

Claude excels at tasks requiring careful reasoning, long-context understanding, and instruction following. If your AI feature involves analyzing long documents, maintaining complex conversational state, or following detailed system prompts precisely, Claude is often the better choice. It is also the strongest model for code generation tasks, which matters if your app generates code for users. OpenAI's GPT models remain the most widely used and have the broadest ecosystem of tools, wrappers, and tutorials, which means troubleshooting is easier because more people have encountered your exact problem.

For most vibecoded web apps, the recommendation is to start with Gemini Flash for cost-sensitive features (receipt scanning, simple classification, data extraction) and use Claude or GPT-4 for features that require sophisticated reasoning. All three providers offer compatible API patterns (send a message, get a response), so switching between them is straightforward. Use environment variables for the API key and model name, so you can swap providers by changing configuration rather than code. We do this at RatataLabs: HevyDuty can switch between Gemini Flash and Gemini Pro with a single configuration change, which lets us test model upgrades without touching the codebase.

API key security is the area where vibecoded apps fail most often. AI tools frequently generate code that puts API keys directly in the frontend source code, in .env files that get committed to GitHub, or in client-side configuration objects. Every one of these patterns leads to key exposure, unexpected bills, and potential abuse. The rule is absolute: production API keys must never be accessible to client-side code. Store them in environment variables on your server or cloud function, and ensure your .gitignore includes .env files before your first commit.

For Firebase Cloud Functions, store API keys using Firebase environment configuration or Google Secret Manager. For Vercel, use the dashboard's environment variable settings — they are injected at build time for server-side code and never included in the client bundle. For other hosting platforms, the pattern is similar: set environment variables through the platform's UI or CLI, and reference them in your server-side code with process.env.VARIABLE_NAME. Never hardcode keys, even for development — use a .env.local file that is excluded from version control.

When your app allows users to provide their own API keys (as HevyDuty does), store them in the browser's localStorage or sessionStorage — never in your database. This keeps the keys under the user's control and ensures that a breach of your backend does not expose their credentials. Display a clear warning explaining that the key is stored locally and advise users to restrict their key's permissions. In HevyDuty, we show a privacy notice explaining that the API key never leaves the user's device and is only used for direct calls to Google's Gemini API.

Tip: If you remember one thing from this article: never put an AI API key in client-side code that you pay for. One viral Reddit post with your exposed key can cost you thousands of dollars in hours.

The prompts your app sends to the AI model at runtime are different from the prompts you use during development. Runtime prompts need to be deterministic, structured, and defensive. They must produce consistent output formats regardless of the user's input, handle edge cases gracefully, and resist injection attacks where users try to override your system prompt. The best practice is to use a system prompt that defines the AI's role, constraints, and output format, followed by a user prompt that provides the specific input for this request.

Always request structured output — JSON is the standard — and define the exact schema you expect. Instead of asking the AI to "analyze this receipt," prompt it to "extract the following fields from this receipt image and return them as JSON: { amount: number, currency: string, date: string (ISO 8601), vendor: string, category: string (one of: food, transport, office, utilities, other) }. If a field cannot be determined, use null." This level of specificity dramatically reduces parsing errors and makes your application logic predictable. Most AI providers also support response format constraints (Gemini's JSON mode, OpenAI's JSON mode) that guarantee valid JSON output.

For complex features, break the AI interaction into multiple sequential calls rather than one monolithic prompt. HevyDuty's workout generation does not happen in a single prompt. First, a call generates the program structure (which muscles on which days, exercise count per session). Then, for each session, a separate call selects specific exercises based on the user's equipment and experience level. Finally, a third call assigns sets, reps, and weights based on the user's training history. Each call is focused, produces a small structured output, and feeds into the next. This multi-step approach is more reliable than asking the AI to produce an entire complex workout in one shot.

AI API calls fail in ways that traditional API calls do not. In addition to standard network errors and rate limits, you will encounter model-specific failures: refusals (the model declines to process the input due to safety filters), malformed output (the model returns text that does not match your expected JSON schema), timeout errors (complex prompts take longer than your client's patience), and context length overflows (the input exceeds the model's maximum token limit). Your application must handle all of these gracefully.

Implement retry logic with exponential backoff for transient failures (network errors, rate limits, 500-series responses). For malformed output, attempt to parse what you received — sometimes the AI wraps valid JSON in markdown code fences or adds explanatory text around it. A robust parsing function strips common wrapper patterns before calling JSON.parse. If parsing still fails, retry the request once with a more explicit prompt. If it fails again, show the user a meaningful error: "We could not process this receipt automatically. Please enter the details manually." Always provide a manual fallback for AI-powered features.

Safety filter rejections require special handling. If a user uploads a receipt that the AI flags as inappropriate (which happens with edge cases like medical or adult-industry receipts), your app should explain the issue without blaming the user: "The AI could not process this image. This sometimes happens with certain image types. Please try a different photo or enter the details manually." Log these events for monitoring but do not store the flagged content. The general principle is: AI features should degrade gracefully to manual workflows, never to error screens.

AI API costs can escalate quickly if you do not implement controls from the start. The two variables that determine cost are the number of API calls and the token count per call. Both are directly under your control. On the call count side, implement rate limiting per user (e.g., 10 AI generations per day on the free tier), cache responses for identical inputs (if a user scans the same receipt twice, return the cached result), and avoid unnecessary calls (do not send an AI request until the user explicitly triggers it — never on page load or during typing).

On the token count side, keep your prompts concise and your input data trimmed. Do not send the AI an entire document if you only need one page analyzed. Resize images before sending them to vision models — Gemini processes a 512x512 image the same way as a 4096x4096 image, but the larger image costs more tokens and takes longer. For text inputs, extract the relevant section before sending it to the AI rather than passing the entire context. Every token you eliminate from the prompt reduces cost proportionally.

Set billing alerts and hard spending limits with your AI provider. Google Cloud lets you set budget alerts at specific thresholds; OpenAI lets you set monthly spending caps. Use these features. Also monitor your actual usage patterns — you might discover that 80% of your AI costs come from 5% of your users, which suggests that usage-based pricing or a premium tier is the right business model. At RatataLabs, HevyDuty's free tier uses Gemini Flash (which costs roughly $0.001 per workout generation) with a rate limit of a few generations per day. This keeps costs under control while providing a genuinely useful free experience.

Rate limit AI calls per user — free tiers should have daily or hourly caps
Cache responses for identical inputs to avoid redundant API calls
Resize images and trim text before sending to reduce token count
Set billing alerts and hard spending limits with your AI provider
Monitor usage patterns to identify cost-driving behavior and adjust pricing tiers

HevyDuty AI demonstrates several patterns that are applicable to any AI-powered web app. The workout generation feature uses a multi-step prompt pipeline where each step produces structured JSON that feeds into the next. The system prompt defines the AI's persona (an experienced strength coach), the constraints (safety limits based on user experience level, equipment availability), and the output format (a specific JSON schema for the workout routine). User-provided data — training history, goals, biometrics — is injected into the user prompt at runtime. The exercise swapping feature uses a separate, focused prompt that takes the current exercise and program context and returns three alternatives that maintain the program's muscle balance.

SimplBiz's receipt scanning illustrates the pattern of AI as a data extraction layer. The app captures a photo, compresses it to reduce API costs, sends it to Gemini's vision model with a structured extraction prompt, receives JSON with the parsed fields, and pre-fills the expense form. The user reviews and corrects any mistakes before saving. This "AI suggests, human confirms" pattern is critical for data accuracy — it is much faster than manual entry while still giving the user final control over what gets recorded. The correction data could theoretically be used to improve prompts over time, though SimplBiz currently does not implement this feedback loop.

Both apps share a common error handling pattern: try the AI call, validate the response against the expected schema, retry once if validation fails, and fall back to a manual workflow if the retry also fails. Both apps also use environment-based model selection, so we can switch between Gemini Flash and Pro by changing a configuration variable. And both apps track AI usage metrics (calls per user, average tokens, error rates) through Firebase Analytics custom events, which lets us monitor costs and identify issues before users report them. These patterns are not specific to fitness or finance — they apply to any web app that integrates AI as a core feature.

How to Build an AI-Powered Web App: A Practical Architecture Guide

Related Articles

Prompt Engineering for Web Applications

Privacy-First AI: Building Apps That Process Data Locally

Explore More