Privacy-First AI: Building Apps That Process Data Locally

Every app at RatataLabs handles sensitive user data. HevyDuty knows your body weight and training capacity. SimplBiz sees your receipts and business expenses. EdImport processes brokerage statements containing tax identification numbers and investment portfolios. If any of this data leaked from a server breach, the consequences for users would range from embarrassing to financially damaging.

The default architecture for most web apps is to upload everything to a backend, process it on a server, and store it in a database you control. This is convenient for the developer but creates a honeypot of sensitive data that you must now secure, back up, and keep compliant with data-protection regulations. For a small team, this is a liability, and we chose to avoid it entirely by processing data where it already lives: on the user's device.

The privacy-first approach is not just ethical. It is a competitive advantage. When users compare SimplBiz to expense trackers that require uploading receipts to unknown servers, the fact that their data stays on their device and in their own Google Drive is a genuine differentiator. Trust is hard to earn and easy to lose, and architectural decisions are more credible than privacy policy promises.

Modern browsers are far more capable than most developers realize. EdImport processes PDF brokerage statements entirely in the browser using PDF.js, Mozilla's JavaScript PDF rendering library. The PDFs never leave the user's machine. PDF.js extracts text content with positional metadata, which EdImport then parses with pattern-matching logic to identify ISINs, dividend amounts, payment dates, and withholding taxes.
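As a rough illustration of the pattern-matching step, the sketch below pulls ISIN candidates out of extracted text and keeps only those whose check digit passes the Luhn test. The function names are hypothetical and this is not EdImport's actual code; it only shows the general technique.

```typescript
// ISIN shape: 2-letter country code, 9 alphanumerics, 1 numeric check digit.
const ISIN_PATTERN = /\b[A-Z]{2}[A-Z0-9]{9}[0-9]\b/g;

// Validates the check digit: letters expand to numbers (A=10 ... Z=35),
// then the Luhn algorithm runs over the resulting digit string.
function isValidIsin(isin: string): boolean {
  if (!/^[A-Z]{2}[A-Z0-9]{9}[0-9]$/.test(isin)) return false;
  const digits = isin
    .split("")
    .map((ch) => parseInt(ch, 36).toString())
    .join("");
  let sum = 0;
  let doubleNext = false; // the rightmost digit is not doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// Returns the distinct, checksum-valid ISINs in order of first appearance.
function extractIsins(text: string): string[] {
  const candidates = text.match(ISIN_PATTERN) ?? [];
  return Array.from(new Set(candidates.filter(isValidIsin)));
}
```

Checking the checksum before accepting a match helps filter out account numbers and reference codes that merely look like ISINs.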

The processing pipeline runs synchronously in the main thread for small files and offloads to a Web Worker for larger documents to avoid blocking the UI. The entire operation, from dropping a PDF onto the page to seeing a parsed table of dividend records, typically completes in under two seconds. Users can verify the extracted data visually before generating the FURS-compliant XML, which is also constructed entirely in the browser and downloaded as a local file.
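To give a feel for the in-browser XML step, here is a minimal sketch that serializes parsed dividend records into an XML string. The element names are placeholders chosen for illustration and do not reflect the real FURS schema; the record shape is likewise an assumption.

```typescript
// Hypothetical record shape; field names are illustrative only.
interface DividendRecord {
  isin: string;
  payer: string;
  paymentDate: string; // ISO date, e.g. "2024-03-15"
  grossAmount: number;
  withholdingTax: number;
}

// Escapes the five characters that are special in XML text and attributes.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Builds the document as a plain string; element names are placeholders,
// NOT the official FURS schema.
function buildDividendXml(records: DividendRecord[]): string {
  const rows = records.map((r) =>
    [
      "  <Dividend>",
      `    <Isin>${escapeXml(r.isin)}</Isin>`,
      `    <Payer>${escapeXml(r.payer)}</Payer>`,
      `    <PaymentDate>${escapeXml(r.paymentDate)}</PaymentDate>`,
      `    <GrossAmount>${r.grossAmount.toFixed(2)}</GrossAmount>`,
      `    <WithholdingTax>${r.withholdingTax.toFixed(2)}</WithholdingTax>`,
      "  </Dividend>",
    ].join("\n"),
  );
  return ['<?xml version="1.0" encoding="UTF-8"?>', "<Dividends>", ...rows, "</Dividends>"].join("\n");
}
```

In a browser, the resulting string can be wrapped in a Blob and offered through an object URL, so the download never touches a server.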

This architecture has a profound simplification effect on the backend. EdImport has no backend. It is a static site hosted on Firebase Hosting with no server-side code, no database, and no API. The attack surface for data breaches is effectively zero because there is no server to breach.

Tip: EdImport has zero backend infrastructure. All PDF parsing, data extraction, and XML generation happens in the browser. There is no server to breach.

HevyDuty and SimplBiz use Google's Gemini API for AI features, which means API calls must be authenticated with an API key. The traditional approach is to proxy these calls through your backend, using your own API key, and absorb the cost. This is simple but creates a server that sees every request and response, defeating the local-processing model.

Instead, we use a BYOK (Bring Your Own Key) approach. Users enter their own Gemini API key, which is stored in the browser's localStorage and sent directly from the client to Google's API. Our servers never see the key or the data being processed. For users who do not want to manage their own key, HevyDuty offers a free tier that proxies through a Firebase Cloud Function with a rate-limited shared key. In that tier, requests necessarily pass through our function on their way to Google, so the strict client-to-Google data flow applies only to BYOK users.
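The BYOK flow can be sketched as a small request builder that reads the user's key from storage and targets Google's endpoint directly. The storage key name and the default model name are assumptions made for this example; the endpoint path and the x-goog-api-key header follow the public Gemini REST API.

```typescript
// Anything with getItem works here, including window.localStorage.
interface KeyStore {
  getItem(key: string): string | null;
}

interface GeminiRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical storage key; the real apps may use a different name.
const GEMINI_KEY_NAME = "geminiApiKey";

// Builds a generateContent request that goes straight from the browser to
// Google. The default model name is a placeholder assumption.
function buildGeminiRequest(
  store: KeyStore,
  prompt: string,
  model: string = "gemini-1.5-flash",
): GeminiRequest {
  const apiKey = store.getItem(GEMINI_KEY_NAME);
  if (!apiKey) {
    throw new Error("No Gemini API key stored; ask the user to add one.");
  }
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${encodeURIComponent(model)}:generateContent`,
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
  };
}
```

In the browser, the result can be passed to fetch as fetch(req.url, req). Because the only host involved is Google's, the claim that no intermediate server sees the key can be checked in the network tab.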

BYOK has tradeoffs. Onboarding is harder because users must create a Google AI Studio account and generate an API key. We mitigate this with a step-by-step guide embedded in the app, complete with screenshots. Some users will never complete this flow, which is why offering a limited free tier alongside BYOK is important for adoption. But the users who do bring their own key tend to be more engaged and more trusting of the platform.

PDF.js deserves a deeper look because client-side PDF parsing is central to EdImport's architecture. The library loads a PDF into an ArrayBuffer, parses its internal structure, and exposes a page-by-page API for extracting text items with their coordinates, font sizes, and styles. This positional information is critical because brokerage statements use table layouts where the meaning of a number depends on which column it occupies.
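A minimal sketch of how positional data can be turned back into table rows follows. It assumes text items shaped like the ones PDF.js returns from getTextContent(), where transform[4] and transform[5] hold the x and y position; the helper names and the tolerance value are illustrative, not EdImport's code.

```typescript
// Minimal subset of a PDF.js text item used by this sketch.
interface TextItemLike {
  str: string;
  transform: number[]; // [a, b, c, d, x, y]
}

interface Cell {
  text: string;
  x: number;
}

// Groups items whose y positions are within yTolerance into one row,
// orders rows top to bottom, and orders cells left to right.
function groupIntoRows(items: TextItemLike[], yTolerance: number = 2): Cell[][] {
  const rows: { y: number; cells: Cell[] }[] = [];
  for (const item of items) {
    if (item.str.trim() === "") continue; // skip whitespace-only items
    const x = item.transform[4] ?? 0;
    const y = item.transform[5] ?? 0;
    let row = rows.find((r) => Math.abs(r.y - y) <= yTolerance);
    if (!row) {
      row = { y, cells: [] };
      rows.push(row);
    }
    row.cells.push({ text: item.str, x });
  }
  // PDF coordinates grow upward, so a larger y is higher on the page.
  rows.sort((a, b) => b.y - a.y);
  return rows.map((r) => r.cells.sort((a, b) => a.x - b.x));
}

// Maps an x position to a column, given ascending column start positions.
// Returns -1 when x lies left of the first column.
function columnIndex(x: number, columnStarts: number[]): number {
  let index = -1;
  columnStarts.forEach((start, i) => {
    if (x >= start) index = i;
  });
  return index;
}
```

With rows and column indexes recovered, a parser can decide whether a number is a gross amount or a withholding tax by where it sits rather than by its position in the text stream.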

EdImport's parsing strategy differs per broker. Trade Republic statements have a consistent format where dividend entries follow a predictable pattern: a date line, followed by an ISIN, followed by gross and net amounts in specific positions. Interactive Brokers uses a different layout with CSV-like sections embedded in the PDF. Revolut provides actual CSV files, which EdImport parses with PapaParse rather than PDF.js. Each parser is a separate module that implements a common interface, making it straightforward to add new broker support.
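One way such a common interface might look is sketched below. The interface, the dispatch function, and the "DemoBroker" line format are all invented for illustration; they do not describe any real broker's layout or EdImport's actual modules.

```typescript
interface DividendEntry {
  date: string;
  isin: string;
  gross: number;
  net: number;
}

// Every broker module exposes the same three members.
interface BrokerParser {
  readonly broker: string;
  canParse(text: string): boolean;
  parse(text: string): DividendEntry[];
}

// Toy parser for a made-up format: a header line, then
// "date;isin;gross;net" lines.
const demoParser: BrokerParser = {
  broker: "DemoBroker",
  canParse: (text) => text.startsWith("DEMO-STATEMENT"),
  parse: (text) =>
    text
      .split("\n")
      .slice(1)
      .filter((line) => line.trim() !== "")
      .map((line) => {
        const [date = "", isin = "", gross = "", net = ""] = line.split(";");
        return { date, isin, gross: Number(gross), net: Number(net) };
      }),
};

// Picks the first parser that recognizes the statement.
function parseStatement(
  text: string,
  parsers: BrokerParser[],
): { broker: string; entries: DividendEntry[] } {
  const parser = parsers.find((p) => p.canParse(text));
  if (!parser) throw new Error("Unrecognized statement format");
  return { broker: parser.broker, entries: parser.parse(text) };
}
```

Adding a broker then means writing one new object that satisfies BrokerParser and appending it to the list passed to parseStatement.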

The main challenge with client-side PDF parsing is error handling. PDF is a page-description format, not a structured data format, so nothing guarantees that the values you need can be extracted cleanly. Broker statement layouts change without notice, scanned PDFs have no extractable text at all, and some PDFs use custom encodings that PDF.js cannot decode. EdImport validates extracted data against expected patterns and shows clear warnings when something looks wrong rather than silently producing incorrect tax documents.
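The validation idea can be sketched as a function that returns human-readable warnings instead of throwing. The specific checks, the rounding tolerance, and the field names are assumptions made for this example, not EdImport's actual rules.

```typescript
interface ExtractedDividend {
  isin: string;
  paymentDate: string; // expected as YYYY-MM-DD
  gross: number;
  withholding: number;
  net: number;
}

// True only for calendar dates that really exist (rejects 2024-02-30).
function isRealDate(iso: string): boolean {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(iso);
  if (!m) return false;
  const y = Number(m[1]);
  const mo = Number(m[2]);
  const d = Number(m[3]);
  const dt = new Date(Date.UTC(y, mo - 1, d));
  return dt.getUTCFullYear() === y && dt.getUTCMonth() === mo - 1 && dt.getUTCDate() === d;
}

// Returns one warning per failed check; an empty array means the record
// looks plausible. Thresholds are illustrative.
function validateDividend(r: ExtractedDividend): string[] {
  const warnings: string[] = [];
  if (!/^[A-Z]{2}[A-Z0-9]{9}[0-9]$/.test(r.isin)) {
    warnings.push(`ISIN "${r.isin}" does not match the expected format`);
  }
  if (!isRealDate(r.paymentDate)) {
    warnings.push(`Payment date "${r.paymentDate}" is not a valid date`);
  }
  if (!(r.gross > 0)) {
    warnings.push("Gross amount should be positive");
  }
  if (r.withholding < 0 || r.withholding > r.gross) {
    warnings.push("Withholding tax should be between zero and the gross amount");
  }
  if (Math.abs(r.gross - r.withholding - r.net) > 0.005) {
    warnings.push("Net amount does not equal gross minus withholding tax");
  }
  return warnings;
}
```

Collecting warnings rather than failing on the first problem lets the UI show everything that looks suspicious in a record at once.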

SimplBiz needed persistent storage for expense records and receipt images, but we did not want to run a database. The solution was using Google Drive as the storage backend. When a user connects their Google account, SimplBiz creates a dedicated folder structure on their Drive: a root folder, monthly subfolders, and individual receipt images stored as files. Expense metadata is stored in a JSON file that SimplBiz reads and writes on each session.
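The folder layout described above can be sketched with two small helpers. The request body shape and the folder MIME type follow the Drive API v3 files.create method; the "YYYY-MM" naming scheme and the helper names are assumptions for illustration.

```typescript
// MIME type that Drive uses to represent folders.
const DRIVE_FOLDER_MIME = "application/vnd.google-apps.folder";

interface DriveFolderBody {
  name: string;
  mimeType: string;
  parents?: string[];
}

// JSON body for a Drive v3 files.create call that creates a folder,
// optionally nested under a parent folder ID.
function folderCreateBody(name: string, parentId?: string): DriveFolderBody {
  return parentId
    ? { name, mimeType: DRIVE_FOLDER_MIME, parents: [parentId] }
    : { name, mimeType: DRIVE_FOLDER_MIME };
}

// Hypothetical naming scheme for monthly subfolders, e.g. "2024-03".
// UTC is used so the name does not depend on the device's time zone.
function monthlyFolderName(date: Date): string {
  const month = date.getUTCMonth() + 1;
  return `${date.getUTCFullYear()}-${month < 10 ? "0" : ""}${month}`;
}
```

A client would POST these bodies to https://www.googleapis.com/drive/v3/files with the user's OAuth token, first for the root folder and then for each monthly subfolder using the returned ID as the parent.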

The Google Drive API's appDataFolder is useful here: it is a hidden, app-specific folder, accessed through the drive.appdata scope, that only your app can read and write. This prevents users from accidentally deleting or corrupting their data through the Drive UI. However, we chose to use a visible folder instead so users can browse their receipts independently and share them with accountants directly from Drive. This transparency reinforces the privacy narrative: users can see exactly what data exists and where it lives.

Performance is acceptable for the typical use case of a freelancer logging a few expenses per day. The main latency bottleneck is the initial load, where SimplBiz reads the metadata JSON to hydrate the expense list. For users with thousands of entries, we implemented pagination and lazy loading of receipt thumbnails to keep the initial load under three seconds.
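A minimal sketch of the pagination idea follows: sort the hydrated metadata newest first and hand the UI one page at a time. The record shape and the default page size are assumptions, not SimplBiz's actual values.

```typescript
interface Expense {
  id: string;
  date: string; // ISO date, so string order matches chronological order
  amount: number;
}

// Returns the zero-based page of expenses, newest first. The input array
// is copied before sorting so the caller's data is left untouched.
function pageOfExpenses(all: Expense[], page: number, pageSize: number = 50): Expense[] {
  const sorted = [...all].sort((a, b) => (a.date < b.date ? 1 : a.date > b.date ? -1 : 0));
  const start = page * pageSize;
  return sorted.slice(start, start + pageSize);
}
```

Receipt thumbnails for rows that are not yet visible can be deferred separately, for example with the loading="lazy" attribute on image elements or an IntersectionObserver.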

Local-first architecture is not free. You give up server-side analytics, cross-device sync (unless you build it yourself via Drive or similar), and the ability to run background jobs. EdImport cannot notify a user when tax deadlines approach because there is no server to schedule notifications from. SimplBiz relies on Google Drive for cross-device access, which works but adds latency compared to a dedicated database.

Debugging is also harder. When a user reports that PDF parsing failed, you cannot inspect their document in a server log. You need to ask them to share the file, which they may not want to do given that it contains financial data. We built in optional error reporting that sends anonymized metadata (page count, text extraction success rate, detected broker format) without any actual document content, but even this requires careful communication to maintain trust.
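The anonymized report can be sketched as an explicit allowlist: the function copies only the three fields named above and ignores everything else in the diagnostics object. The type and field names are hypothetical.

```typescript
// Everything the parser knows locally, including sensitive fields.
interface ParseDiagnostics {
  pageCount: number;
  pagesWithText: number;
  detectedBroker: string | null;
  fileName: string; // stays on the device
  extractedText: string; // stays on the device
}

// The only fields that are allowed to leave the device.
interface AnonymizedReport {
  pageCount: number;
  textExtractionRate: number; // 0..1, rounded to two decimals
  detectedBroker: string;
}

// Builds the report field by field so new diagnostics fields are never
// included by accident.
function buildAnonymizedReport(d: ParseDiagnostics): AnonymizedReport {
  const rate = d.pageCount > 0 ? d.pagesWithText / d.pageCount : 0;
  return {
    pageCount: d.pageCount,
    textExtractionRate: Math.round(rate * 100) / 100,
    detectedBroker: d.detectedBroker ?? "unknown",
  };
}
```

Constructing the payload from an allowlist, rather than deleting sensitive keys from a copy, means a field added to the diagnostics later cannot leak by default.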

No server-side analytics or background processing
Cross-device sync depends on third-party storage APIs
Debugging requires user cooperation since data stays local
BYOK adds onboarding friction for non-technical users
Client-side processing is constrained by device performance

Privacy-first architecture only matters if users know about it and believe it. We communicate the local-processing model prominently in each app's onboarding and landing page. EdImport's hero section states "100% browser-based — your documents never leave your device" because this is the primary reason someone would choose it over manually entering data into eDavki.

Code that users can inspect is the strongest form of transparency. When your parsing logic, API calls, and storage operations all run as client-side JavaScript, any technically inclined user can verify your privacy claims by opening DevTools and watching what the app actually sends over the network. Server-side applications cannot offer this level of auditability, because users have no way to observe what happens to their data after it is uploaded. It is one of the underappreciated advantages of building privacy-first web applications.
