
EdImport: Automating Slovenian Tax Filing for Foreign Investors

By RatataLabs Team

Every February, thousands of Slovenian retail investors face the same ritual: opening dozens of PDF pages from foreign brokers, hunting for dividend and interest entries, copying amounts into spreadsheets, converting currencies by hand, and then reformatting everything into XML that the eDavki tax portal will accept. A single Trade Republic annual report can run to forty pages of dense transaction history, and if you hold positions across multiple brokers the workload multiplies fast.

The official FURS guidance assumes you will fill in the Doh-Div (dividend) and Doh-Obr (interest) forms line by line inside the eDavki web interface. For someone with ten dividend-paying stocks across two brokers, that means roughly sixty fields of manual entry per form — company name, ISIN, payment date, gross amount, withholding tax, country of origin — with no tolerance for formatting mistakes. A single misplaced decimal or wrong date format causes a validation error on import, and the portal gives you almost no context about what went wrong.

Before EdImport existed, the most common workaround was a community-maintained spreadsheet template that Slovenian investors shared on Reddit and finance forums. You would paste raw numbers into it and it would spit out XML. It helped, but you still had to locate and transcribe every figure by hand. EdImport removes that entire manual step.

The core technical challenge of EdImport is extracting structured financial data from PDFs that were designed for human reading, not machine consumption. PDF is fundamentally a page-layout format, not a data format — text is positioned by absolute coordinates, tables are drawn as a collection of independent text fragments, and the same logical column can shift by a few pixels from one page to the next. There is no concept of a "row" or "cell" unless you reconstruct it yourself.

EdImport uses Mozilla's PDF.js library to extract text content from each page. PDF.js returns an array of text items, each carrying the string value along with position, size, and font information (the x/y position is encoded in the item's transform matrix). The first challenge is reassembling these fragments into logical lines. Two text items whose y-coordinates match within a tolerance of about two units belong to the same line, but rounding differences between PDF generators mean the tolerance has to be tuned per broker format. Trade Republic PDFs, for example, use slightly different vertical spacing than Interactive Brokers statements.
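The line-reassembly step can be sketched roughly as follows. This is a minimal illustration, not EdImport's actual code: the `Fragment` type is a simplified stand-in for a PDF.js text item, and the function name and per-broker tolerance values are hypothetical.

```typescript
// Simplified stand-in for a PDF.js text item. In PDF.js itself the position
// lives in the item's transform matrix (x = transform[4], y = transform[5]).
interface Fragment {
  str: string;
  x: number;
  y: number;
}

// Group fragments whose y-coordinate lies within `tolerance` of the first
// fragment of the current line, then order each line left to right.
function groupIntoLines(fragments: Fragment[], tolerance: number): Fragment[][] {
  // PDF y-coordinates grow upward, so sort descending to read top to bottom.
  const sorted = [...fragments].sort((a, b) => b.y - a.y);
  const grouped: Fragment[][] = [];
  let lineY = Number.NaN; // y of the first fragment on the current line
  for (const frag of sorted) {
    if (grouped.length > 0 && Math.abs(lineY - frag.y) <= tolerance) {
      grouped[grouped.length - 1].push(frag);
    } else {
      grouped.push([frag]);
      lineY = frag.y;
    }
  }
  for (const line of grouped) line.sort((a, b) => a.x - b.x);
  return grouped;
}

// Hypothetical per-broker tolerances; real values would need tuning.
const Y_TOLERANCE = { tradeRepublic: 2, ibkr: 3 };

const sampleFragments: Fragment[] = [
  { str: "12.34", x: 300, y: 700.4 },
  { str: "Dividende", x: 50, y: 701.1 },
  { str: "Netto", x: 50, y: 680 },
];
const lineTexts = groupIntoLines(sampleFragments, Y_TOLERANCE.tradeRepublic)
  .map((line) => line.map((f) => f.str).join(" "));
```

Here the first two fragments differ by 0.7 units vertically and end up on one line, while the third starts a new line. Anchoring the comparison to the first fragment of each line keeps small offsets from accumulating as a line grows.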

A subtler problem is that PDF.js sometimes splits a single word into multiple text items, especially when the PDF was generated from a web view. A company name like "Apple Inc." might arrive as three separate items: "App", "le", " Inc." — each with its own coordinates. EdImport uses a horizontal-gap heuristic to merge fragments that are within a few pixels of each other, but overly aggressive merging can fuse adjacent columns. Getting this right took more iteration than any other part of the codebase.
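A horizontal-gap merge of this kind might look like the sketch below. The `Fragment` shape, the function name, and the gap value of 3 are illustrative assumptions, not values taken from EdImport.

```typescript
interface Fragment {
  str: string;
  x: number;     // left edge of the fragment
  width: number; // rendered width of the fragment
}

// Merge neighbours on a single line when the gap between the end of one
// fragment and the start of the next is at most `maxGap`. Larger gaps are
// treated as column boundaries and start a new cell.
function mergeFragments(line: Fragment[], maxGap: number): Fragment[] {
  const sorted = [...line].sort((a, b) => a.x - b.x);
  const merged: Fragment[] = [];
  for (const frag of sorted) {
    const prev = merged[merged.length - 1];
    if (prev !== undefined && frag.x - (prev.x + prev.width) <= maxGap) {
      prev.str += frag.str;
      prev.width = frag.x + frag.width - prev.x; // extend to cover both pieces
    } else {
      merged.push({ ...frag }); // copy so the input is not mutated
    }
  }
  return merged;
}

const splitLine: Fragment[] = [
  { str: "App", x: 50, width: 18 },
  { str: "le", x: 68.5, width: 9 },
  { str: " Inc.", x: 78, width: 22 },
  { str: "12.34", x: 300, width: 30 },
];
const cells = mergeFragments(splitLine, 3).map((f) => f.str);
const fused = mergeFragments(splitLine, 250).map((f) => f.str);
```

With a gap of 3 the three name fragments collapse into one cell and the amount stays separate. With an overly generous gap of 250 the amount is fused into the company name, which is the failure mode described above.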

Once EdImport has clean text lines, it applies broker-specific pattern matching to identify dividend and interest entries. Each broker has a recognizable document structure. Trade Republic, for example, precedes each dividend payment with a header line containing the ISIN code and a label like "Dividende" or "Ausschüttung", followed by lines with the gross amount, withholding tax, and net credit. EdImport walks through lines sequentially, maintaining state about which transaction it is currently parsing.
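The sequential, stateful walk can be illustrated with a small sketch. The line layout and the labels "Brutto", "Quellensteuer", and "Netto" are assumptions made for the example; the real Trade Republic wording and ordering may differ, and the function names are hypothetical.

```typescript
interface DividendRecord {
  isin: string;
  gross?: number;
  withholding?: number;
  net?: number;
}

// Header line: an ISIN followed somewhere by a dividend label.
const HEADER = /\b([A-Z]{2}[A-Z0-9]{9}[0-9])\b.*(Dividende|Ausschüttung)/;
// Amount line: hypothetical label followed by a German-formatted number.
const AMOUNT = /^(Brutto|Quellensteuer|Netto)\s+(-?[\d.]+,\d{2})/;

// "1.234,56" -> 1234.56
function parseGermanAmount(text: string): number {
  return Number(text.replace(/\./g, "").replace(",", "."));
}

function parseDividends(lines: string[]): DividendRecord[] {
  const records: DividendRecord[] = [];
  let current: DividendRecord | null = null; // state: transaction being parsed
  for (const line of lines) {
    const header = HEADER.exec(line);
    if (header) {
      current = { isin: header[1] };
      records.push(current);
      continue;
    }
    const amount = AMOUNT.exec(line);
    if (current && amount) {
      const value = parseGermanAmount(amount[2]);
      if (amount[1] === "Brutto") current.gross = value;
      else if (amount[1] === "Quellensteuer") current.withholding = value;
      else current.net = value;
    }
    // Any other line (page footers, blank lines) is ignored.
  }
  return records;
}

const statementLines = [
  "US0378331005 Apple Inc. Dividende",
  "Brutto 12,34 EUR",
  "Quellensteuer 1,85 EUR",
  "Netto 10,49 EUR",
  "Seite 3 von 40",
  "DE0007164600 SAP SE Dividende",
  "Brutto 1.100,00 EUR",
];
const parsedDividends = parseDividends(statementLines);
```

Amount lines are only accepted while a transaction is open, so stray numbers before the first header are never attributed to a dividend.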

ISIN extraction uses a straightforward regex — twelve characters starting with a two-letter country code — but validating that an ISIN actually corresponds to a real security requires a lookup dictionary. EdImport ships a built-in dictionary of several hundred common ISINs mapped to company names, which also normalizes the inconsistent naming you see across brokers. Trade Republic might call it "APPLE INC. DL-.00001" while IBKR just says "AAPL". The dictionary maps both to a canonical "Apple Inc." for the final XML.
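As a rough sketch of the two pieces described here: a regex for the twelve-character ISIN shape (two letters, nine alphanumerics, one numeric check digit) and a dictionary lookup for the canonical name. The two dictionary entries and the function names are illustrative only; the dictionary EdImport ships is far larger.

```typescript
// Two-letter country code, nine alphanumeric characters, one check digit.
const ISIN_PATTERN = /\b[A-Z]{2}[A-Z0-9]{9}[0-9]\b/;

// Tiny illustrative dictionary mapping ISINs to canonical names.
const ISIN_NAMES = new Map<string, string>([
  ["US0378331005", "Apple Inc."],
  ["US5949181045", "Microsoft Corporation"],
]);

// Returns the first ISIN-shaped token in a line, or null if there is none.
function extractIsin(line: string): string | null {
  const match = ISIN_PATTERN.exec(line);
  return match ? match[0] : null;
}

// Prefer the canonical dictionary name; fall back to the broker's own label
// when the ISIN is not in the dictionary.
function canonicalName(isin: string, brokerLabel: string): string {
  return ISIN_NAMES.get(isin) ?? brokerLabel;
}

const foundIsin = extractIsin("APPLE INC. DL-.00001 US0378331005");
const fromTradeRepublic = canonicalName("US0378331005", "APPLE INC. DL-.00001");
const fromIbkr = canonicalName("US0378331005", "AAPL");
```

Both broker labels resolve to "Apple Inc." because the lookup is keyed on the ISIN rather than on the label text.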

Currency conversion is another area where broker formats diverge. Trade Republic reports everything in euros with no conversion needed. Revolut exports sometimes list the original currency alongside the EUR equivalent. IBKR statements can include dividends in USD, GBP, and other currencies, requiring EdImport to parse and apply the exchange rates that IBKR provides in the statement itself. The fallback is the ECB reference rate for the payment date, but using the broker-reported rate is more defensible if FURS ever audits the filing.
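The rate-selection order can be sketched as below. The type and function names are hypothetical, and the sketch assumes both rates are quoted as units of foreign currency per 1 EUR (the convention ECB reference rates use); a broker that quotes the inverse would need its rate flipped first.

```typescript
interface ForeignPayment {
  amount: number;      // amount in the original currency
  currency: string;    // ISO 4217 code, e.g. "USD"
  brokerRate?: number; // foreign units per 1 EUR, as printed in the statement
}

type RateSource = "none" | "broker" | "ecb";

function round2(value: number): number {
  return Math.round(value * 100) / 100;
}

// Prefer the broker-reported rate, fall back to the ECB reference rate for
// the payment date, and refuse to guess when neither is available.
function toEur(
  payment: ForeignPayment,
  ecbRate?: number,
): { eur: number; source: RateSource } {
  if (payment.currency === "EUR") {
    return { eur: round2(payment.amount), source: "none" };
  }
  if (payment.brokerRate !== undefined) {
    return { eur: round2(payment.amount / payment.brokerRate), source: "broker" };
  }
  if (ecbRate !== undefined) {
    return { eur: round2(payment.amount / ecbRate), source: "ecb" };
  }
  throw new Error(`No exchange rate available for ${payment.currency}`);
}

const viaBroker = toEur({ amount: 10, currency: "USD", brokerRate: 1.25 }, 1.2);
const viaEcb = toEur({ amount: 21, currency: "USD" }, 1.05);
const alreadyEur = toEur({ amount: 7.5, currency: "EUR" });
```

Returning the rate source alongside the amount makes it possible to show the user which rate was applied, which helps if the figure ever has to be justified later.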

The output of EdImport is one or two XML files conforming to the official FURS XSD schemas. The Doh-Div schema requires a specific namespace, envelope elements with the taxpayer's tax number and filing period, and individual Dividend elements containing the ISIN, payer name, country of residence, payment date, gross amount, and any foreign withholding tax already paid. Amounts must be in euros, rounded to two decimal places, and dates must use the YYYY-MM-DD format.
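The formatting rules for a single entry (escaped text, two-decimal amounts, YYYY-MM-DD dates) can be sketched as follows. The element names below are placeholders invented for illustration and are not taken from the FURS XSD; the namespace and envelope are omitted entirely.

```typescript
interface DividendRow {
  isin: string;
  payerName: string;
  payerCountry: string; // two-letter country code
  paymentDate: Date;
  grossEur: number;
  foreignTaxEur: number;
}

// Escape the five characters that are special in XML text and attributes.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// YYYY-MM-DD, taken from the UTC date so the local time zone cannot shift it.
function formatDate(date: Date): string {
  return date.toISOString().slice(0, 10);
}

// Placeholder element names; substitute the names required by the real schema.
function dividendToXml(row: DividendRow): string {
  return [
    "<Dividend>",
    `  <ISIN>${escapeXml(row.isin)}</ISIN>`,
    `  <PayerName>${escapeXml(row.payerName)}</PayerName>`,
    `  <PayerCountry>${escapeXml(row.payerCountry)}</PayerCountry>`,
    `  <Date>${formatDate(row.paymentDate)}</Date>`,
    `  <Value>${row.grossEur.toFixed(2)}</Value>`,
    `  <ForeignTax>${row.foreignTaxEur.toFixed(2)}</ForeignTax>`,
    "</Dividend>",
  ].join("\n");
}

const dividendXml = dividendToXml({
  isin: "US00206R1023",
  payerName: "AT&T Inc.",
  payerCountry: "US",
  paymentDate: new Date(Date.UTC(2024, 1, 1)),
  grossEur: 12.3,
  foreignTaxEur: 1.85,
});
```

Escaping matters in practice because payer names such as "AT&T Inc." contain characters that would otherwise make the file ill-formed.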

The Doh-Obr schema for interest income is structurally similar, but interest income comes with an important rule: the EUR 1,000 annual exemption threshold for Slovenian residents. EdImport calculates the total interest across all brokers and, if it falls below the threshold, displays a notice that no filing is required but still generates the XML in case the user wants to file anyway. This threshold check alone saves users from unnecessary filings and the anxiety of wondering whether they need to report small amounts.
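The threshold check itself is simple arithmetic, sketched here with hypothetical names. Treating a total of exactly EUR 1,000 as requiring a filing is an assumption of this sketch; the text above only says that totals below the threshold need no filing.

```typescript
const INTEREST_EXEMPTION_EUR = 1000;

interface InterestAssessment {
  totalEur: number;
  filingRequired: boolean;
}

// Sum in integer cents to avoid floating-point drift across many small
// payments, then compare the total with the exemption threshold.
function assessInterest(amountsEur: number[]): InterestAssessment {
  const totalCents = amountsEur.reduce(
    (sum, amount) => sum + Math.round(amount * 100),
    0,
  );
  const totalEur = totalCents / 100;
  // Assumption: only totals strictly below the threshold are exempt.
  return { totalEur, filingRequired: totalEur >= INTEREST_EXEMPTION_EUR };
}

// Interest from three brokers: 400.10 + 350.20 + 120.05 = 870.35
const belowThreshold = assessInterest([400.1, 350.2, 120.05]);
// 600.00 + 450.50 = 1050.50
const aboveThreshold = assessInterest([600, 450.5]);
```

The XML would still be generated in both cases; the flag only decides whether the "no filing required" notice is shown.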

One of the trickiest XML compliance issues was character encoding in company names. Some broker PDFs include non-ASCII characters, such as German umlauts in company names or accented characters in French securities, and the eDavki portal is strict about the file actually being UTF-8, as declared in the XML prologue. Early versions of EdImport did not set the encoding explicitly for the Blob download, which occasionally produced files that eDavki rejected. Explicitly adding a UTF-8 byte-order mark and declaring the charset when constructing the Blob fixed it.
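One way to make the encoding explicit is sketched below, assuming the fix amounts to prepending a UTF-8 byte-order mark and declaring the charset in the Blob's MIME type. The function name is hypothetical. Note that the standard `Blob` constructor has no separate encoding option: string parts are always written as UTF-8, and the `type` field only labels the content.

```typescript
const UTF8_BOM = "\uFEFF"; // encoded as the bytes EF BB BF in UTF-8

// Wrap an XML body in a Blob with an explicit BOM, an XML declaration that
// names the encoding, and a MIME type that repeats it.
function xmlToBlob(xmlBody: string): Blob {
  const declaration = '<?xml version="1.0" encoding="UTF-8"?>\n';
  return new Blob([UTF8_BOM, declaration, xmlBody], {
    type: "application/xml;charset=utf-8",
  });
}

const xmlBlob = xmlToBlob("<PayerName>Münchener Rück</PayerName>");
```

In a browser, the resulting Blob can be handed to `URL.createObjectURL` and attached to a download link, so the file the user saves carries the same bytes that were assembled here.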

Each broker integration in EdImport is essentially its own parser module. Trade Republic was the first, built from the annual tax reports they email to customers. Revolut was next — Revolut provides CSV exports from their app, which are structurally simpler to parse than PDFs but come with their own quirks like inconsistent date formats across locales and occasional missing withholding tax columns. Interactive Brokers was the most complex because IBKR statements are configurable by the user, so EdImport has to handle multiple possible layouts.

The parser architecture uses a strategy pattern: each broker module exports a detect function (which examines the first few lines of the file to determine if it matches) and a parse function (which returns a normalized array of dividend and interest records). When a user uploads files, EdImport runs each detector and routes the file to the correct parser. This makes adding new brokers straightforward — you write a detector and a parser, and the rest of the pipeline handles XML generation and display.
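The detect/parse contract can be sketched as a small interface plus a router. The interface and function names, the marker strings used for detection, and the five-line detection window are all illustrative assumptions; the parse functions are stubs.

```typescript
interface IncomeRecord {
  kind: "dividend" | "interest";
  isin?: string;
  grossEur: number;
}

// Each broker module supplies a detector and a parser.
interface BrokerParser {
  broker: string;
  detect(firstLines: string[]): boolean;
  parse(lines: string[]): IncomeRecord[];
}

// Hypothetical detectors; the marker strings are made up for the example.
const tradeRepublicParser: BrokerParser = {
  broker: "Trade Republic",
  detect: (firstLines) => firstLines.some((l) => l.includes("Trade Republic")),
  parse: () => [], // stub: the real parser would walk the lines
};

const revolutParser: BrokerParser = {
  broker: "Revolut",
  detect: (firstLines) =>
    firstLines.length > 0 && firstLines[0].startsWith("Date,"),
  parse: () => [], // stub
};

// Run each detector on the first few lines and hand the whole file to the
// first parser that claims it.
function routeFile(
  lines: string[],
  candidates: BrokerParser[],
): { broker: string; records: IncomeRecord[] } {
  const head = lines.slice(0, 5);
  const match = candidates.find((p) => p.detect(head));
  if (!match) throw new Error("Unrecognized statement format");
  return { broker: match.broker, records: match.parse(lines) };
}

const brokerParsers = [tradeRepublicParser, revolutParser];
const routedPdf = routeFile(["Trade Republic Bank GmbH", "Jahressteuerbericht"], brokerParsers);
const routedCsv = routeFile(["Date,Ticker,Type,Amount"], brokerParsers);
```

Adding a broker then means appending one more object to the list; nothing downstream of `routeFile` needs to know which parser produced the records.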

Tip: EdImport currently supports Trade Republic (PDF), Revolut (CSV), and Interactive Brokers (CSV). New broker support is driven by community requests — if your broker is missing, open a GitHub issue with a redacted sample statement.

EdImport processes everything locally in the browser. No file ever leaves the user's device. This was not just a privacy-friendly design choice — it was the only responsible approach. Brokerage statements contain full legal names, tax identification numbers, account numbers, and complete transaction histories. Uploading that data to a server, even temporarily, would create a liability that a small open-source project has no business taking on.

Browser-based processing also eliminates server costs entirely. EdImport is hosted as a static site on Firebase Hosting. There is no backend, no database, no cloud function. This means the tool can serve unlimited users at near-zero cost, which matters for a free community tool. It also means there are zero authentication requirements — users never need to create an account or sign in. They open the page, upload their files, and download the result.

EdImport started as a personal tool to solve my own tax filing problem. I shared it on a Slovenian personal finance forum and the response was immediate — within a week, users were requesting Revolut support and reporting edge cases in Trade Republic parsing where certain corporate actions were being misidentified as dividends. Each bug report came with a redacted PDF sample, which became the foundation of a regression test suite.

The impact in time saved is substantial. A user with holdings across two brokers and twenty dividend-paying positions typically spends sixty to ninety minutes on manual entry, assuming no mistakes. With EdImport, the same process takes under five minutes: upload the files, review the extracted table, download the XML, and import into eDavki. For the Slovenian investor community — which has grown rapidly as platforms like Trade Republic and Revolut made foreign investing accessible — that translates to thousands of collective hours saved each tax season.

Looking forward, the most requested feature is capital gains reporting (Doh-KDVP), which requires tracking purchase lots, sale prices, and holding periods. This is significantly more complex than dividend reporting because it requires correlating buy and sell transactions across time. It is on the roadmap, but it will likely require a different architectural approach — possibly maintaining state across multiple uploads rather than the current stateless single-pass model.
