Why building on FHIR is hard

FHIR is a brilliant interoperability spec. It is a strange foundation for one team building one product. Here's the gap — and what bonfireDB does about it.

The meta-insight

FHIR was built to move records between distrustful organizations — not for one team building one product.

So when you build an app on it, you inherit a federation protocol. Referential integrity becomes something you enforce. Write semantics assume strangers are racing you. Search isn't a query language — it's a portability contract. You set out to ship an app, and you end up operating a FHIR server.

The pain catalog

Six places FHIR makes you the operator, not the builder

Every theme below is a real, documented failure mode, backed by receipts from the spec, issue trackers, and vendors, and each is followed by what bonfireDB does instead.

Theme A

Write semantics assume strangers

"I called conditional-create once and somehow I have two Patients with the same MRN. The spec literally says the race 'usually can't be fully eliminated.' My GET-edit-PUT just clobbered a teammate's concurrent edit because PUT is a full replacement. And my 'atomic' transaction Bundle wasn't atomic under load."

The duplicate-Patient race is real and acknowledged upstream (HAPI #3141, Microsoft #1382, IBM #2051). The failure surfaces as a 412 Precondition Failed, not a 500, so it slips past error handling that only watches for server errors. bonfireDB: writes go through typed primitives over Postgres with real transactions and optimistic concurrency, so a write is one committed unit: no full-document replacement, no silent twin Patients.
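To make the two write guarantees above concrete, here is a minimal in-memory sketch of a uniqueness check on MRN plus a version check on update. Every name in it (`PatientStore`, `PatientPatch`, `WriteResult`) is hypothetical and chosen for illustration; it is not bonfireDB's API, and a real implementation would lean on a Postgres unique constraint and a transaction rather than an array.

```typescript
// Hypothetical names throughout; this is not bonfireDB's actual API.
interface PatientRow {
  id: string;
  mrn: string;
  familyName: string;
  phone: string;
  version: number; // bumped on every committed write
}

type PatientPatch = { familyName?: string; phone?: string };

type WriteResult =
  | { ok: true; row: PatientRow }
  | { ok: false; reason: "duplicate_mrn" | "not_found" | "version_conflict" };

class PatientStore {
  private rows: PatientRow[] = [];

  // Stands in for a UNIQUE constraint: the check and the insert are one step,
  // so a second create with the same MRN is rejected instead of making a twin.
  create(mrn: string, familyName: string, phone: string): WriteResult {
    if (this.rows.some((row) => row.mrn === mrn)) {
      return { ok: false, reason: "duplicate_mrn" };
    }
    const row: PatientRow = {
      id: "p" + (this.rows.length + 1),
      mrn,
      familyName,
      phone,
      version: 1,
    };
    this.rows.push(row);
    return { ok: true, row: { ...row } };
  }

  // Optimistic concurrency: the caller states which version it read.
  // A stale version is refused, so a concurrent edit is never overwritten.
  update(id: string, expectedVersion: number, patch: PatientPatch): WriteResult {
    const current = this.rows.filter((row) => row.id === id)[0];
    if (current === undefined) return { ok: false, reason: "not_found" };
    if (current.version !== expectedVersion) {
      return { ok: false, reason: "version_conflict" };
    }
    // Merge only the supplied fields; untouched fields keep their values.
    if (patch.familyName !== undefined) current.familyName = patch.familyName;
    if (patch.phone !== undefined) current.phone = patch.phone;
    current.version += 1;
    return { ok: true, row: { ...current } };
  }
}
```

Two editors who both read version 1 cannot both win: the second `update` comes back as `version_conflict`, and the caller re-reads and retries instead of silently replacing the first editor's change.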

Theme B

FHIR is a data model, not an access-control system

"I set a security label to mask a record, then it leaked back out through GraphQL, $export, and history. Consent is just a schema — there's no engine that enforces it. HealthLake has no per-resource ABAC; IAM gates the whole datastore or nothing. And Azure won't even auto-write AuditEvents for me."

A schema for security labels and Consent is not an enforcement engine — masking famously leaks via GraphQL, $export, and _history, and managed FHIR services punt authz to the perimeter. bonfireDB: ABAC is enforced per-read and per-write at the primitive layer, every access is audited, and tenant/patient scoping is the default — not a label you hope downstream respects.
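The difference between a label and an enforcement point can be sketched in a few lines. The example below assumes a single read path that checks tenant, patient assignment, and sensitivity on every row and records every decision. The policy, the `Subject` and `Note` shapes, and the `GuardedNotes` class are all hypothetical illustrations, not bonfireDB's API.

```typescript
// Hypothetical types and policy; an illustration, not bonfireDB's actual API.
interface Subject {
  userId: string;
  tenantId: string;
  role: "clinician" | "billing";
  patientIds: string[]; // patients this user is assigned to
}

interface Note {
  id: string;
  tenantId: string;
  patientId: string;
  restricted: boolean;
  text: string;
}

interface AuditEntry {
  userId: string;
  noteId: string;
  allowed: boolean;
}

// Attribute-based decision: tenant first, then patient assignment, then sensitivity.
function canRead(subject: Subject, note: Note): boolean {
  if (subject.tenantId !== note.tenantId) return false;
  if (subject.patientIds.indexOf(note.patientId) === -1) return false;
  if (note.restricted && subject.role !== "clinician") return false;
  return true;
}

class GuardedNotes {
  readonly audit: AuditEntry[] = [];
  private notes: Note[];

  constructor(notes: Note[]) {
    this.notes = notes;
  }

  // The single read path: each candidate row is checked and audited,
  // whether the decision is allow or deny.
  listForPatient(subject: Subject, patientId: string): Note[] {
    const visible: Note[] = [];
    this.notes.forEach((note) => {
      if (note.patientId !== patientId) return;
      const allowed = canRead(subject, note);
      this.audit.push({ userId: subject.userId, noteId: note.id, allowed });
      if (allowed) visible.push(note);
    });
    return visible;
  }
}
```

Because the store exposes no other way to read notes, there is no side door for a masked record to leak through, and denied attempts show up in the audit trail alongside allowed ones.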

Theme C

Realtime guarantees nothing

"rest-hook is best-effort, so I silently lost events. The ones that arrive come out of commit order. Bulk $import emits zero events, so my search index never learns about imported data. And once, the subscription channel fell days behind — my analytics and search index rotted and nobody noticed."

FHIR subscriptions are best-effort with no ordering and no coverage for bulk $import — your search index and analytics drift silently until something breaks in front of a clinician. bonfireDB: committed operational read models stay fresh on commit, and every write returns a freshness lifecycle so you know exactly which views and indexes are caught up — instead of guessing.
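One way to picture a freshness lifecycle is a per-view cursor over commit sequence numbers. The sketch below assumes every commit gets a strictly increasing `seq` and every view remembers the last one it applied. `CommitEvent`, `ViewCursor`, and `freshnessFor` are hypothetical names for illustration, not bonfireDB's API.

```typescript
// Hypothetical sketch; names like ViewCursor are illustrative only.
interface CommitEvent {
  seq: number; // assigned at commit time, strictly increasing
  patientId: string;
}

interface Freshness {
  view: string;
  caughtUp: boolean;
  lag: number; // commits this view still has to apply
}

class ViewCursor {
  readonly name: string;
  appliedThrough = 0;

  constructor(name: string) {
    this.name = name;
  }

  // Accept only the next event in commit order. Anything else is reported
  // as a replay or a gap instead of being applied silently.
  apply(event: CommitEvent): "applied" | "duplicate" | "gap" {
    if (event.seq <= this.appliedThrough) return "duplicate";
    if (event.seq !== this.appliedThrough + 1) return "gap";
    this.appliedThrough = event.seq;
    return "applied";
  }
}

// For one committed write, report which views already reflect it.
function freshnessFor(commitSeq: number, views: ViewCursor[]): Freshness[] {
  return views.map((view) => {
    const lag = Math.max(0, commitSeq - view.appliedThrough);
    return { view: view.name, caughtUp: lag === 0, lag };
  });
}
```

With a cursor like this, a dropped or reordered event becomes a visible `gap` result and a non-zero `lag`, rather than an index that quietly drifts.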

Theme D

Terminology conscripts you into running a terminology server

"To validate a code I need $expand and $validate-code. To run those I need to stand up a terminology server, load multi-gigabyte SNOMED and LOINC, and chase weekly RxNorm drift. I'm a two-person app team now maintaining a code-system pipeline."

Real-world FHIR validation drags you into operating a terminology service. One caveat: SNOMED CT is free for US use via the NLM UMLS license, so the licensing cost only bites for international use in non-member countries. bonfireDB: clinical primitives carry typed, coded fields so the common cases validate without you running a terminology server for everyday writes.
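A typed, coded field can be as small as a pinned value set checked at the write boundary. The sketch below assumes a vital-signs primitive limited to three LOINC codes. The codes themselves are real LOINC vital-sign codes, but `VitalSign`, `parseVitalSign`, and the choice of value set are hypothetical and do not describe bonfireDB's API.

```typescript
// Pinned value set for this sketch. The LOINC codes are real vital-sign
// codes; the type and function names are hypothetical.
const VITAL_SIGN_CODES = {
  "8867-4": "Heart rate",
  "8480-6": "Systolic blood pressure",
  "8462-4": "Diastolic blood pressure",
} as const;

type VitalSignCode = keyof typeof VITAL_SIGN_CODES;

interface VitalSign {
  patientId: string;
  system: "http://loinc.org";
  code: VitalSignCode;
  value: number;
  unit: string;
}

type ParseResult =
  | { ok: true; vital: VitalSign }
  | { ok: false; error: string };

// Type guard: narrows an arbitrary string to one of the pinned codes.
function isVitalSignCode(code: string): code is VitalSignCode {
  return Object.prototype.hasOwnProperty.call(VITAL_SIGN_CODES, code);
}

// Validate untyped input once, at the boundary. After this point the
// compiler guarantees the code is a member of the value set.
function parseVitalSign(
  patientId: string,
  code: string,
  value: number,
  unit: string
): ParseResult {
  if (!isVitalSignCode(code)) {
    return { ok: false, error: "unknown code: " + code };
  }
  if (!isFinite(value)) {
    return { ok: false, error: "value must be a finite number" };
  }
  return {
    ok: true,
    vital: { patientId, system: "http://loinc.org", code, value, unit },
  };
}
```

This covers only the everyday case of a small, known set of codes. Open-ended validation against full SNOMED CT or LOINC still needs a terminology service.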

Theme E

App-state homelessness

"FHIR has nowhere to put a draft, UI state, or a workflow flag — so I stood up a second database. Now I'm in sync hell between the two. And I can't query by arbitrary fields, joins, or aggregates, so even a simple list screen needs me to hand-build a denormalized projection."

FHIR has no home for drafts, UI state, or workflow, and no general query surface — so teams bolt on a second DB and a manual projection per screen. bonfireDB: Postgres is the source of truth, so app state lives next to clinical data with real queries, joins, and aggregates — and the operational read models give you list-screen projections without a second store to sync.
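As an illustration of what "app state next to clinical data" buys a list screen, the sketch below joins signed notes (clinical data) with open drafts (app state) and aggregates per patient. In Postgres this would be a join and a GROUP BY. The in-memory version here only shows the shape, and the `SignedNote`, `NoteDraft`, and `PatientListRow` types are hypothetical, not bonfireDB's schema.

```typescript
// Hypothetical shapes; not bonfireDB's actual schema or API.
interface SignedNote {
  id: string;
  patientId: string;
  signedAt: string; // ISO 8601 UTC timestamp
}

interface NoteDraft {
  id: string;
  patientId: string;
  authorId: string;
}

interface PatientListRow {
  patientId: string;
  signedNotes: number;
  openDrafts: number;
  lastSignedAt: string | null;
}

// One pass over each collection, grouped by patient: the in-memory
// equivalent of a join plus COUNT and MAX aggregates.
function buildPatientList(notes: SignedNote[], drafts: NoteDraft[]): PatientListRow[] {
  const rows: PatientListRow[] = [];
  const rowFor = (patientId: string): PatientListRow => {
    const existing = rows.filter((row) => row.patientId === patientId)[0];
    if (existing !== undefined) return existing;
    const created: PatientListRow = {
      patientId,
      signedNotes: 0,
      openDrafts: 0,
      lastSignedAt: null,
    };
    rows.push(created);
    return created;
  };

  notes.forEach((note) => {
    const row = rowFor(note.patientId);
    row.signedNotes += 1;
    // Same-format ISO 8601 UTC strings compare correctly as plain strings.
    if (row.lastSignedAt === null || note.signedAt > row.lastSignedAt) {
      row.lastSignedAt = note.signedAt;
    }
  });
  drafts.forEach((draft) => {
    rowFor(draft.patientId).openDrafts += 1;
  });

  return rows.sort((a, b) =>
    a.patientId < b.patientId ? -1 : a.patientId > b.patientId ? 1 : 0
  );
}
```

A patient with only a draft still appears in the list, which is exactly the row a FHIR-only store has no place to represent.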

Theme F

Agents over raw FHIR cap around 50%

"I pointed an agent at the FHIR record and it tops out around 50% answer correctness. One patient record is roughly 3M tokens of FHIR JSON. It skips reference-chasing and filters on display text instead of codes — and writes are even worse than reads."

On FHIR-AgentBench, the best agent reaches ~50% answer correctness over raw FHIR, and a single record can be ~3M tokens of JSON. On MedAgentBench, agents read at 85% but write at only 54%, and routinely filter on display text instead of codes. bonfireDB: agents read clean, cited projections (never raw FHIR by default) through typed tools plus a sandboxed code/SQL tool over those projections, a combination designed to move reliability past the 50% wall.
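A small sketch shows what a clean, cited projection might look like and why matching on codes matters. It assumes a minimal subset of the FHIR R4 Observation shape and reduces each matching resource to a value, a timestamp, and a citation. `CitedReading` and `projectByCode` are hypothetical names, not bonfireDB's API.

```typescript
// Minimal subset of the FHIR R4 Observation shape; the projection type
// and function name are hypothetical.
interface FhirCoding {
  system?: string;
  code?: string;
  display?: string;
}

interface FhirObservation {
  resourceType: "Observation";
  id: string;
  code: { coding?: FhirCoding[]; text?: string };
  valueQuantity?: { value?: number; unit?: string };
  effectiveDateTime?: string;
}

interface CitedReading {
  value: number;
  unit: string;
  at: string;
  source: string; // citation back to the originating resource
}

function projectByCode(
  observations: FhirObservation[],
  system: string,
  code: string
): CitedReading[] {
  const readings: CitedReading[] = [];
  observations.forEach((obs) => {
    const codings = obs.code.coding || [];
    // Match on system + code only; display text is never consulted.
    const matches = codings.some((c) => c.system === system && c.code === code);
    const quantity = obs.valueQuantity;
    if (
      !matches ||
      quantity === undefined ||
      quantity.value === undefined ||
      obs.effectiveDateTime === undefined
    ) {
      return; // skip non-matching or incomplete resources
    }
    readings.push({
      value: quantity.value,
      unit: quantity.unit || "",
      at: obs.effectiveDateTime,
      source: "Observation/" + obs.id,
    });
  });
  // Oldest first, so the agent sees a stable, ordered series.
  return readings.sort((a, b) => (a.at < b.at ? -1 : a.at > b.at ? 1 : 0));
}
```

An Observation labeled "Heart rate" under the wrong code is excluded, and one labeled "Pulse" under the right code is included. The agent receives a handful of short rows with citations instead of the full resource JSON.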

The journey

What every layer of building costs you on raw FHIR — and what bonfireDB does

From first command to running in production, here's where the operator tax shows up at each layer.

| Layer | The pain on raw FHIR | What bonfireDB does |
| --- | --- | --- |
| Get started | Stand up a server, pick a profile pack, wire auth: days before line one of product. | A single typed SDK gives you a clinical backend on Postgres, with no FHIR server to run. (Early access.) |
| Model | Bend your domain into 145+ generic resources and references. | App-native clinical primitives; FHIR R4 generated underneath for export. |
| Write | Conditional-create duplicate races, PUT lost-updates, non-atomic Bundles. | Typed writes over real Postgres transactions; each returns a freshness lifecycle. |
| Read for the UI | No arbitrary queries; hand-build a denormalized projection per screen. | Committed read models (notesByPatient, timeline, latestScores) fresh on commit; reactive client cache. |
| Search | Search is a portability contract, not a query language; semantic search is on you. | Async semantic search over pgvector with status reporting, so heavy work doesn't block writes. |
| Agents | ~50% answer correctness over ~3M-token raw FHIR records; filters on display text. | Agents read clean, cited, permission-aware projections via the custom MCP builder; writes are propose-only. |
| Authorize & comply | Labels and Consent leak; no enforcement engine; managed FHIR has no FHIR-aware authz. | ABAC enforced per-read/per-write; every access audited; tenant/patient scoping by default. |
| Interop | You're already running a FHIR server just to talk to one partner someday. | clinical.fhir.export(patientId) emits a clean FHIR R4 Bundle on demand. |
| Operate | You're the operator: terminology server, subscription drift, two-DB sync hell. | Run the Apache-2.0 core in your own AWS, or the managed tier where we sign a BAA. |

Specifics cited descriptively. The ~50% figure is best-agent answer correctness over raw FHIR on FHIR-AgentBench, not a 50% failure rate. "FHIR-compatible / FHIR-native" is used descriptively; "FHIR®" is a registered trademark of HL7.

You build the app. bonfireDB is the clinical data layer underneath.

Stop operating a FHIR server by accident. Start with a typed clinical backend that's fresh on commit, agent-ready, and FHIR underneath.

FAQ

Frequently asked questions

Why is building an app on FHIR so hard?

FHIR was designed to move records between distrustful organizations, not for one team building one product. Build an app on it and you inherit a federation protocol: you enforce referential integrity yourself, fight concurrent-write races, and treat search as a portability contract — so you end up operating a FHIR server instead of shipping your app.

FHIR vs an app backend — what's the difference?

FHIR is a data and interoperability model; it has no enforcement engine for access control, no home for drafts or UI state, and no general query language. An app backend like bonfireDB makes Postgres the source of truth, with typed clinical primitives, real transactions, ABAC enforced per-read and per-write, and FHIR R4 generated underneath for export when you need it.

Why do agents only reach about 50% accuracy on raw FHIR?

On FHIR-AgentBench, the best agent tops out around 50% answer correctness over raw FHIR, partly because a single record can be ~3M tokens of JSON and agents skip reference-chasing and filter on display text instead of codes. bonfireDB has agents read clean, cited, permission-aware projections through typed tools plus a sandboxed code/SQL tool, a combination designed to move past that wall.

Can I use plain Postgres instead of a FHIR server?

For one team building one product, Postgres gives you real queries, joins, aggregates, transactions, and a place to store app state next to clinical data — things raw FHIR lacks. bonfireDB is built on Postgres with pgvector and generates a clean FHIR R4 Bundle on demand via clinical.fhir.export(patientId), so you get app-backend ergonomics without operating a FHIR server for interop you may never need.
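For a sense of what an on-demand export produces, here is a minimal sketch that wraps already-built resources in a FHIR R4 Bundle of type `collection`. The Bundle fields (`resourceType`, `type`, `entry`, `fullUrl`, `resource`) follow FHIR R4. The `toCollectionBundle` helper and the base URL are hypothetical, and this says nothing about how `clinical.fhir.export(patientId)` is actually implemented.

```typescript
// Hypothetical builder; the Bundle fields follow FHIR R4, everything
// else is illustrative.
interface FhirResource {
  resourceType: string;
  id: string;
  [key: string]: unknown; // any other resource fields pass through untouched
}

interface FhirBundle {
  resourceType: "Bundle";
  type: "collection";
  entry: { fullUrl: string; resource: FhirResource }[];
}

function toCollectionBundle(baseUrl: string, resources: FhirResource[]): FhirBundle {
  // Trim trailing slashes so each fullUrl has exactly one separator.
  const base = baseUrl.replace(/\/+$/, "");
  return {
    resourceType: "Bundle",
    type: "collection",
    entry: resources.map((resource) => ({
      fullUrl: base + "/" + resource.resourceType + "/" + resource.id,
      resource,
    })),
  };
}
```

The point of the sketch is the direction of the dependency: the app's own tables are the source of truth, and the FHIR document is generated from them only when a partner asks for one.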

Why does FHIR access control leak even with security labels and Consent?

A schema for security labels and Consent is not an enforcement engine — masked records famously leak back through GraphQL, $export, and _history, and managed services like HealthLake gate the whole datastore via IAM with no FHIR-aware authz. bonfireDB is designed to enforce ABAC per-read and per-write at the primitive layer, audit every access, and apply tenant/patient scoping by default.