SQL on FHIR Analytics, No ETL or Spark

The question every health-data builder asks

How do you run SQL analytics on FHIR without an ETL pipeline?

Define a SQL-on-FHIR v2 ViewDefinition that flattens the resources you care about into a flat table, and query that table in plain SQL. bonfireDB materializes the view in Postgres and keeps it fresh on commit — so dashboards, BI, and ML feature pipelines read live data with no Spark, no nightly ETL, and no separate warehouse.

Flat by design

A ViewDefinition turns nested FHIR into named columns — patient_id, phq9_score, recorded_at — that a SQL engine and a BI tool already understand.

Fresh on commit

Views are incrementally maintained inside the write transaction. When a note signs or a score lands, the view reflects it — no batch job to wait on.

Authorized per query

The same policy that gates clinical reads applies to analytics: row-level by patient and tenant, column-level by minimum-necessary role.

FHIR is a graph, not a table — and that's where analytics breaks

A FHIR resource is a deeply nested document with repeating arrays, references to other resources, and choice-of-type fields. The moment you flatten it for analytics, the structure fights back.

✗ Joining repeating arrays (multiple Observation.component, multiple coded values) detonates into Cartesian row explosions — one patient becomes hundreds of rows.
✗ Reaching a single value (valueQuantity.value under the right code.coding.code) means hand-written JSON traversal in every query.
✗ The usual escape hatch is a heavy Spark / Pathling ETL job that copies FHIR into a separate warehouse on a schedule — adding latency, a second system, and a second BAA.
✗ By the time the warehouse is built, the data is already stale, and your dashboard shows last night's numbers.

So teams either write fragile flattening SQL by hand or stand up a parallel analytics stack. Both are wrong jobs for a four-person team shipping a product.

flattening by hand

// the value you want is buried…
{ "resourceType": "Observation",
  "code": { "coding": [
    { "system": "http://loinc.org",
      "code": "44261-6" } ] },
  "valueQuantity": { "value": 14 },
  "subject": { "reference": "Patient/123" } }

// …so the query becomes JSON archaeology
SELECT resource->'valueQuantity'->>'value'
FROM fhir
WHERE resource#>>'{code,coding,0,code}' = '44261-6'
// repeat for every metric, pray the path holds

Define the view, not the pipeline

A ViewDefinition is a portable, declarative flattening spec

SQL-on-FHIR v2 ViewDefinitions are an HL7® standard: you declare which resource to flatten, which rows to keep, and which columns to project — using FHIRPath. bonfireDB compiles that spec into a materialized Postgres view. Write it once; query it like any table.

views/phq9.view.ts

export const phq9View = defineView({
  name: "phq9_scores",
  resource: "Observation",

  // keep only the rows we care about — no Cartesian blowup
  where: [{ path: "code.coding.where(system='http://loinc.org' and code='44261-6').exists()" }],

  // project nested FHIR into flat, typed columns
  select: [{
    column: [
      { name: "patient_id", path: "subject.getReferenceKey('Patient')" },
      { name: "phq9_score", path: "valueQuantity.value", type: "decimal" },
      { name: "recorded_at", path: "effectiveDateTime", type: "dateTime" },
    ],
  }],

  // kept fresh on commit via incremental view maintenance
  refresh: "on-commit",
});

The where clause filters before projection, so repeating arrays never multiply your rows. The spec is the standard's portable JSON — the same ViewDefinition runs on any conformant SQL-on-FHIR engine.

Then it's just a table

Once the view is materialized, phq9_scores is an ordinary Postgres table. Aggregate it, window over it, join it to your other views — with the SQL you already know and the BI tools you already use.

✓ Point Metabase, Superset, Tableau, or a notebook straight at the view — no connector gymnastics.
✓ Build ML feature pipelines from the same flat tables you report on — one source of truth, no drift.
✓ Window functions, CTEs, joins across views — full SQL, not a constrained FHIR search dialect.

No $export to a bucket, no Spark cluster, no warehouse sync. The analytics surface lives inside the same store as the operational data.

cohort_response.sql

-- mean PHQ-9 change per clinician, last 90 days
SELECT
  c.clinician_id,
  count(DISTINCT p.patient_id)      AS patients,
  avg(p.last_score - p.first_score) AS mean_change
FROM (
  SELECT patient_id,
    first_value(phq9_score) OVER w AS first_score,
    last_value(phq9_score)  OVER w AS last_score
  FROM phq9_scores
  WHERE recorded_at > now() - '90 days'::interval
  WINDOW w AS (PARTITION BY patient_id ORDER BY recorded_at)
) p
JOIN assignments_view c USING (patient_id)
GROUP BY c.clinician_id;

From resource to dashboard

The whole path, with no pipeline in the middle

There is no copy step, no scheduler, no second datastore. The view is part of the database that already holds your FHIR.

Define

Write a SQL-on-FHIR v2 ViewDefinition declaring the resource, the filter, and the flat columns you want.

Materialize

bonfire compiles it into a Postgres materialized view, indexed and typed — no Spark, no external job.

Maintain

Incremental view maintenance updates the view inside the write transaction, so it's fresh on commit.

Query

Run SQL — or point a BI tool or notebook at it. Every query is filtered by the access policy automatically.

Analytics that respects who's allowed to see what

A warehouse export is where authorization usually dies — once FHIR lands in a flat table, the patient scoping and minimum-necessary rules are gone. In bonfire, the policy follows the data into the view: row-level by patient and tenant, column-level by role.

✓ Row-level: a clinician's query over phq9_scores only returns their assigned patients, in their tenant.
✓ Column-level: a front_desk role sees appointment columns but never clinical scores in the same view.
✓ Same gate as reads, writes, and agent calls — no analytics side door that bypasses consent.

Pairs with clinical authorization & audit — the policy is defined once and enforced on the analytics surface too. Every analytics query is auditable like any other read.

view.policy.ts

clinical.views.policy("phq9_scores", {
  // row-level: scope every query to the caller's patients + tenant
  rowFilter: (ctx) => ({
    tenant_id: ctx.orgId,
    patient_id: clinical.assignments.visibleTo(ctx.userId),
  }),

  // column-level: which columns each role may project
  columns: {
    front_desk: ["patient_id", "recorded_at"],
    clinician:   ["patient_id", "recorded_at", "phq9_score"],
  },
});

// the same SELECT returns different rows + columns per caller
await clinical.sql(`SELECT * FROM phq9_scores`, ctx);

Side by side

The unoccupied trifecta: fresh + authorized + no Spark, in one store

SQL-on-FHIR isn't new — Aidbox and Pathling each ship pieces of it. What no one ships is all three at once: views that stay fresh on commit, are authorized row- and column-level per query, and need no separate Spark or warehouse. That combination is the moat.

Capability	HealthLake + Athena	Pathling (Spark)	Aidbox	bonfireDB
SQL-on-FHIR v2 ViewDefinitions	No (raw $export)	Yes	Yes	Yes
No separate Spark / warehouse	Glue + Athena	Needs Spark cluster	In-DB	In-Postgres, no Spark
Fresh on commit (incremental)	Batch $export	Batch / re-extract	Refresh-driven	Maintained in the write txn
Row-level access on the view	IAM, not patient-scoped	Your job	Configurable	Per-patient / per-tenant
Column-level (minimum-necessary)	Manual	Manual	Configurable	Role-scoped, built in
Pre-seed / indie footprint	Heavy AWS stack	JVM + Spark ops	Server to run	one SDK, Postgres-first

bonfireDB is early-stage; this page describes product design and positioning. Comparisons reflect each system's stated SQL-on-FHIR and access-control model, not a benchmark. SQL-on-FHIR and FHIR are HL7® specifications; "FHIR" is used descriptively. See the full comparisons for where dedicated FHIR servers and analytics engines fit.

Where this fits

Analytics is one surface of the same store

Always fresh

Incremental view maintenance is the same engine that keeps operational reads fresh on commit.

Explore →

Authorization & audit

One policy gates reads, writes, agents — and now analytics views.

Explore →

FHIR underneath

Views compile from the FHIR R4 stored beneath your typed clinical functions.

Explore →

Comparisons

How bonfire's SQL-on-FHIR sits against HealthLake, Pathling, and Aidbox.

Compare →

FAQ

Frequently asked questions

How do you run SQL analytics on FHIR without an ETL pipeline?

Define a SQL-on-FHIR v2 ViewDefinition that flattens the FHIR resources you care about into named columns, and query that as a plain Postgres table. bonfireDB is designed to materialize the view in-database and keep it fresh on commit, so there's no Spark, no nightly ETL, and no separate warehouse to copy data into.

What is a SQL-on-FHIR ViewDefinition?

A ViewDefinition is an HL7 SQL-on-FHIR v2 standard: a declarative JSON spec that says which resource to flatten, which rows to keep, and which columns to project using FHIRPath. It's portable across conformant engines. bonfireDB compiles it into a materialized Postgres view you query like any table.

FHIR vs Postgres for analytics — which should I query?

FHIR is a nested graph, so querying it directly means fragile JSON traversal and Cartesian row explosions from repeating arrays. bonfireDB's approach keeps Postgres as the source of truth, with FHIR R4 generated underneath, and projects flat tables via ViewDefinitions via ViewDefinitions so you analyze with ordinary SQL, joins, and window functions.

Do I need Spark or Pathling to do analytics on clinical data?

No. The common pattern is a Spark/Pathling ETL job that copies FHIR into a separate warehouse on a schedule, adding latency, a second system, and a second BAA. bonfireDB is designed to materialize views inside the same Postgres store, incrementally maintained in the write transaction — no Spark cluster and no batch export.

Is access control enforced on the analytics views?

Yes, by design. A warehouse export usually loses patient scoping, but bonfireDB applies the same policy to views as to clinical reads: row-level by patient and tenant, column-level by minimum-necessary role. The same SELECT returns different rows and columns per caller, and every query is auditable.

📊 SQL-on-FHIR analytics

How do you run SQL analytics on FHIR without an ETL pipeline?

Flat by design

Fresh on commit

Authorized per query

FHIR is a graph, not a table — and that's where analytics breaks

A ViewDefinition is a portable, declarative flattening spec

Then it's just a table

The whole path, with no pipeline in the middle

Define

Materialize

Maintain

Query

Analytics that respects who's allowed to see what

The unoccupied trifecta: fresh + authorized + no Spark, in one store

Analytics is one surface of the same store

Always fresh

Authorization & audit

FHIR underneath

Comparisons

You build the app. Bonfire is the clinical data layer underneath.

Frequently asked questions