Running FHIR Analytics (Datapipes) without Apache Spark

amolo · July 14, 2025, 9:34am

What is the recommended Open Health Stack approach for running the FHIR Analytics pipelines (Datapipes) without an Apache Spark dependency, with Postgres as the analytics database?

@bashir @dubdabasoduba

bashir · July 15, 2025, 6:52pm

The Analytics component of OHS can roughly be split into two parts:

1. Transformation pipelines
2. Querying transformed data

For running the pipelines, you have several options, including running a binary on a single machine (like any other application).

One common output format of the transformation is Parquet. To query Parquet files you can use any tool that understands Parquet, so it does not have to be Spark. If you want to use a relational database (e.g. PostgreSQL), you can use the flat views output, which the pipelines support natively; you enable it by setting the sinkDbConfigPath config param.
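To make the second part (querying) concrete, here is a minimal sketch of what querying a flat view with plain SQL could look like once the data is in a relational database. The table name `patient_flat` and its columns are hypothetical and are not the schema the pipelines actually produce. Python's built-in `sqlite3` is used only as a stand-in so the example runs on its own; against PostgreSQL you would open the connection with a Postgres driver instead and the query would stay the same.

```python
import sqlite3

# Stand-in for a PostgreSQL connection so the sketch is self-contained.
conn = sqlite3.connect(":memory:")

# Hypothetical flat view: the table and column names are illustrative only,
# not the actual schema written by the pipelines.
conn.execute(
    "CREATE TABLE patient_flat (id TEXT PRIMARY KEY, gender TEXT, birth_date TEXT)"
)
conn.executemany(
    "INSERT INTO patient_flat VALUES (?, ?, ?)",
    [
        ("p1", "female", "1990-04-02"),
        ("p2", "male", "1985-11-23"),
        ("p3", "female", "2001-07-15"),
    ],
)


def count_by_gender(connection):
    """Return a {gender: patient_count} mapping using plain SQL, no Spark."""
    rows = connection.execute(
        "SELECT gender, COUNT(*) FROM patient_flat GROUP BY gender ORDER BY gender"
    ).fetchall()
    return dict(rows)


print(count_by_gender(conn))
```

The point is only that flat, relational output can be queried with ordinary SQL tooling; which views and columns exist depends on how the flat views are defined in your setup.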
