Running FHIR Analytics (Datapipes) without Apache Spark

What is the recommended Open Health Stack approach for running the FHIR Analytics pipelines (Datapipes) without a dependency on Apache Spark, when the goal is to use Postgres as the analytics database?

@bashir @dubdabasoduba

The Analytics component of OHS can roughly be split into two parts:

  1. Transformation pipelines
  2. Querying transformed data

For running the pipelines, you have several options, including running a binary on a single machine (like any other application).

One common output format of the transformation is Parquet. To query Parquet files, you can use any tool that understands Parquet, so it does not have to be Spark. If you want to use a relational database (e.g. PostgreSQL), you can use the flat views output, which the pipelines support natively; enable it by setting the sinkDbConfigPath config param.
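
To illustrate the relational option: once the flat views are in a database, they can be queried with plain SQL and no Spark involved. The sketch below is only an illustration under stated assumptions. The table name `patient_flat` and its columns are hypothetical (the actual names depend on the views you configure), and Python's built-in `sqlite3` stands in for PostgreSQL so that the example runs without a database server. Against a real Postgres instance you would run the same kind of query through a Postgres client instead.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical flat view table.

    In a real setup the pipelines would populate the table; here a few
    made-up rows are inserted so the query has something to return.
    """
    conn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL connection
    conn.execute(
        "CREATE TABLE patient_flat ("
        "id TEXT PRIMARY KEY, gender TEXT, birth_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO patient_flat VALUES (?, ?, ?)",
        [
            ("p1", "female", "1990-04-12"),
            ("p2", "male", "1985-11-03"),
            ("p3", "female", "2001-07-30"),
        ],
    )
    conn.commit()
    return conn


def count_patients_by_gender(conn):
    """Return a {gender: count} dict from the hypothetical patient_flat table."""
    rows = conn.execute(
        "SELECT gender, COUNT(*) FROM patient_flat "
        "GROUP BY gender ORDER BY gender"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = build_demo_db()
    print(count_patients_by_gender(conn))
```

The point of the sketch is that the flat views are ordinary tables, so any SQL client or BI tool that can connect to the database can query them.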