Multitenancy Design Proposal

James Agnew edited this page Feb 4, 2020 · 6 revisions

Goal

This page attempts to document the initial design for a scalable multitenancy strategy. This strategy will have several goals:

  • It can be used to provide logical, secure segregation of data. For example, when a user searches for "find all patients in tenant 123 with name = 'smith'", they should receive only results that actually belong to that tenant.
  • It can be used to create logical partitions of data for partition-based archiving. Note that in this scenario, some prep work may be required to resolve resource references that point into the archived partition before the partition is removed.
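
The prep work mentioned for archiving could start with a check for references that cross into the partition about to be removed. The following is a minimal sketch in plain Java, not hapi-fhir code: ResourceLink and ArchivePrecheck are hypothetical names, and the sketch assumes each reference records the tenant of both its source and its target resource.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record of one resource reference, with the tenant of each end.
class ResourceLink {
    final long sourceId;
    final int sourceTenant;
    final long targetId;
    final int targetTenant;

    ResourceLink(long sourceId, int sourceTenant, long targetId, int targetTenant) {
        this.sourceId = sourceId;
        this.sourceTenant = sourceTenant;
        this.targetId = targetId;
        this.targetTenant = targetTenant;
    }
}

// Hypothetical helper, assumed to run before a partition is archived.
class ArchivePrecheck {
    // Returns the links whose source lives outside the archived tenant
    // but whose target lives inside it. These would dangle after removal.
    static List<ResourceLink> linksIntoPartition(List<ResourceLink> links, int archivedTenant) {
        List<ResourceLink> crossing = new ArrayList<>();
        for (ResourceLink link : links) {
            if (link.targetTenant == archivedTenant && link.sourceTenant != archivedTenant) {
                crossing.add(link);
            }
        }
        return crossing;
    }
}
```

An empty result would suggest the partition can be removed without leaving dangling references. Links that start and end inside the archived partition are ignored because they are removed together.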

The following tables are all related to an individual resource instance in the database:

  • HFJ_RESOURCE
  • HFJ_RES_VER
  • HFJ_RES_TAG
  • HFJ_FORCED_ID
  • HFJ_IDX_CMP_STRING_UNIQ
  • HFJ_SPIDX_COORDS
  • HFJ_SPIDX_DATE
  • HFJ_SPIDX_NUMBER
  • HFJ_SPIDX_QUANTITY
  • HFJ_SPIDX_STRING
  • HFJ_SPIDX_TOKEN
  • HFJ_SPIDX_URI

Proposed Design

Each of these tables would add a new integer discriminator column called "tenant". hapi-fhir clients can populate this column by calling resource.setUserData("TENANT", value) in a PRESTORAGE interceptor. When persisting the records associated with that resource, hapi-fhir will set the tenant column on every one of them to the value found in the resource's userData. If no TENANT value is provided, hapi-fhir will default the column to 0.
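
The assignment rule above can be sketched in plain Java. This is a minimal, self-contained illustration and not hapi-fhir code: SketchResource, TenantAssigner, and the header-based lookup are hypothetical stand-ins. Only the "TENANT" userData key and the default of 0 come from the proposal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a resource that can carry user data.
class SketchResource {
    private final Map<String, Object> userData = new HashMap<>();

    void setUserData(String key, Object value) {
        userData.put(key, value);
    }

    Object getUserData(String key) {
        return userData.get(key);
    }
}

// Hypothetical helper showing both halves of the proposal.
class TenantAssigner {
    static final String TENANT_KEY = "TENANT";
    static final int DEFAULT_TENANT = 0;

    // What user code might do in a PRESTORAGE interceptor: derive the
    // tenant from request context (assumed here to be a header value).
    // A non-numeric header makes Integer.parseInt throw NumberFormatException.
    static void assignFromHeader(SketchResource resource, String tenantHeader) {
        if (tenantHeader != null && !tenantHeader.trim().isEmpty()) {
            resource.setUserData(TENANT_KEY, Integer.parseInt(tenantHeader.trim()));
        }
    }

    // What the persistence layer might do when writing rows: use the
    // supplied tenant, or fall back to 0 when none was set.
    static int resolveTenant(SketchResource resource) {
        Object value = resource.getUserData(TENANT_KEY);
        return (value instanceof Integer) ? (Integer) value : DEFAULT_TENANT;
    }
}
```

In this sketch the same resolved value would be written to the tenant column of every row belonging to the resource, so the resource table and its index tables always agree.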

The key features of this design are:

  • Tenant selection will be done when a resource is created. User code will have ultimate discretion about which tenant a resource belongs to, so it might be decided based on a URL prefix, a header, a hidden attribute of the logged-in user, etc.
  • It will be possible to perform searches that are strictly restricted to one tenant, but it will also be possible to perform searches that cross tenant boundaries (e.g. the logged-in user can access tenants A, B, and C but not D, or the logged-in user can access all tenants).
  • Because all resource-relevant tables will have a consistent tenant identifier via the new tenant column, it will be possible to apply sharding and partitioning strategies at the database level using this identifier as a key.
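
The search behaviour described above amounts to adding a tenant predicate to every index lookup. The following is a minimal sketch in plain Java under stated assumptions: IndexedName and TenantScopedSearch are hypothetical names, an in-memory list stands in for an index table carrying the new tenant column, and the set of allowed tenants is assumed to be supplied by user code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical row of a string index table that carries the tenant column.
class IndexedName {
    final long resourceId;
    final int tenant;
    final String familyName;

    IndexedName(long resourceId, int tenant, String familyName) {
        this.resourceId = resourceId;
        this.tenant = tenant;
        this.familyName = familyName;
    }
}

// Hypothetical search helper that applies the tenant predicate.
class TenantScopedSearch {
    // Returns the ids of rows whose name matches (ignoring case) and whose
    // tenant is in the caller's allowed set. A single-element set gives a
    // strictly single-tenant search; a larger set crosses tenant boundaries.
    static List<Long> findByFamilyName(List<IndexedName> rows, String name, Set<Integer> allowedTenants) {
        List<Long> ids = new ArrayList<>();
        for (IndexedName row : rows) {
            if (allowedTenants.contains(row.tenant) && row.familyName.equalsIgnoreCase(name)) {
                ids.add(row.resourceId);
            }
        }
        return ids;
    }
}
```

In a database the same predicate would be a condition on the tenant column, which is also the column a partitioning or sharding scheme would key on.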