Background
We already have the UserInfoFetcher, which runs as part of our OPA deployment and can use a variety of backends to retrieve additional information about users that can then be used during authorization.
The main use case for this is probably to retrieve group membership for a user from Keycloak, Active Directory, ... and allow writing ACLs that target groups instead of users.
Idea
Authorization mainly revolves around the question: who is allowed to do what?
It's the job of the UserInfoFetcher to augment the who with additional information, so it is only logical to follow the same pattern and have a ResourceInfoFetcher augment the what with additional information.
For the who, the logical source systems are identity providers (Keycloak, AD, LDAP, ...), but for resources these are not the right places to obtain this information (possible in theory, but not usually done).
For what the most probable source systems will be metadata management solutions like
The idea behind this ticket is to create a ResourceInfoFetcher that can connect to various backends like the ones mentioned above and retrieve information about the resource.
This could allow ACLs along the lines of:
- user x is allowed to read data product "customers" (which consists of a Kafka topic, a NiFi flow and three Trino tables)
- group y cannot read PII data (as indicated by the column being tagged in the metadata system) and will only receive anonymized values
- ...
The exact implementation for every backend would probably differ quite a bit depending on how the system "thinks" about this, but that is fine, as users would need to tailor their Rego rules to the data that is returned anyway.
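To make the "various backends" idea a bit more concrete, here is a minimal sketch of what a common backend abstraction might look like. Everything in it (`ResourceInfo`, `ResourceInfoBackend`, `StaticBackend`, `is_tagged` and the field names) is hypothetical and not taken from the spike code; a real backend would talk to a metadata system instead of an in-memory map.

```rust
use std::collections::HashMap;

/// Hypothetical resource metadata that could be handed to OPA.
/// The field names are assumptions made for this sketch.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ResourceInfo {
    pub name: String,
    pub tags: Vec<String>,
    pub data_products: Vec<String>,
}

/// Hypothetical abstraction that every metadata backend would implement.
pub trait ResourceInfoBackend {
    /// Returns the metadata for a resource, or `None` if it is unknown.
    fn fetch(&self, resource_name: &str) -> Option<ResourceInfo>;
}

/// In-memory stand-in for a real backend, for illustration only.
pub struct StaticBackend {
    resources: HashMap<String, ResourceInfo>,
}

impl StaticBackend {
    pub fn new(resources: Vec<ResourceInfo>) -> Self {
        // Index the resources by name for direct lookup.
        let resources = resources
            .into_iter()
            .map(|info| (info.name.clone(), info))
            .collect();
        Self { resources }
    }
}

impl ResourceInfoBackend for StaticBackend {
    fn fetch(&self, resource_name: &str) -> Option<ResourceInfo> {
        self.resources.get(resource_name).cloned()
    }
}

/// Checks whether a resource carries a given tag (e.g. "pii").
/// Unknown resources are treated as untagged.
pub fn is_tagged(backend: &dyn ResourceInfoBackend, resource_name: &str, tag: &str) -> bool {
    backend
        .fetch(resource_name)
        .map(|info| info.tags.iter().any(|t| t == tag))
        .unwrap_or(false)
}
```

In practice the tag check itself would live in the Rego rules; it is shown here only to illustrate the shape of the data a rule could work with.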
D-Quantum Spike
We are currently collaborating with Synabi on a D-Quantum spike of this idea.
Up until now, this is mostly boilerplate code to add the extra webserver, CRD changes, etc. in the OPA operator, but this could now be extended to actually talk to a D-Quantum instance.
The data model of D-Quantum is very flexible and doesn't have fixed names for types of entities; rather, every entity (table, data product, column, ...) is just identified by an 'entityTypeId' which the user can freely configure for their instance.
Relationships between these are then modelled in D-Quantum as well, so one valid example might be:
- Business Unit (entityTypeId: 134)
  - Data Product (entityTypeId: 135)
    - Table (entityTypeId: 136)
      - Column (entityTypeId: 137)
The user then needs to be able to retrieve configurable excerpts from this structure with the ResourceInfoFetcher.
This will result in multiple REST calls depending on the config, most of which depend on each other and cannot be parallelized, so caching becomes fairly important. Caching is already enabled in the spike code, but we'll most probably want to look at that some more.
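As an illustration of why caching helps with chained lookups, here is a minimal time-to-live cache sketch using only the standard library. This is not the caching used in the spike code; `TtlCache` and `get_or_fetch` are hypothetical names, and a real implementation would also need eviction and concurrency handling.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical TTL cache keyed by a string such as the request URL.
pub struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<String, (Instant, V)>,
}

impl<V: Clone> TtlCache<V> {
    pub fn new(ttl: Duration) -> Self {
        Self {
            ttl,
            entries: HashMap::new(),
        }
    }

    /// Returns the cached value if it is younger than the TTL,
    /// otherwise calls `fetch`, stores the result and returns it.
    pub fn get_or_fetch<F>(&mut self, key: &str, fetch: F) -> V
    where
        F: FnOnce() -> V,
    {
        if let Some((stored_at, value)) = self.entries.get(key) {
            if stored_at.elapsed() < self.ttl {
                return value.clone();
            }
        }
        // Entry is missing or expired: do the expensive lookup once.
        let value = fetch();
        self.entries
            .insert(key.to_string(), (Instant::now(), value.clone()));
        value
    }
}
```

With a cache like this in front of each REST call, a repeated authorization request for the same table would only trigger the dependent lookups once per TTL window.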
Idea
The current idea is to allow users to model the hierarchy they'd like retrieved in the backend config along these lines:
```rust
#[derive(Clone, Debug, Deserialize, Eq, JsonSchema, PartialEq, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct DQuantumBackend {
    pub url: String,

    #[serde(flatten)]
    pub tls: TlsClientDetails,

    /// Name of a Secret that contains client credentials of a Keycloak account with permission to read user metadata.
    ///
    /// Must contain the fields `clientId` and `clientSecret`.
    pub client_credentials_secret: String,

    pub hierarchy: DQuantumHierarchy,
}

#[derive(Clone, Debug, Deserialize, Eq, JsonSchema, PartialEq, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct DQuantumHierarchy {
    start_element: u8,

    id_field: String,

    #[serde(default)]
    child: Option<Vec<DQuantumRelation>>,

    #[serde(default)]
    parent: Option<Vec<DQuantumRelation>>,
}

#[derive(Clone, Debug, Deserialize, Eq, JsonSchema, PartialEq, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct DQuantumRelation {
    element_id: u8,
    relation_id: String,
}
```
Here, startElement identifies the entityTypeId that corresponds to the resource name that OPA would get from the product (think "table" for Trino).
Based on this startElement, the ResourceInfoFetcher would then walk up and down the tree as defined in the hierarchy and return a representation of this tree to OPA.
TODO: we probably need extra information in the hierarchy: how things should be named in the returned data structure, and in which direction each relationship is modelled upstream.
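The "walk up and down the tree" step could be sketched roughly as follows: resolve the start element first, then derive one lookup per configured child and parent relation. The types below are simplified stand-ins for the config structs (no serde, plain `Vec` instead of `Option<Vec<...>>`), and `Lookup` and `plan_lookups` are hypothetical names used only for this illustration.

```rust
/// Simplified stand-in for a configured relation.
pub struct Relation {
    pub element_id: u8,
    pub relation_id: String,
}

/// Simplified stand-in for the configured hierarchy.
pub struct Hierarchy {
    pub start_element: u8,
    pub id_field: String,
    pub child: Vec<Relation>,
    pub parent: Vec<Relation>,
}

/// One lookup the fetcher would have to perform against the backend.
#[derive(Debug, PartialEq, Eq)]
pub enum Lookup {
    /// Find the start entity by matching `id_field` against the resource name.
    Start { entity_type_id: u8, id_field: String },
    /// Find entities below the start entity via a relation.
    Child { entity_type_id: u8, relation_id: String },
    /// Find entities above the start entity via a relation.
    Parent { entity_type_id: u8, relation_id: String },
}

/// Orders the lookups so that the start entity is resolved first;
/// child and parent lookups depend on its result.
pub fn plan_lookups(hierarchy: &Hierarchy) -> Vec<Lookup> {
    let mut plan = vec![Lookup::Start {
        entity_type_id: hierarchy.start_element,
        id_field: hierarchy.id_field.clone(),
    }];
    plan.extend(hierarchy.child.iter().map(|r| Lookup::Child {
        entity_type_id: r.element_id,
        relation_id: r.relation_id.clone(),
    }));
    plan.extend(hierarchy.parent.iter().map(|r| Lookup::Parent {
        entity_type_id: r.element_id,
        relation_id: r.relation_id.clone(),
    }));
    plan
}
```

This sketch only covers one level in each direction, matching the config structs as they stand; deeper hierarchies would need the extra information mentioned in the TODO.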
I have created a branch with code for this spike: https://github.com/stackabletech/opa-operator/tree/spike/resource_info_fetcher
Example
The following hierarchy:
DatenObjektGruppierung (153) -> DatenObjekt (146) -> Datenfeld (148)
could be represented like this:
and result in the following API calls:
Find the data object for the table name received from OPA:
https://demo03.synabi.com/dquantum/api/entity/search/146/Name?propertyValue=<tablename>
returns:
Find the columns for the table:
https://demo01.synabi.com/dquantum/api/entity/search/148/zugehoeriges_datenobjekt?propertyValue=<uid for table from response above>
Find the group for the table:
Use the ID from the field "Zugehörige Datenobjektgruppierung" in the response to the first query.
https://demo03.synabi.com/dquantum/api/entity/3b19e8b92d300a1a302b306e20bba183
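The URLs in this example follow two patterns, which could be built with small helpers like the ones below. This is a sketch based only on the URLs shown here, not on D-Quantum API documentation; `search_url` and `entity_url` are hypothetical names, and the property value is passed through without percent-encoding, which a real client would need to handle.

```rust
/// Builds a search URL of the form
/// `<base>/api/entity/search/<entityTypeId>/<field>?propertyValue=<value>`.
/// Assumption: `value` is already safe to use in a query string.
pub fn search_url(base_url: &str, entity_type_id: u8, field: &str, value: &str) -> String {
    format!(
        "{}/api/entity/search/{}/{}?propertyValue={}",
        base_url.trim_end_matches('/'),
        entity_type_id,
        field,
        value
    )
}

/// Builds a direct entity URL of the form `<base>/api/entity/<uid>`.
pub fn entity_url(base_url: &str, uid: &str) -> String {
    format!("{}/api/entity/{}", base_url.trim_end_matches('/'), uid)
}
```

The first call would use `search_url` with the start element and ID field, the second would use it with the child entity type and relation, and the third would use `entity_url` with the ID taken from the first response.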