Spike resource info fetcher to augment authorization with data from metadata management tools like DataHub etc. #651

Open
soenkeliebau opened this issue Nov 14, 2024 · 1 comment

@soenkeliebau
Member

soenkeliebau commented Nov 14, 2024

Background

We already have the user info fetcher, which runs as part of our OPA deployment and can use a variety of backends to retrieve additional information about users that can then be used during authorization.

The main use case for this is probably to retrieve a user's group membership from Keycloak, Active Directory, ... and allow writing ACLs that target groups instead of users.

Idea

Authorization mainly revolves around the question: who is allowed to do what?

It's the job of the userinfofetcher to augment the who with additional information, so I guess it is only logical to follow the same pattern and have a resourceinfofetcher augment the what with additional information.

For the who, the logical source systems are identity providers (Keycloak, AD, LDAP, ...), but for resources these are not the right places to obtain this information (well, possible in theory, but not usually done, I daresay).

For the what, the most probable source systems will be metadata management solutions like DataHub.

The idea behind this ticket is to create a ResourceInfoFetcher that can connect to various backends like the ones mentioned above and retrieve information about the resource.
This could allow ACLs along the lines of

  • user x is allowed to read data product "customers" (which consists of a Kafka topic, a NiFi flow and three Trino tables)
  • group y cannot read PII data (as indicated by the column being tagged in the metadata system) and will only receive anonymized values
  • ...

The exact implementation for every backend would probably differ quite a bit depending on how the system "thinks" about this, but that is totally okay, as users would need to craft their Rego rules to match the data that is being returned anyway.
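To make the second ACL above a bit more concrete, here is a minimal sketch of the decision it describes. The real rule would be written in Rego by the user; this only models the logic in Rust. `ColumnInfo`, `ColumnAccess` and `column_access` are hypothetical names, and the `"pii"` tag value is an assumption about how a metadata system might label such columns.

```rust
use std::collections::HashSet;

/// Hypothetical shape of what a resource info fetcher might return for one column.
#[derive(Debug, Clone)]
pub struct ColumnInfo {
    pub name: String,
    /// Tags as maintained in the metadata system, e.g. "pii" (assumed tag name).
    pub tags: HashSet<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ColumnAccess {
    /// Return the stored value unchanged.
    Plain,
    /// Return an anonymized value instead.
    Masked,
}

/// Combine the "who" (groups from the user info fetcher) with the "what"
/// (column tags from the resource info fetcher) into one decision.
pub fn column_access(
    user_groups: &HashSet<String>,
    column: &ColumnInfo,
    restricted_group: &str,
) -> ColumnAccess {
    if user_groups.contains(restricted_group) && column.tags.contains("pii") {
        ColumnAccess::Masked
    } else {
        ColumnAccess::Plain
    }
}
```

A member of the restricted group gets `Masked` for a column tagged `pii`, everyone else gets `Plain`.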

D-Quantum Spike

We are currently collaborating with Synabi on a D-Quantum spike of this idea.

I have created a branch with code for this spike: https://github.com/stackabletech/opa-operator/tree/spike/resource_info_fetcher

Up until now, this is mostly boilerplate code to add the extra webserver, CRD changes etc. in the OPA operator, but this could now be extended to actually talk to a D-Quantum instance.

The data model of D-Quantum is very flexible and doesn't have fixed names for types of entities, rather every entity (table, data product, column, ...) is just identified by an 'entityTypeId' which the user can freely configure for their instance.
Relationships between these are then modelled in D-Quantum as well, so one valid example might be:

  • Business Unit (entitytypeid: 134)
    • Data Product (entitytypeid: 135)
      • Table (entitytypeid: 136)
        • Column (entitytypeid: 137)

The user then has to be able to retrieve configurable excerpts from this structure with the resourceinfofetcher.
This will result in multiple REST calls depending on the config, most of which will depend on each other and cannot be parallelized, so caching becomes fairly important. In the spike code caching is already enabled, but we'll most probably want to look at that some more.
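As a rough illustration of why caching helps with chained calls, here is a minimal time-based cache keyed by request URL. This is not the cache used in the spike branch; `TtlCache` and `get_or_fetch` are hypothetical names, and the `fetch` closure stands in for whatever HTTP client is actually used.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Minimal time-based cache keyed by request URL (hypothetical, not the spike's cache).
pub struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>,
}

impl TtlCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Return the cached body for `url` if it is younger than the TTL,
    /// otherwise call `fetch` and remember its result.
    /// Errors are passed through and never cached.
    pub fn get_or_fetch<E>(
        &mut self,
        url: &str,
        fetch: impl FnOnce(&str) -> Result<String, E>,
    ) -> Result<String, E> {
        if let Some((stored_at, body)) = self.entries.get(url) {
            if stored_at.elapsed() < self.ttl {
                return Ok(body.clone());
            }
        }
        let body = fetch(url)?;
        self.entries
            .insert(url.to_string(), (Instant::now(), body.clone()));
        Ok(body)
    }
}
```

With something like this, a second authorization request for the same table within the TTL would not repeat the dependent lookups. Expired entries are only overwritten, never evicted, so a real implementation would also need a size bound.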

Idea
The current idea is to allow users to model the hierarchy they'd like retrieved in the backend config along these lines:

#[derive(Clone, Debug, Deserialize, Eq, JsonSchema, PartialEq, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct DQuantumBackend {
    pub url: String,

    #[serde(flatten)]
    pub tls: TlsClientDetails,

    /// Name of a Secret that contains client credentials of a Keycloak account with permission to read user metadata.
    ///
    /// Must contain the fields `clientId` and `clientSecret`.
    pub client_credentials_secret: String,

    pub hierarchy: DQuantumHierarchy,
}

#[derive(Clone, Debug, Deserialize, Eq, JsonSchema, PartialEq, Serialize)]
#[serde(rename_all = "camelCase")]
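pub struct DQuantumHierarchy {
    /// The entityTypeId that corresponds to the resource name OPA gets from the product (e.g. "table" for Trino).
    start_element: u8,
    /// Property of the start element that is matched against the resource name.
    id_field: String,
    /// Relations to follow downwards from the start element.
    #[serde(default)]
    child: Option<Vec<DQuantumRelation>>,
    /// Relations to follow upwards from the start element.
    #[serde(default)]
    parent: Option<Vec<DQuantumRelation>>,
}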

#[derive(Clone, Debug, Deserialize, Eq, JsonSchema, PartialEq, Serialize)]
#[serde(rename_all = "camelCase")]
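pub struct DQuantumRelation {
    /// The entityTypeId of the related entity.
    element_id: u8,
    /// Name of the relation property that links the two entities.
    relation_id: String,
}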

Here, startElement identifies the entityTypeId that corresponds to the resource name that OPA would get from the product (think "table" for Trino).
Based on this startElement, the resourceinfofetcher would then walk up and down the tree as defined in the hierarchy and return a representation of this tree to OPA.

TODO: we probably need extra information in the hierarchy: what things should be called in the returned data structure, and in which direction each relationship is modelled upstream.
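A minimal sketch of how such a hierarchy could be turned into an ordered list of backend lookups. The structs are plain mirrors of the config structs (serde derives left out so the sketch needs only std), and `Lookup` and `plan_lookups` are hypothetical names. It assumes that children point at the start entity via their relation property while the start entity points at its parents, which is exactly the direction question the TODO leaves open.

```rust
/// Plain mirror of `DQuantumRelation` (assumption: same fields, no serde).
#[derive(Debug, Clone)]
pub struct Relation {
    pub element_id: u8,
    pub relation_id: String,
}

/// Plain mirror of `DQuantumHierarchy`, with empty vectors instead of `Option`.
#[derive(Debug, Clone)]
pub struct Hierarchy {
    pub start_element: u8,
    pub id_field: String,
    pub child: Vec<Relation>,
    pub parent: Vec<Relation>,
}

/// One lookup step against the backend (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Lookup {
    /// Search `entity_type` for the entity whose `property` equals the resource name from OPA.
    Start { entity_type: u8, property: String },
    /// Search `entity_type` for entities whose `property` holds the start entity's uid.
    Children { entity_type: u8, property: String },
    /// Fetch the entity whose uid is stored in the start entity's `property`.
    Parent { entity_type: u8, property: String },
}

/// The start lookup always comes first, because every other step needs its result.
pub fn plan_lookups(hierarchy: &Hierarchy) -> Vec<Lookup> {
    let mut plan = vec![Lookup::Start {
        entity_type: hierarchy.start_element,
        property: hierarchy.id_field.clone(),
    }];
    plan.extend(hierarchy.parent.iter().map(|r| Lookup::Parent {
        entity_type: r.element_id,
        property: r.relation_id.clone(),
    }));
    plan.extend(hierarchy.child.iter().map(|r| Lookup::Children {
        entity_type: r.element_id,
        property: r.relation_id.clone(),
    }));
    plan
}
```

Separating the plan from its execution keeps the ordering constraint visible: only the parent and child steps that hang off the same entity are independent of each other.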

Example

The following hierarchy:

DatenObjektGruppierung (153) -> DatenObjekt (146) -> Datenfeld (148)

could be represented like this:

hierarchy:
  startElement: 146
  idField: Name
  parent:
    - elementId: 153
      relationId: zugehoerige_datenobjektgruppierung
  child: 
    - elementId: 148
      relationId: zugehoeriges_datenobjekt

And result in the following api calls:

find dataobject for table name from opa:
https://demo03.synabi.com/dquantum/api/entity/search/146/Name?propertyValue=<tablename>

returns:

{
	"total": 1,
	"forbidden": 0,
	"offset": 0,
	"limit": 50,
	"entities": [
		{
			"uid": "d7efb1bb2b4bb201a47c2f5ab65e105d",
			"draft": false,
			"name": "sap.fsdm.tabletypes::InterestRateRiskAdjustmentTT_Erase",
			"entityTypeId": 146,
			"entityTypeName": "Datenobjekt",
			"archived": false,
			"properties": [
				{
					"name": "Application Owner",
					"value": ""
				},
				{
					"name": "Code",
					"value": ""
				},
				{
					"name": "Datenfelder",
					"value": ""
				},
				{
					"name": "Datenfelder (manuell)",
					"value": ""
				},
				{
					"name": "Id",
					"value": "155470"
				},
				{
					"name": "Kommentar D-QUANTUM",
					"value": ""
				},
				{
					"name": "Kommentar Quellsystem",
					"value": ""
				},
				{
					"name": "Metadatenquelle",
					"value": "automatisch"
				},
				{
					"name": "Name",
					"value": "sap.fsdm.tabletypes::InterestRateRiskAdjustmentTT_Erase"
				},
				{
					"name": "Technische Details",
					"value": ""
				},
				{
					"name": "Typ",
					"value": "Tabelle"
				},
				{
					"name": "Zugehörige Datenobjektgruppierung",
					"value": "3b19e8b92d300a1a302b306e20bba183"
				},
				{
					"name": "Zugehöriges IT System",
					"value": ""
				},
				{
					"name": "source",
					"value": ""
				}
			],
			"created": "2023-08-29 19:12:04.800000 Z",
			"createdUser": "-",
			"modified": "2023-08-29 19:40:35.039000 Z",
			"modifiedUser": "-"
		}
	]
}
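The search request above follows the pattern `<base>/api/entity/search/<entityTypeId>/<property>?propertyValue=<value>`. A minimal sketch of building that URL is shown here; `search_url` and `percent_encode` are hypothetical helpers, and it is an assumption that the configured `url` already ends in `/dquantum`. Table names such as the one in the response contain `::`, so the value is percent-encoded.

```rust
/// Percent-encode everything except RFC 3986 unreserved characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            other => out.push_str(&format!("%{:02X}", other)),
        }
    }
    out
}

/// Build the entity search URL for one entity type and property.
pub fn search_url(base_url: &str, entity_type_id: u8, property: &str, value: &str) -> String {
    format!(
        "{}/api/entity/search/{}/{}?propertyValue={}",
        base_url.trim_end_matches('/'),
        entity_type_id,
        percent_encode(property),
        percent_encode(value)
    )
}
```

For the table in the response above, `search_url("https://demo03.synabi.com/dquantum", 146, "Name", "sap.fsdm.tabletypes::InterestRateRiskAdjustmentTT_Erase")` yields a URL ending in `propertyValue=sap.fsdm.tabletypes%3A%3AInterestRateRiskAdjustmentTT_Erase`.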

find columns for table:
https://demo01.synabi.com/dquantum/api/entity/search/148/zugehoeriges_datenobjekt?propertyValue=<uid for table from response above>

{
	"total": 9,
	"forbidden": 0,
	"offset": 0,
	"limit": 50,
	"entities": [
		{
			"uid": "45b01100bb7f2c805be9937cd781f44e",
			"draft": false,
			"name": "AccountingChangeSequenceNumber",
			"entityTypeId": 148,
			"entityTypeName": "Datenfeld",
			"archived": false,
			"properties": [
				{
					"name": "Datenobjekt (manuell)",
					"value": ""
				},
				{
					"name": "Datentyp",
					"value": "INTEGER"
				},
				{
					"name": "Id",
					"value": "73150"
				},
				{
					"name": "Input",
					"value": ""
				},
				{
					"name": "Kommentar D-QUANTUM",
					"value": ""
				},
				{
					"name": "Kommentar Quellsystem",
					"value": ""
				},
				{
					"name": "Metadatenquelle",
					"value": "automatisch"
				},
				{
					"name": "Name",
					"value": "AccountingChangeSequenceNumber"
				},
				{
					"name": "Output",
					"value": ""
				},
				{
					"name": "Technische Details",
					"value": ""
				},
				{
					"name": "Zugehörige Datenobjektgruppierung",
					"value": ""
				},
				{
					"name": "Zugehöriges Datenobjekt",
					"value": "d7efb1bb2b4bb201a47c2f5ab65e105d"
				},
				{
					"name": "Zugehöriges IT System",
					"value": ""
				}
			],
			"created": "2023-08-29 19:12:04.800000 Z",
			"createdUser": "-",
			"modified": "2023-08-29 19:40:35.039000 Z",
			"modifiedUser": "-"
		},
                # ...
		{
			"uid": "d17a32644cb02c9cf9e552ff31301839",
			"draft": false,
			"name": "IndicatorResultBeforeChange",
			"entityTypeId": 148,
			"entityTypeName": "Datenfeld",
			"archived": false,
			"properties": [
				{
					"name": "Datenobjekt (manuell)",
					"value": ""
				},
				{
					"name": "Datentyp",
					"value": "BOOLEAN"
				},
				{
					"name": "Id",
					"value": "152002"
				},
				{
					"name": "Input",
					"value": ""
				},
				{
					"name": "Kommentar D-QUANTUM",
					"value": ""
				},
				{
					"name": "Kommentar Quellsystem",
					"value": ""
				},
				{
					"name": "Metadatenquelle",
					"value": "automatisch"
				},
				{
					"name": "Name",
					"value": "IndicatorResultBeforeChange"
				},
				{
					"name": "Output",
					"value": ""
				},
				{
					"name": "Technische Details",
					"value": ""
				},
				{
					"name": "Zugehörige Datenobjektgruppierung",
					"value": ""
				},
				{
					"name": "Zugehöriges Datenobjekt",
					"value": "d7efb1bb2b4bb201a47c2f5ab65e105d"
				},
				{
					"name": "Zugehöriges IT System",
					"value": ""
				}
			],
			"created": "2023-08-29 19:12:04.800000 Z",
			"createdUser": "-",
			"modified": "2023-08-29 19:40:35.039000 Z",
			"modifiedUser": "-"
		}
	]
}

find group for table:
use id from field "Zugehörige Datenobjektgruppierung" in the response to the first query.

https://demo03.synabi.com/dquantum/api/entity/3b19e8b92d300a1a302b306e20bba183
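The group lookup described above boils down to reading one property from the first response and building the entity URL from it. A minimal sketch follows, with `Entity` and `Property` as hypothetical mirrors of the JSON shown earlier (only the fields needed here) and `parent_entity_url` as a hypothetical helper. Empty property values are treated as "not set", which is an assumption based on the responses above.

```rust
/// One name/value pair from the `properties` array of an entity.
#[derive(Debug, Clone)]
pub struct Property {
    pub name: String,
    pub value: String,
}

/// Reduced mirror of an entity from the search response.
#[derive(Debug, Clone)]
pub struct Entity {
    pub uid: String,
    pub name: String,
    pub properties: Vec<Property>,
}

impl Entity {
    /// Value of the named property; empty strings count as "not set".
    pub fn property(&self, name: &str) -> Option<&str> {
        self.properties
            .iter()
            .find(|p| p.name == name)
            .map(|p| p.value.as_str())
            .filter(|v| !v.is_empty())
    }
}

/// Build the URL of the parent entity referenced by `relation_property`,
/// or `None` if the entity does not reference a parent.
pub fn parent_entity_url(base_url: &str, entity: &Entity, relation_property: &str) -> Option<String> {
    let uid = entity.property(relation_property)?;
    Some(format!("{}/api/entity/{}", base_url.trim_end_matches('/'), uid))
}
```

Note that the property is looked up by its display name ("Zugehörige Datenobjektgruppierung"), while the config example uses `zugehoerige_datenobjektgruppierung`; how the two map onto each other is not settled in this ticket.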

@soenkeliebau
Member Author

I have played around a bit and created an example of how this might look in code:

pub hierarchy: TableEntity,
