Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Cross Navigation Enrichment #821

Closed
wants to merge 4 commits into from

Conversation

adatzer
Copy link
Contributor

@adatzer adatzer commented Oct 16, 2023

This pull request adds an enrichment that can parse the extended cross navigation format in _sp querystring parameter and attach the cross_navigation context to an event.

The extended cross navigation format can be described by _sp={domainUserId}.{timestamp}.{sessionId}.{subjectUserId}.{sourceId}.{platform}.{reason}

The changes in the trackers to allow this extended format can be seen in:

The corresponding iglu-central pull requests are:

cc @greg-el @igneel64 @matus-tomlein

@adatzer adatzer force-pushed the feat/cross_navigation branch from dcd5137 to f97307d Compare October 16, 2023 21:58
Copy link
Contributor

@matus-tomlein matus-tomlein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Will try to test this with the tracker implementations.

@@ -52,19 +53,12 @@ object PageEnrichments {
* @param qsMap The querystring parameters
* @return Validation boxing a pair of optional strings corresponding to the two fields
*/
def parseCrossDomain(qsMap: QueryStringParameters): Either[FailureDetails.EnrichmentFailure, (Option[String], Option[String])] =
def parseCrossDomain(qsMap: QueryStringParameters): Either[FailureDetails.EnrichmentFailure, Map[String, Option[String]]] =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about moving this to the CrossNavigationEnrichment altogether instead of having it partly here and partly there?

Copy link
Contributor Author

@adatzer adatzer Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some back and forths on this. Happy to move this over completely, it just felt that this function since it receives querystring params, the fact that it deals only with _sp is "coincidence", e.g. maybe in the future it deals with more. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see your point, but in case we have more query parameters to parse, wouldn't we have multiple functions for parsing each one separately? Not sure I see the value of having it in one, but this is just a nitpick, this approach sounds fine as well.

"schema": "iglu:com.snowplowanalytics.snowplow.enrichments/cross_navigation_config/jsonschema/1-0-0",

"data": {
"enabled": false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a discussion about the choice of this default?

On one hand, making the enrichment enabled by default would all of a sudden add a new entity for users who previously weren't receiving it. But this would only be for a small number of users and for a small number of events (cross-domain tracking is not a commonly used feature). They would still get the old information so not a breaking change as far as I can understand.

On the other hand, if we have it off by default, we are complicating things for two reasons:

  1. The enrich code is more complicated. Now you have to do part of the logic outside of the enrichment (in the enrichment object and in enrichment manager) and part inside (in the enrichment class). Not a big deal, but it would be cleaner if the flow was more coherent and the atomic properties and the context entity were handled the same.
  2. This restricts our ability to migrate users towards the new context entity rather than using the atomic properties. Majority of users will have this off and I don't see how we could transition towards the new entity in the long term – could we come up with a plan for doing that? The motivation is that we should have one preferred way to represent the information, not one way under some condition and one way under another condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have misunderstood but i thought this file serves more as an example and it does not denote default.

Perfectly fine by me to enable this by default. I did have a concern about the sudden entity for users, but as long as users can disable it, this is ok by me.

it would be cleaner if the flow was more coherent and the atomic properties and the context entity were handled the same

The part that is handled inside the class relates only to the differences when the enrichment is enabled.
Do you mean to remove the config completely and hardcode the behaviour in the manager/page?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind the part about the flow, we can get better feedback from DP on this. But I do think that this configuration suggests the default – at least when I ran enrich locally, I had to change it to true in order for the new behaviour to work.

@matus-tomlein
Copy link
Contributor

matus-tomlein commented Oct 18, 2023

Edit: Have tested this implementation both with the mobile and JS trackers, it works nicely, great work!

@adatzer adatzer marked this pull request as ready for review October 19, 2023 10:59
Copy link
Contributor

@spenes spenes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adatzer! The changes looks good in general. I suggested some structural changes to make code more concise and a bit less error prone. I implemented the changes I had in mind in here to reduce the back and forth. Feel free to challenge any changes in there!

@adatzer
Copy link
Contributor Author

adatzer commented Nov 13, 2023

Hi Enes! Thank you very much for reviewing and for creating the cleanup branch. This was really helpful!
I have cherry-picked your commit as is, and also applied scalafmt.

The additional Make edits commit includes some edits related to:

  1. The cross-navigation schema (pending final review). Since the timestamp and the domain_user_id props are required, we decided that if they are missing, the enrichment should not add the corresponding entity at all, i.e. this is considered invalid extended format.
  2. As discussed with Analytics Engineers it is a bit preferrable for optional properties to be null (which is valid according to schema), rather than missing. This is why the filterNot was removed.
  3. Corresponding test changes related to the above.

Additionally, this PR has 2 iglu-central pull requests as dependencies:

Please let us know what you think of these last changes and whether there is anything else to be done from our side. Thanks once again!

cc @igneel64 @greg-el @matus-tomlein

Copy link
Contributor

@spenes spenes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Ada, the new changes look good to me!

@spenes
Copy link
Contributor

spenes commented Jan 24, 2024

Moved to #856, closing this one.

@spenes spenes closed this Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants