Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scrubber refactoring #3108

Merged
merged 16 commits into from
Nov 16, 2023
Merged

Scrubber refactoring #3108

merged 16 commits into from
Nov 16, 2023

Conversation

VeronikaSolovei9
Copy link
Contributor

No description provided.

@VeronikaSolovei9 VeronikaSolovei9 marked this pull request as ready for review September 14, 2023 23:27
@hhhjort hhhjort requested a review from bsardo September 18, 2023 17:30
@bsardo bsardo requested a review from SyntaxNode September 19, 2023 17:38
@@ -177,48 +172,79 @@ func (rs *requestSplitter) cleanOpenRTBRequests(ctx context.Context,
}
}

ipConf := privacy.IPConf{IPV6: auctionReq.Account.Privacy.IPv6Config, IPV4: auctionReq.Account.Privacy.IPv4Config}

reqWrapper := cloneBidderReq(bidderRequest.BidRequest)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good compromise. As a follow-up, we'll want to pass the request wrapper all the way down, but that's not possible right now since we lose it in getAuctionBidderRequests and that's too much work for this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Better to refactor it separately.

//UFPD
privacy.ScrubDeviceIDs(reqWrapper)
privacy.ScrubUserIDs(reqWrapper)
privacy.ScrubUserExt(reqWrapper, "data")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general approach, I recommend calling just one scrubber method and manage the groupings in the scrubber code. This could be ScrubUserFPD, which per the spec includes:

user.eids, user.ext.data.*, user.data.*, user.{id, buyeruid, yob, gender} and device-specific IDs

Looks like we're missing at least eids.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Refactored

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add more unit tests for these cases

if !auctionPermissions.PassID {
privacy.ScrubDeviceIDs(reqWrapper)
privacy.ScrubUserDemographics(reqWrapper)
privacy.ScrubUserExt(reqWrapper, "eids")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider calling something like privacy.ScrubGdprID and move the grouping into scrubber code. Might we be good with calling ScrubUserFPD here instead? What would be the difference in information removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to the separate functions. It looks cleaner to me when we have separate function per activity in this case, rather than combining then into one, despite they are very similar to each other. I'm open to refactor it to one function.

var newReq *openrtb2.BidRequest
newReq = ptrutil.Clone(req)

if req.User != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use ortb.CloneUser instead.

}
}

if req.Device != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add ortb.CloneDevice and call it from here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no CloneDevice and CloneSource functions. Is there a reason not to have them? Should I add them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clone functions added

}
}

if req.Source != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add ortb.CloneSource and call it from here.

Comment on lines +266 to +267
var newReq *openrtb2.BidRequest
newReq = ptrutil.Clone(req)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these lines be rewritten as newReq := ptrutil.Clone(req)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be, yes, but IDE will miss the variable type (because it's generic after Clone) and underlines follow up code to red. We discussed it in one of the previous PRs. It's ok to leave the variable instantiation, it doesn't take memory and CPU and makes IDE happy.

Comment on lines 270 to 271
var userCopy *openrtb2.User
userCopy = ptrutil.Clone(req.User)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these lines be rewritten as userCopy := ptrutil.Clone(req.User). I see a similar pattern throughout this function as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewritten to userCopy := ortb.CloneUser(req.User)
I didn't know this function exists.

Comment on lines 111 to 127
func ScrubDeviceIP(reqWrapper *openrtb_ext.RequestWrapper, ipConf IPConf) {
if reqWrapper.Device != nil {
reqWrapper.Device.IP = scrubIP(reqWrapper.Device.IP, ipConf.IPV4.AnonKeepBits, iputil.IPv4BitSize)
reqWrapper.Device.IPv6 = scrubIP(reqWrapper.Device.IPv6, ipConf.IPV6.AnonKeepBits, iputil.IPv6BitSize)
}
}

userCopy := *user

if strategy == ScrubStrategyUserIDAndDemographic {
userCopy.BuyerUID = ""
userCopy.ID = ""
userCopy.Ext = scrubExtIDs(userCopy.Ext, "eids")
userCopy.Yob = 0
userCopy.Gender = ""
}
func ScrubDeviceIDsIPsUserDemoExt(reqWrapper *openrtb_ext.RequestWrapper, ipConf IPConf, fieldName string, scrubFullGeo bool) {
ScrubDeviceIDs(reqWrapper)
ScrubDeviceIP(reqWrapper, ipConf)
ScrubUserDemographics(reqWrapper)
ScrubUserExt(reqWrapper, fieldName)

switch geo {
case ScrubStrategyGeoFull:
userCopy.Geo = scrubGeoFull(user.Geo)
case ScrubStrategyGeoReducedPrecision:
userCopy.Geo = scrubGeoPrecision(user.Geo)
if scrubFullGeo {
ScrubGeoFull(reqWrapper)
} else {
ScrubGEO(reqWrapper)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed a lack of test coverage here. Is this out of the scope of this PR to update this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More unit tests are added. Consolidated scrubber functions like ScrubUserFPD are covered by integration tests in utils_test.go -> TestCleanOpenRTBRequestsActivities

@SyntaxNode
Copy link
Contributor

Prebid Server 2.0 has been released and Go Module name has changed from github.com/prebid/prebid-server to github.com/prebid/prebid-server/v2, per Go versioning conventions.

Please merge the master branch and update all prebid-server import statements to reference the v2 name change. Thank you.

# Conflicts:
#	exchange/utils.go
#	exchange/utils_test.go
#	privacy/enforcement.go
#	privacy/scrubber.go
Copy link
Contributor

@AlexBVolcy AlexBVolcy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, left test coverage and spelling comments

func (scrubber) ScrubUser(user *openrtb2.User, strategy ScrubStrategyUser, geo ScrubStrategyGeo) *openrtb2.User {
if user == nil {
return nil
func ScrubDeviceIP(reqWrapper *openrtb_ext.RequestWrapper, ipConf IPConf) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add tests to cover this function?


userCopy := *user
func ScrubDeviceIDsIPsUserDemoExt(reqWrapper *openrtb_ext.RequestWrapper, ipConf IPConf, fieldName string, scrubFullGeo bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add tests to cover this function?

case ScrubStrategyGeoReducedPrecision:
userCopy.Geo = scrubGeoPrecision(user.Geo)
}
func ScrubUserFPD(reqWrapper *openrtb_ext.RequestWrapper) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add tests to cover this function?

Comment on lines 139 to 145
func ScrubGdprID(reqWrapper *openrtb_ext.RequestWrapper) {
ScrubDeviceIDs(reqWrapper)
ScrubUserDemographics(reqWrapper)
ScrubUserExt(reqWrapper, "eids")
}

func ScrubGeoAndDeviceIP(reqWrapper *openrtb_ext.RequestWrapper, ipConf IPConf) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add tests to cover these functions?

ortb/clone.go Outdated
return &c
}

func CloseSupplyChainNodes(s []openrtb2.SupplyChainNode) []openrtb2.SupplyChainNode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a type in this function name?

CloseSupplyChainNodes --> CloneSupplyChainNodes

@VeronikaSolovei9
Copy link
Contributor Author

These functions are covered from utils_test.go.
The reason for this is because they mostly have calls to other "Scrub" functions.
Utils_test.go also has tests where privacy and activities are set up in a way to execute and check logic in these functions.

AlexBVolcy
AlexBVolcy previously approved these changes Oct 26, 2023
@bsardo
Copy link
Collaborator

bsardo commented Oct 27, 2023

These functions are covered from utils_test.go. The reason for this is because they mostly have calls to other "Scrub" functions. Utils_test.go also has tests where privacy and activities are set up in a way to execute and check logic in these functions.

I generally agree with this approach. Even though these functions are public and I prefer to have unit tests for functions of this kind, most of the complex logic lives in private functions which are thoroughly tested in their own unit tests. There are also adequate tests in utils_test.go that ensure that these public functions are called and do what they are supposed to. One might argue that a simple unit test should be added for any function that has conditional logic just to ensure edge cases are handled (e.g. ScrubDeviceIP appropriately handles a nil Device) so that we don't have to scrutinize the integration tests in utils_test.go to make sure these edge cases are addressed, but rather can just focus on whether the right function was called.

delete(userExtParsed, "data")
userExtModified = true
}
func ScrubUserExt(reqWrapper *openrtb_ext.RequestWrapper, fieldName string) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a private function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed.

if userExtModified {
userExt, _ := jsonutil.Marshal(userExtParsed)
userCopy.Ext = userExt
func ScrubEids(reqWrapper *openrtb_ext.RequestWrapper) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: ScrubEIDs
Edit: This function isn't used. Do we need it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a function to support transmitEids activity, it will be added in the next PR.
Renamed

Comment on lines 95 to 96
if reqWrapper.Device != nil {
if reqWrapper.Device.Geo != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: flatten by combining these two if statements with &&

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, refactored

type scrubber struct {
ipV6 config.IPv6
ipV4 config.IPv4
func ScrubDeviceIDs(reqWrapper *openrtb_ext.RequestWrapper) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about making some of these functions private that are meant to only be used within this package?

Copy link
Contributor Author

@VeronikaSolovei9 VeronikaSolovei9 Oct 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. All functions that are used in this package only are now private

ortb/clone.go Outdated
Comment on lines 238 to 239
mobileCopy := &s.Mobile
c.Mobile = *mobileCopy
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: I think it is more idiomatic to write this as such:

mobileCopy := *s.Mobile
c.Mobile = &mobileCopy

They should be functionally equivalent but typically when I see a variable named Copy I think it is the value that has been copied and not the address. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, modified, good point

TID: "Tid",
PChain: "PChain",
SChain: &openrtb2.SupplyChain{
Complete: 1, Nodes: []openrtb2.SupplyChainNode{{ASI: "asi", Ext: json.RawMessage(`{"anyField":1}`)}}},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you want the populated ext to be openrtb2.SupplyChain.Ext instead of openrtb2.SupplyChainNode.Ext?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, refactored


t.Run("populated", func(t *testing.T) {
given := &openrtb2.SupplyChain{
Complete: 1, Nodes: []openrtb2.SupplyChainNode{{ASI: "asi", Ext: json.RawMessage(`{"anyField":1}`)}}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you want the populated ext to be openrtb2.SupplyChain.Ext instead of openrtb2.SupplyChainNode.Ext?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like this change was made.

})
}

func TestCloseSupplyChainNodes(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: TestCloneSupplyChainNodes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

assert.NotSame(t, given[1], result[1], "item1-pointer")
assert.NotSame(t, given[1].Ext, result[1].Ext, "item1-pointer-ext")
assert.NotSame(t, given[1].HP, result[1].HP, "item1-pointer-hp")
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're missing a couple of asserts here:

assert.NotSame(t, given[2].Ext, result[2].Ext, "item2-pointer-ext")
assert.NotSame(t, given[2].HP, result[2].HP, "item2-pointer-hp")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, added

return &dntCopy
}

func CloneUserAgent(s *openrtb2.UserAgent) *openrtb2.UserAgent {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a unit test for this function. It appears to be an oversight as I see all other functions have their own dedicated unit test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, good catch! Sorry I don't know why I didn't add this test.

expectedDevice: &openrtb2.Device{Geo: nil},
},
{
name: "with_user_geo",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "with_device_geo"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, renamed

assert.NotSame(t, given[0], result[0], "item0-pointer")
assert.NotSame(t, given[0].Ext, result[0].Ext, "item0-pointer-ext")
assert.NotSame(t, given[1], result[1], "item1-pointer")
assert.NotSame(t, given[1].Ext, result[1].Ext, "item1-pointer-ext")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you missed making this change here.


t.Run("populated", func(t *testing.T) {
given := &openrtb2.SupplyChain{
Complete: 1, Nodes: []openrtb2.SupplyChainNode{{ASI: "asi", Ext: json.RawMessage(`{"anyField":1}`)}}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like this change was made.

Comment on lines 51 to 56
if !ac.Allow(privacy.ActivityTransmitUserFPD, component, privacy.ActivityRequest{}) {
// scrub UFPD from ao.RequestWrapper
}
if !ac.Allow(privacy.ActivityTransmitPreciseGeo, component, privacy.ActivityRequest{}) {
// scrub geo from ao.RequestWrapper
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to add this in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a part of a future PR remained in my local changes, yes, removed.

Comment on lines 672 to 674
assert.NotSame(t, given.Architecture, result.Architecture, "architecture")
assert.NotSame(t, given.Bitness, result.Bitness, "bitness")
assert.NotSame(t, given.Model, result.Model, "model")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Can we delete these three assert statements since these fields are just string values? I think we only need the NotSame asserts for pointers right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. Yes non-pointers should be copied correctly as a part of shallow copy

assert.NotSame(t, given, result, "pointer")
assert.NotSame(t, given.SChain, result.SChain, "schain")
assert.NotSame(t, given.SChain.Ext, result.SChain.Ext, "schain.ext")
assert.NotSame(t, given.Ext, result.Ext, "ext")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add assert.NotSame(t, given.SChain.Nodes[0].Ext, result.SChain.Nodes[0].Ext, "schain.nodes.ext") here?

@bsardo bsardo merged commit 6a20c0e into master Nov 16, 2023
3 checks passed
@SyntaxNode SyntaxNode deleted the scrubber_refactoring branch November 17, 2023 22:13
svamiftah pushed a commit to sovrn/prebid-server that referenced this pull request Nov 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants