
Data race during Stream.MsgRecv vs drpcmanager.NewWithOptions #48

Closed · tomyl opened this issue Feb 20, 2023 · 4 comments
Labels: bug (Something isn't working)

tomyl commented Feb 20, 2023

Hi,

I'm seeing the following data race (traces slightly redacted):

==================
WARNING: DATA RACE
Write at 0x00c000f20000 by goroutine 121347:
  runtime.slicecopy()
      /home/user/sdk/go/src/runtime/slice.go:310 +0x0
  storj.io/drpc/drpcwire.(*Reader).ReadPacketUsing()
      /home/user/go/pkg/mod/storj.io/[email protected]/drpcwire/reader.go:143 +0x528
  storj.io/drpc/drpcmanager.(*Manager).manageReader()
      /home/user/go/pkg/mod/storj.io/[email protected]/drpcmanager/manager.go:228 +0x184
  storj.io/drpc/drpcmanager.NewWithOptions.func1()
      /home/user/go/pkg/mod/storj.io/[email protected]/drpcmanager/manager.go:115 +0x39

Previous read at 0x00c000f20000 by goroutine 128191:
  google.golang.org/protobuf/internal/impl.(*MessageInfo).unmarshalPointer()
      /home/user/go/pkg/mod/google.golang.org/[email protected]/internal/impl/decode.go:104 +0x177
  google.golang.org/protobuf/internal/impl.(*MessageInfo).unmarshal()
      /home/user/go/pkg/mod/google.golang.org/[email protected]/internal/impl/decode.go:66 +0xdb
  google.golang.org/protobuf/internal/impl.(*MessageInfo).unmarshal-fm()
      <autogenerated>:1 +0xce
  google.golang.org/protobuf/proto.UnmarshalOptions.unmarshal()
      /home/user/go/pkg/mod/google.golang.org/[email protected]/proto/decode.go:105 +0x2f1
  google.golang.org/protobuf/proto.Unmarshal()
      /home/user/go/pkg/mod/google.golang.org/[email protected]/proto/decode.go:55 +0xc8
  gitlab.com/redacted/redacted/pkg/cache/pb.drpcEncoding_File_grpc_proto.Unmarshal()
      /home/user/git/redacted/pkg/cache/pb/grpc_drpc.pb.go:27 +0x91
  gitlab.com/redacted/redacted/pkg/cache/pb.(*drpcEncoding_File_grpc_proto).Unmarshal()
      <autogenerated>:1 +0x2e
  storj.io/drpc/drpcstream.(*Stream).MsgRecv()
      /home/user/go/pkg/mod/storj.io/[email protected]/drpcstream/stream.go:444 +0x8f
  storj.io/drpc/drpcpool.(*streamWrapper).MsgRecv()
      <autogenerated>:1 +0x81
  gitlab.com/redacted/redacted/pkg/cache/pb.(*drpcGrpc_ReadShardClient).Recv()
      /home/user/git/redacted/pkg/cache/pb/grpc_drpc.pb.go:102 +0x75
  ...

Goroutine 121347 (running) created at:
  storj.io/drpc/drpcmanager.NewWithOptions()
      /home/user/go/pkg/mod/storj.io/[email protected]/drpcmanager/manager.go:115 +0x7ae
  storj.io/drpc/drpcconn.NewWithOptions()
      /home/user/go/pkg/mod/storj.io/[email protected]/drpcconn/conn.go:46 +0x3a4
  storj.io/drpc/drpcconn.New()
      /home/user/go/pkg/mod/storj.io/[email protected]/drpcconn/conn.go:38 +0x347
  gitlab.com/redacted/redacted/pkg/cache/backends/drpc.(*Backend).newConn()
      /home/user/git/redacted/pkg/cache/backends/drpc/backend.go:345 +0xce
  ...

Goroutine 128191 (running) created at:
  gitlab.com/redacted/redacted/pkg/cache.(*Storage).readObject()
      /home/user/git/redacted/pkg/cache/storage.go:198 +0x2fd
==================

The binary is built with go install -race -gcflags=-l against commit 220d855. I have seen this data race with older versions too, but never got around to reporting it.

The data race appears to be triggered when new connections are created while MsgRecv is running.

The data race is observed in the same system as in #37, where there are frequent context cancellations during read operations. If I pass in the background context instead, the data races go away.

Unfortunately I don't have a minimal code example that reproduces it. I can look into providing one if the problem isn't obvious from just eyeballing the traces above.
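
In the meantime, here is roughly the shape of the calling code (the pb package and its types stand in for our redacted generated code, and connection pooling plus most error handling are trimmed, so treat this as a sketch of the pattern rather than a standalone reproducer):

    // Rough shape of the calling side. Everything under the pb import is a
    // placeholder for redacted generated code, not the real identifiers.
    package main

    import (
        "context"
        "net"

        "storj.io/drpc/drpcconn"

        pb "gitlab.com/redacted/redacted/pkg/cache/pb" // generated drpc package (redacted)
    )

    // readObject runs Recv in a loop under a context that is frequently
    // canceled. Other goroutines call newConn concurrently, which is when the
    // race shows up.
    func readObject(ctx context.Context, client pb.DRPCGrpcClient) error {
        stream, err := client.ReadShard(ctx, &pb.ReadShardRequest{}) // request type is a placeholder
        if err != nil {
            return err
        }
        for {
            if _, err := stream.Recv(); err != nil { // read side of the race: Recv -> MsgRecv -> proto.Unmarshal
                return err // often a context cancellation in this system
            }
        }
    }

    // newConn matches the "Goroutine ... created at" half of the trace:
    // drpcconn.New starts the manager's reader goroutine (the write side).
    func newConn(addr string) (*drpcconn.Conn, error) {
        raw, err := net.Dial("tcp", addr)
        if err != nil {
            return nil, err
        }
        return drpcconn.New(raw), nil
    }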

Thanks,
Tommy

zeebo self-assigned this Feb 20, 2023
zeebo added the bug label Feb 20, 2023

zeebo (Collaborator) commented Feb 20, 2023

Thanks for the report. I'm surprised there's a data race here, because all of the tests, both for drpc and for all of the storj code we have, run with -race. Looking at the traces, it appears to be related to how drpc reuses some buffers, but I don't see how ReadPacketUsing could run concurrently with the Unmarshal: there's a packetBuffer type that coordinates access to the buffer.
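
For anyone following along, this is the general shape of the hazard the detector points at (a standalone sketch with made-up names, not drpc's actual buffer handling): a reader goroutine reuses its buffer while a packet previously handed out from that buffer is still being read on another goroutine.

    package main

    import "sync"

    // Illustration only, not drpc code: a read buffer that gets reused.
    var buf = make([]byte, 64)

    func main() {
        var wg sync.WaitGroup
        wg.Add(1)

        // "Unmarshal" side: keeps reading a packet that was handed out as a
        // slice of the shared buffer (the "previous read" in the report).
        go func() {
            defer wg.Done()
            pkt := buf[:16]
            var sum byte
            for i := 0; i < 1000; i++ {
                for _, b := range pkt {
                    sum += b
                }
            }
            _ = sum
        }()

        // "Reader" side: copies the next packet into the same buffer as if the
        // consumer were already done with it (the "write" in the report).
        payload := []byte("next packet payload")
        for i := 0; i < 1000; i++ {
            copy(buf, payload)
        }
        wg.Wait()
    }

Running that with go run -race reports the same write vs. previous-read pair as above. The packetBuffer coordination is supposed to prevent exactly that, so I'll be looking for a path where it gets skipped.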

I guess that's all to say that anything you can provide to help reproduce this would be appreciated. In the meantime I'll keep looking through the code for anything I missed.

zeebo (Collaborator) commented Feb 20, 2023

OK, I managed to reproduce it when a stream is canceled at just the right time. I'll work on a fix now. Thanks again for the report.

storj-gerrit bot commented Feb 20, 2023

Change https://review.dev.storj.io/c/storj/drpc/+/9693 mentions this issue.

tomyl (Author) commented Feb 21, 2023

Confirming that I'm not able to reproduce the data race with 89d4b63. Thanks!
