Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to google.golang.org/grpc v1.66.2 / modify certain protobuf messages to retain their unmarshaling buffer #9401

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

aknuds1
Copy link
Contributor

@aknuds1 aknuds1 commented Sep 25, 2024

What this PR does

Upgrade google.golang.org/grpc from v1.65.0 to v1.66.2. The performance regression referenced from dskit's reversal to v1.65.0 is fixed in v1.66.2, and the potential functional regression also referenced in said reversal is Loki specific.

What we found out is that v1.66 fixes a serious performance bottleneck in conjunction with compression; the decompress function is optimized through memory pooling, whereas v1.65.0 would use io.ReadAll and cause a lot of allocations during decompression. As a result, we saw ingester CPU usage increase by ~40% when enabling gRPC compression. The increase is now much more modest, maybe ~7% based on observations.

In order to keep allowing for unsafe references to unmarshalling buffers in e.g. LabelAdapter, I had to introduce a custom gRPC unmarshalling hook (mimirpb.CodecV2.Unmarshal) plus a scheme for protobuf messages to keep a reference to their unmarshalling buffer (mimirpb.UnmarshalerV2). The underlying reason is that from v1.66 on, gRPC immediately recycles each buffer after unmarshalling, unless a reference is kept. This causes data races with e.g. LabelAdapter taking unsafe string references to the unmarshal buffer, unless, as mentioned, a buffer reference is kept with the root protobuf message.

Ideally, one should call FreeBuffer on protobuf messages keeping an unmarshalling buffer reference so the buffer can be given back to the gRPC pool. If this isn't done though, there also shouldn't be a memory leak since the buffer should be garbage collected.

Which issue(s) this PR fixes or relates to

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@aknuds1 aknuds1 marked this pull request as ready for review September 25, 2024 07:54
@aknuds1 aknuds1 requested review from stevesg, grafanabot and a team as code owners September 25, 2024 07:54
@aknuds1 aknuds1 marked this pull request as draft September 25, 2024 07:54
@aknuds1 aknuds1 marked this pull request as ready for review September 25, 2024 07:56
@aknuds1
Copy link
Contributor Author

aknuds1 commented Sep 25, 2024

Some tests are breaking for this upgrade, have to fix them.

@aknuds1 aknuds1 marked this pull request as draft September 25, 2024 08:17
@aknuds1 aknuds1 changed the title Upgrade to google.golang.org/grpc v1.66.2 WIP: Upgrade to google.golang.org/grpc v1.66.2 Sep 25, 2024
@aknuds1
Copy link
Contributor Author

aknuds1 commented Sep 25, 2024

Tests currently pass, but that is through undoing some memory unsafe optimizations in the mimirpb package. We want to investigate whether the optimizations can safely be retained.

Edit: Memory unsafe optimizations are back again :) Solved by letting relevant protobuf messages keep an unmarshalling buffer reference.

@aknuds1 aknuds1 force-pushed the arve/upgrade-grpc branch 16 times, most recently from d79ea9f to f484e3a Compare September 30, 2024 11:45
@aknuds1 aknuds1 force-pushed the arve/upgrade-grpc branch 8 times, most recently from 833e5e6 to a38a9e5 Compare October 8, 2024 06:59
@aknuds1 aknuds1 changed the title WIP: Upgrade to google.golang.org/grpc v1.66.2 Upgrade to google.golang.org/grpc v1.66.2 Oct 8, 2024
@aknuds1 aknuds1 force-pushed the arve/upgrade-grpc branch 2 times, most recently from f574a11 to 3e5734b Compare October 8, 2024 07:16
@aknuds1 aknuds1 requested a review from pstibrany October 8, 2024 07:18
@aknuds1 aknuds1 marked this pull request as ready for review October 8, 2024 07:18
@aknuds1 aknuds1 requested a review from a team as a code owner October 8, 2024 07:18
@aknuds1 aknuds1 requested a review from pracucci October 8, 2024 08:00
pkg/mimirpb/custom.go Show resolved Hide resolved
@@ -166,7 +166,7 @@ func TestSplitWriteRequestByMaxMarshalSize_WriteRequestHasChanged(t *testing.T)

// If the fields of WriteRequest have changed, then you will probably need to modify
// the SplitWriteRequestByMaxMarshalSize() implementation accordingly!
assert.ElementsMatch(t, []string{"Timeseries", "Source", "Metadata", "SkipLabelValidation", "skipUnmarshalingExemplars"}, fieldNames)
assert.ElementsMatch(t, []string{"Timeseries", "Source", "Metadata", "SkipLabelValidation", "skipUnmarshalingExemplars", "buffer"}, fieldNames)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the most correct thing to do in SplitWriteRequestByMaxMarshalSize() is to retain a reference of the original buffer in each split WriteRequest (and by calling bufferRef() for each split WriteRequest). We could also mention it in the SplitWriteRequestByMaxMarshalSize() function doc.

Then where we call SplitWriteRequestByMaxMarshalSize() (which is marshalWriteRequestToRecords()) we should call the FreeBuffer() on each split request once we've done marshalling it.

The only annoying thing of this approach is that SplitWriteRequestByMaxMarshalSize() will just return the input WriteRequest is not splitting happened at all. We could slightly modify it to also return a bool to indicate if splitting happened, so that the caller will call FreeBuffer() only if splitting happened.

Copy link
Contributor

@colega colega Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I agree on this. These are internal things we do with the WriteRequest, they shouldn't affect the outer lifecycle: we had a WriteRequest, we call FreeBuffer after using it (and using it here implies splitting it into multiple ones).

Maybe we should clarify in the comment of SplitWriteRequestByMaxMarshalSize that the requests returned in the slice might still retain references to the original WriteRequest.

What doesn't convince me is that here for some reason we would do a special thing: we sould retain more references to that buffer and free them separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you Oleg, and suggested adding the comment instead on internal slack thread as well.

pkg/distributor/query.go Outdated Show resolved Hide resolved
pkg/mimirpb/custom.go Outdated Show resolved Hide resolved
Comment on lines 23 to 24
// CodecV2 customizes gRPC unmarshalling.
type CodecV2 struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this codec to reflect it's purpose, something like: BufferHolderCodec

Copy link
Contributor Author

@aknuds1 aknuds1 Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the name CodecV2 makes complete sense, since it wraps the codecV2 type in grpc/encoding/proto and it implements the grpc/encoding.CodecV2 interface. My reasoning is that if the naming makes sense for those, the same holds for our wrapper type.

Copy link
Contributor

@colega colega Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, our codec, that does special things, should reflect in it's name what's the special thing it's doing. Their codecV2 name is because it's the default implementation.

BTW, do we need ours to be exported? Edit: I see you have unexported it.

To be clear: this is opinionated, not a blocking comment.


// CodecV2 customizes gRPC unmarshalling.
type CodecV2 struct {
encoding.CodecV2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not embed this here, but convert it to a field and implement the two methods missing.

Otherwise it's unclear what are the other things that we're inheriting from the orignal codec: for example here this means that our codec has the same Name() as the parent: is that ok?

Copy link
Contributor Author

@aknuds1 aknuds1 Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely OK that Name() is the same as encoding.CodecV2, as it should replace it:

func init() {
	c := encoding.GetCodecV2(proto.Name)
	encoding.RegisterCodecV2(&CodecV2{c})
}

I think that embedding is good, because our implementation is supposed to only customize the Unmarshal method of the standard CodecV2.

If this case isn't right for struct embedding, then I wonder which is?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that embedding is good, because our implementation is supposed to only customize the Unmarshal method of the standard CodecV2.

We can still "forward" implementation of other methods to original implementation, if we use named field. It would just be more explicit.

Copy link
Contributor Author

@aknuds1 aknuds1 Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes of course, it's possible to do the same without embedding. Help me understand though, why should we not use the embedding technique in this case? Are you against it in principle? If embedding is not the right technique in this case, why?

Copy link
Contributor

@colega colega Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this case isn't right for struct embedding, then I wonder which is?

I think struct embedding is valid in codebases that respect backwards incompatible changes. Given that the grpc package makes breaking changes in minor versions, it would be very useful, IMO, to detect that they suddenly added a new method to the CodecV2 interface that changes the behaviour in some drastic way.

We can still "forward" implementation of other methods to original implementation, if we use named field. It would just be more explicit.

Yes, that's what I meant.

type CodecV2 struct {
    wrapped encoding.CodecV2
}

func (c *CodecV2) Name() string { return c.wrapped.Name() }

Etc.

Edit: to be clear, it's my opinion, not a blocking comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should not use embedding. (Personally I think it's a good practice to avoid embedding by default, since it hides incompatible changes, and it often exposes more methods to "outer" type than expected or necessary. Typical misuse of embedding is embedding sync.Lock. This second mentioned issue is not the case in this specific instance.)

@aknuds1 aknuds1 changed the title Upgrade to google.golang.org/grpc v1.66.2 Upgrade to google.golang.org/grpc v1.66.2 / modify certain protobuf messages to retain their unmarshaling buffer Oct 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants