Upgrade to google.golang.org/grpc v1.66.2 / modify certain protobuf messages to retain their unmarshaling buffer #9401

aknuds1 · 2024-09-25T07:47:08Z

What this PR does

Upgrade google.golang.org/grpc from v1.65.0 to v1.66.2. The performance regression cited when dskit reverted to v1.65.0 is fixed in v1.66.2, and the potential functional regression cited in the same revert is specific to Loki.

We found that v1.66 fixes a serious performance bottleneck in conjunction with compression: the decompress function is optimized through memory pooling, whereas v1.65.0 would use io.ReadAll and cause a lot of allocations during decompression. As a result, with v1.65.0 we saw ingester CPU usage increase by ~40% when enabling gRPC compression. With v1.66.2 the increase is much more modest, around 7% based on observations.
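To illustrate the difference described above, here is a minimal sketch contrasting an io.ReadAll-based decompression path with one that reads into a pooled buffer. It is not gRPC's actual code: `bufPool`, `compress`, `decompressReadAll` and `decompressPooled` are hypothetical names, and gzip merely stands in for whichever compressor is configured.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"sync"
)

// bufPool is a hypothetical pool of reusable buffers; gRPC's real pool differs.
var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}

// compress gzips data; it only exists to produce input for the two decoders.
func compress(data []byte) []byte {
	var out bytes.Buffer
	zw := gzip.NewWriter(&out)
	if _, err := zw.Write(data); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return out.Bytes()
}

// decompressReadAll allocates a fresh, repeatedly growing slice on every call.
func decompressReadAll(compressed []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// decompressPooled reads into a pooled buffer. The caller must invoke release
// once it no longer needs the returned bytes, and must not touch them afterwards.
func decompressPooled(compressed []byte) (data []byte, release func(), err error) {
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, nil, err
	}
	defer zr.Close()
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	if _, err := buf.ReadFrom(zr); err != nil {
		bufPool.Put(buf)
		return nil, nil, err
	}
	return buf.Bytes(), func() { bufPool.Put(buf) }, nil
}

func main() {
	payload := compress(bytes.Repeat([]byte("series "), 1000))
	a, err := decompressReadAll(payload)
	if err != nil {
		panic(err)
	}
	b, release, err := decompressPooled(payload)
	if err != nil {
		panic(err)
	}
	defer release()
	fmt.Println(len(a), bytes.Equal(a, b))
}
```

The pooled variant trades allocations for an explicit lifetime: the bytes are only valid until release is called, which is the same constraint that motivates the buffer references discussed next.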

In order to keep allowing for unsafe references to unmarshalling buffers in e.g. LabelAdapter, I had to introduce a custom gRPC unmarshalling hook (mimirpb.CodecV2.Unmarshal) plus a scheme for protobuf messages to keep a reference to their unmarshalling buffer (mimirpb.UnmarshalerV2). The underlying reason is that from v1.66 on, gRPC immediately recycles each buffer after unmarshalling, unless a reference is kept. This causes data races with e.g. LabelAdapter taking unsafe string references to the unmarshal buffer, unless, as mentioned, a buffer reference is kept with the root protobuf message.

Ideally, one should call FreeBuffer on protobuf messages keeping an unmarshalling buffer reference so the buffer can be given back to the gRPC pool. If this isn't done though, there also shouldn't be a memory leak since the buffer should be garbage collected.
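The scheme can be sketched roughly as follows. This is a toy model and not the mimirpb implementation: `pooledBuffer` is a hypothetical stand-in for a reference-counted pool buffer, and `message`, `label`, `unmarshal` and `unsafeString` are illustrative names. Only FreeBuffer is taken from the description above.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"unsafe"
)

// pooledBuffer is a hypothetical stand-in for a reference-counted buffer
// handed out by a pool; it is not gRPC's actual buffer type.
type pooledBuffer struct {
	data     []byte
	refs     int
	recycled bool // set when the last reference is dropped
}

func (b *pooledBuffer) Ref() { b.refs++ }

// Free drops one reference; a real pool would reuse b.data once none remain.
func (b *pooledBuffer) Free() {
	b.refs--
	if b.refs == 0 {
		b.recycled = true
	}
}

// label holds strings that alias the unmarshalling buffer instead of copying it.
type label struct{ Name, Value string }

// message keeps a reference to the buffer its labels point into.
type message struct {
	Labels []label
	buffer *pooledBuffer
}

// unsafeString aliases b without copying; it is valid only while b is not reused.
func unsafeString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(&b[0], len(b))
}

// unmarshal parses a toy "name=value" payload and retains the buffer.
func (m *message) unmarshal(buf *pooledBuffer) error {
	i := bytes.IndexByte(buf.data, '=')
	if i < 0 {
		return errors.New("missing '='")
	}
	m.Labels = []label{{unsafeString(buf.data[:i]), unsafeString(buf.data[i+1:])}}
	buf.Ref()
	m.buffer = buf
	return nil
}

// FreeBuffer releases the retained reference; labels must not be used afterwards.
func (m *message) FreeBuffer() {
	if m.buffer != nil {
		m.buffer.Free()
		m.buffer = nil
	}
}

func main() {
	buf := &pooledBuffer{data: []byte("job=mimir"), refs: 1}
	var m message
	if err := m.unmarshal(buf); err != nil {
		panic(err)
	}
	buf.Free() // the transport drops its own reference right after unmarshalling
	fmt.Println(m.Labels[0].Name, m.Labels[0].Value, buf.recycled)
	m.FreeBuffer()
	fmt.Println(buf.recycled)
}
```

In this model the buffer is only marked recyclable after both the transport and the message have released it, so the unsafe strings never outlive the bytes they point into.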

Which issue(s) this PR fixes or relates to

Checklist

Tests updated.
Documentation added.
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
about-versioning.md updated with experimental features.

aknuds1 · 2024-09-25T08:16:56Z

Some tests are breaking for this upgrade, have to fix them.

aknuds1 · 2024-09-25T13:31:56Z

~~Tests currently pass, but that is through undoing some memory unsafe optimizations in the mimirpb package. We want to investigate whether the optimizations can safely be retained.~~

Edit: Memory unsafe optimizations are back again :) Solved by letting relevant protobuf messages keep an unmarshalling buffer reference.

Signed-off-by: Arve Knudsen <[email protected]>

pkg/mimirpb/custom.go

pracucci · 2024-10-09T09:22:06Z

pkg/mimirpb/split_test.go

@@ -166,7 +166,7 @@ func TestSplitWriteRequestByMaxMarshalSize_WriteRequestHasChanged(t *testing.T)

 	// If the fields of WriteRequest have changed, then you will probably need to modify
 	// the SplitWriteRequestByMaxMarshalSize() implementation accordingly!
-	assert.ElementsMatch(t, []string{"Timeseries", "Source", "Metadata", "SkipLabelValidation", "skipUnmarshalingExemplars"}, fieldNames)
+	assert.ElementsMatch(t, []string{"Timeseries", "Source", "Metadata", "SkipLabelValidation", "skipUnmarshalingExemplars", "buffer"}, fieldNames)


I think the most correct thing to do in SplitWriteRequestByMaxMarshalSize() is to retain a reference to the original buffer in each split WriteRequest (by calling bufferRef() for each split WriteRequest). We could also mention it in the SplitWriteRequestByMaxMarshalSize() function doc.

Then where we call SplitWriteRequestByMaxMarshalSize() (which is marshalWriteRequestToRecords()) we should call FreeBuffer() on each split request once we're done marshalling it.

The only annoying thing about this approach is that SplitWriteRequestByMaxMarshalSize() will just return the input WriteRequest if no splitting happened at all. We could slightly modify it to also return a bool to indicate whether splitting happened, so that the caller will call FreeBuffer() only if splitting happened.

I'm not sure I agree on this. These are internal things we do with the WriteRequest, they shouldn't affect the outer lifecycle: we had a WriteRequest, we call FreeBuffer after using it (and using it here implies splitting it into multiple ones).

Maybe we should clarify in the comment of SplitWriteRequestByMaxMarshalSize that the requests returned in the slice might still retain references to the original WriteRequest.

What doesn't convince me is that here, for some reason, we would do a special thing: we would retain more references to that buffer and free them separately.

I agree with you, Oleg, and I suggested adding the comment instead in an internal Slack thread as well.
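The lifecycle the thread settles on (split parts may alias the original request's data, and the caller frees the original once after using all parts) could look roughly like the sketch below. It is a toy model only: `writeRequest`, `splitByMaxSeries` and the `freed` flag are hypothetical and do not reflect the real SplitWriteRequestByMaxMarshalSize() signature.

```go
package main

import "fmt"

// writeRequest is a toy stand-in for a message that owns an unmarshalling buffer.
type writeRequest struct {
	Series []string
	freed  *bool // stands in for the buffer's lifetime; nil for non-owning parts
}

// FreeBuffer marks the owned buffer as released; it is a no-op for parts.
func (r *writeRequest) FreeBuffer() {
	if r.freed != nil {
		*r.freed = true
	}
}

// splitByMaxSeries returns parts that may still reference the original
// request's data. The parts do not own the buffer, so only the original
// request is freed, and only after all parts have been used.
func splitByMaxSeries(req *writeRequest, limit int) []*writeRequest {
	if len(req.Series) <= limit {
		return []*writeRequest{req}
	}
	var parts []*writeRequest
	for start := 0; start < len(req.Series); start += limit {
		end := start + limit
		if end > len(req.Series) {
			end = len(req.Series)
		}
		parts = append(parts, &writeRequest{Series: req.Series[start:end]})
	}
	return parts
}

func main() {
	freed := false
	req := &writeRequest{Series: []string{"a", "b", "c"}, freed: &freed}
	for _, part := range splitByMaxSeries(req, 2) {
		fmt.Println(part.Series) // "marshal" each part while the buffer is alive
	}
	req.FreeBuffer() // single release, whether or not splitting happened
	fmt.Println(freed)
}
```

With this shape the caller does not need to know whether splitting happened, which avoids the extra bool return value discussed above.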

pkg/distributor/query.go

pkg/mimirpb/custom.go

colega · 2024-10-09T15:19:15Z

pkg/mimirpb/custom.go

+// CodecV2 customizes gRPC unmarshalling.
+type CodecV2 struct {


I would rename this codec to reflect its purpose, something like: BufferHolderCodec

I think the name CodecV2 makes complete sense, since it wraps the codecV2 type in grpc/encoding/proto and it implements the grpc/encoding.CodecV2 interface. My reasoning is that if the naming makes sense for those, the same holds for our wrapper type.

IMO, our codec, which does special things, should reflect in its name what special thing it's doing. Their codecV2 has that name because it's the default implementation.

BTW, do we need ours to be exported? Edit: I see you have unexported it.

To be clear: this is opinionated, not a blocking comment.

colega · 2024-10-09T15:23:06Z

pkg/mimirpb/custom.go

+
+// CodecV2 customizes gRPC unmarshalling.
+type CodecV2 struct {
+	encoding.CodecV2


I would not embed this here, but convert it to a field and implement the two methods missing.

Otherwise it's unclear what other things we're inheriting from the original codec: for example, here this means that our codec has the same Name() as the parent. Is that OK?

It's definitely OK that Name() is the same as encoding.CodecV2, as it should replace it:

func init() {
	c := encoding.GetCodecV2(proto.Name)
	encoding.RegisterCodecV2(&CodecV2{c})
}

I think that embedding is good, because our implementation is supposed to only customize the Unmarshal method of the standard CodecV2.

If this case isn't right for struct embedding, then I wonder which is?

> I think that embedding is good, because our implementation is supposed to only customize the Unmarshal method of the standard CodecV2.

We can still "forward" the implementation of the other methods to the original implementation if we use a named field. It would just be more explicit.

Yes of course, it's possible to do the same without embedding. Help me understand though, why should we not use the embedding technique in this case? Are you against it in principle? If embedding is not the right technique in this case, why?

> If this case isn't right for struct embedding, then I wonder which is?

I think struct embedding is valid in codebases that respect backwards compatibility. Given that the grpc package makes breaking changes in minor versions, it would be very useful, IMO, to detect when they suddenly add a new method to the CodecV2 interface that changes the behaviour in some drastic way.

> We can still "forward" implementation of other methods to original implementation, if we use named field. It would just be more explicit.

Yes, that's what I meant.

type CodecV2 struct {
	wrapped encoding.CodecV2
}

func (c *CodecV2) Name() string { return c.wrapped.Name() }

Etc.

Edit: to be clear, it's my opinion, not a blocking comment.

I agree that we should not use embedding. (Personally I think it's a good practice to avoid embedding by default, since it hides incompatible changes, and it often exposes more methods on the "outer" type than expected or necessary. A typical misuse of embedding is embedding sync.Mutex. This second issue does not apply in this specific instance.)
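The trade-off argued in this thread can be sketched without any gRPC dependency. In the example below, `codec` is a simplified, hypothetical interface (not grpc's encoding.CodecV2), and `baseCodec` and `wrappingCodec` are illustrative names. The point is the compile-time assertion: with a named field, a method newly added to the interface breaks the build until it is forwarded explicitly, whereas an embedded field would silently inherit it.

```go
package main

import (
	"fmt"
	"strings"
)

// codec is a simplified, hypothetical stand-in for a codec interface.
type codec interface {
	Name() string
	Marshal(v string) ([]byte, error)
	Unmarshal(data []byte, v *string) error
}

// baseCodec plays the role of the default implementation being wrapped.
type baseCodec struct{}

func (baseCodec) Name() string                     { return "proto" }
func (baseCodec) Marshal(v string) ([]byte, error) { return []byte(v), nil }
func (baseCodec) Unmarshal(data []byte, v *string) error {
	*v = string(data)
	return nil
}

// wrappingCodec uses a named field, so every interface method must be
// forwarded explicitly.
type wrappingCodec struct {
	wrapped codec
}

func (c *wrappingCodec) Name() string { return c.wrapped.Name() }

func (c *wrappingCodec) Marshal(v string) ([]byte, error) { return c.wrapped.Marshal(v) }

// Unmarshal is the only customized method; upper-casing stands in for the
// real customization (retaining the unmarshalling buffer).
func (c *wrappingCodec) Unmarshal(data []byte, v *string) error {
	if err := c.wrapped.Unmarshal(data, v); err != nil {
		return err
	}
	*v = strings.ToUpper(*v)
	return nil
}

// Compile-time check: the build fails if codec gains a method that
// wrappingCodec does not forward.
var _ codec = (*wrappingCodec)(nil)

func main() {
	c := &wrappingCodec{wrapped: baseCodec{}}
	var out string
	if err := c.Unmarshal([]byte("abc"), &out); err != nil {
		panic(err)
	}
	fmt.Println(c.Name(), out)
}
```

The cost is a few lines of forwarding boilerplate per method, which is the explicitness the reviewers are asking for.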

Co-authored-by: Oleg Zaytsev <[email protected]>

Signed-off-by: Arve Knudsen <[email protected]>

aknuds1 added the area/grpc label Sep 25, 2024

aknuds1 requested a review from bboreham September 25, 2024 07:48

aknuds1 marked this pull request as ready for review September 25, 2024 07:54

aknuds1 requested review from stevesg, grafanabot and a team as code owners September 25, 2024 07:54

aknuds1 marked this pull request as draft September 25, 2024 07:54

aknuds1 force-pushed the arve/upgrade-grpc branch from 06c9166 to c6b77ea Compare September 25, 2024 07:55

aknuds1 marked this pull request as ready for review September 25, 2024 07:56

aknuds1 marked this pull request as draft September 25, 2024 08:17

aknuds1 force-pushed the arve/upgrade-grpc branch from 7ed1789 to b61cbbd Compare September 25, 2024 10:57

aknuds1 changed the title ~~Upgrade to google.golang.org/grpc v1.66.2~~ WIP: Upgrade to google.golang.org/grpc v1.66.2 Sep 25, 2024

aknuds1 force-pushed the arve/upgrade-grpc branch 16 times, most recently from d79ea9f to f484e3a Compare September 30, 2024 11:45

aknuds1 force-pushed the arve/upgrade-grpc branch 8 times, most recently from 833e5e6 to a38a9e5 Compare October 8, 2024 06:59

aknuds1 changed the title ~~WIP: Upgrade to google.golang.org/grpc v1.66.2~~ Upgrade to google.golang.org/grpc v1.66.2 Oct 8, 2024

aknuds1 force-pushed the arve/upgrade-grpc branch 2 times, most recently from f574a11 to 3e5734b Compare October 8, 2024 07:16

aknuds1 requested a review from pstibrany October 8, 2024 07:18

aknuds1 marked this pull request as ready for review October 8, 2024 07:18

aknuds1 requested a review from a team as a code owner October 8, 2024 07:18

aknuds1 force-pushed the arve/upgrade-grpc branch from 3e5734b to f620574 Compare October 8, 2024 07:52

aknuds1 requested a review from pracucci October 8, 2024 08:00

Upgrade to google.golang.org/grpc v1.66.2

fcd7178

Signed-off-by: Arve Knudsen <[email protected]>

aknuds1 force-pushed the arve/upgrade-grpc branch from f620574 to fcd7178 Compare October 8, 2024 09:34

pracucci reviewed Oct 9, 2024

colega reviewed Oct 9, 2024

aknuds1 and others added 9 commits October 9, 2024 18:03

Update pkg/distributor/query.go

1b5dbe6

Co-authored-by: Oleg Zaytsev <[email protected]>

Rename UnmarshalerV2 to BufferHolder

3d135b5

Signed-off-by: Arve Knudsen <[email protected]>

Merge remote-tracking branch 'origin/main' into arve/upgrade-grpc

b89a775

Signed-off-by: Arve Knudsen <[email protected]>

Address reviewer feedback

edd0860

Signed-off-by: Arve Knudsen <[email protected]>

Merge remote-tracking branch 'origin/main' into arve/upgrade-grpc

8a6f93f

Unexport CodecV2

7540891

Signed-off-by: Arve Knudsen <[email protected]>

Merge remote-tracking branch 'origin/main' into arve/upgrade-grpc

017265d

Signed-off-by: Arve Knudsen <[email protected]>

Merge remote-tracking branch 'origin/main' into arve/upgrade-grpc

39fbe8b

Signed-off-by: Arve Knudsen <[email protected]>

Merge remote-tracking branch 'origin/main' into arve/upgrade-grpc

6c9b586

aknuds1 changed the title ~~Upgrade to google.golang.org/grpc v1.66.2~~ Upgrade to google.golang.org/grpc v1.66.2 / modify certain protobuf messages to retain their unmarshaling buffer Oct 18, 2024
