chore(config): pre-flight work for initial integration of configuration schema for sinks #13516
Conversation
Signed-off-by: Toby Lawrence <[email protected]>
✅ Deploy Preview for vector-project canceled.
Soak Test Results (Baseline: 329c410)

Explanation: A soak test is an integrated performance test for vector in a repeatable rig, with varying configuration for vector. What follows is a statistical summary of a brief vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, if vector performance is changed, and to what degree, by a pull request. Where appropriate, units are scaled per-core. The table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between the baseline and comparison SHAs, with 90.0% confidence, OR have been detected as newly erratic. Negative values mean that the baseline is faster; positive, the comparison. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. An experiment is erratic if its coefficient of variation is greater than 0.3. The abbreviated table is omitted if no interesting changes are observed.

Result: No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ ±8.87%.
I feel like I generally understood the changes (though doubt I could have implemented/written them). Left a few comments and thoughts, definitely would want a review of someone more familiar with the encoders and with macros.
Soak Test Results (Baseline: 176060c): No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ ±8.87%.
Signed-off-by: Toby Lawrence <[email protected]>
Definitely feels like magic to me still, but it's well documented, organized, and explained. Appreciate the thought and work that's gone into all these PRs.
Soak Test Results (Baseline: 176060c): Changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ ±8.87% were detected.
T::metadata().map_default_value(|default| Some(default))
T::metadata().convert()
I changed this because the old code was always setting a default value for an optional field based on the default value of its contained `T`. That isn't what the default of an `Option<T>` would be if you just did `Default::default()`, so the previous code was not correct in that case. Callers can still specify a default for an optional field at the field level if need be, but we shouldn't add one by default via `T`.
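The distinction above can be sketched in a few lines. Note that `Metadata` and `map_default_value` here are simplified stand-ins for illustration, not the actual types from the vector codebase:

```rust
// Hypothetical sketch: `Metadata` and `map_default_value` are simplified
// stand-ins for the real schema metadata machinery, not the actual API.
#[derive(Clone, Debug, PartialEq)]
struct Metadata<T> {
    default_value: Option<T>,
}

impl<T> Metadata<T> {
    fn map_default_value<U>(self, f: impl FnOnce(T) -> U) -> Metadata<U> {
        Metadata {
            default_value: self.default_value.map(f),
        }
    }
}

fn main() {
    // `Default::default()` for `Option<T>` is `None`, regardless of T's default:
    assert_eq!(Option::<u64>::default(), None);

    // But lifting the inner type's default into `Some(..)` claims every
    // optional field defaults to `Some(T::default())` instead:
    let inner = Metadata { default_value: Some(u64::default()) };
    let lifted = inner.map_default_value(Some);
    assert_eq!(lifted.default_value, Some(Some(0)));
}
```

The two disagree: the schema would advertise `Some(0)` as the default where deserialization would actually produce `None`.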
let mut inner_metadata = T::metadata();
inner_metadata.set_transparent();
The rules are slightly complex, but the process of generating the schema for a given type happens in two parts: generating the base-level schema, and then applying the metadata, which is where we set default values, validation, etc.

Similar to the comments above about not using the default value of `T` for `Option<T>`, we're handling the reverse case here. The previous code was taking the metadata for the `Option<T>` (this will always be an actual field in a struct/enum), which may have contained a default value, and then using that when generating the schema for `T`.

If someone has an `Option<T>` field with a default value, it doesn't make sense to generate the schema for `T` while saying its default value is whatever was on that optional field, because there might be 1 or 100 other fields that are also `Option<T>`, with different defaults, and now our schema is implying that `T` by itself always has a default of whatever the first `Option<T>` field we generated a schema for had for its default value.

I feel like if I say `Option<T>` or "default value" one more time my brain is gonna explode, but I think you get the point here.
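The leakage being described can be made concrete with a small sketch. `Schema` and the two helper functions below are made-up stand-ins for the generation logic, not the real implementation:

```rust
// Illustrative only: `Schema` and these helpers are hypothetical stand-ins
// for the schema-generation logic discussed above.
#[derive(Clone, Debug, PartialEq)]
struct Schema {
    default: Option<i64>,
}

// Previous behavior (sketch): the field's metadata, default included, leaked
// into the schema generated for the inner `T`:
fn inner_schema_from_field(field_default: Option<i64>) -> Schema {
    Schema { default: field_default }
}

// Fixed behavior (sketch): the inner type's schema carries no default;
// defaults stay attached to each individual field:
fn inner_schema(_field_default: Option<i64>) -> Schema {
    Schema { default: None }
}

fn main() {
    // Two `Option<i64>` fields with different field-level defaults,
    // e.g. a timeout defaulting to 30 and a retry count defaulting to 3:
    let timeout_default = Some(30);
    let retries_default = Some(3);

    // Old way: the shared schema for the inner type depends on which field
    // happened to be generated first:
    assert_ne!(
        inner_schema_from_field(timeout_default),
        inner_schema_from_field(retries_default)
    );

    // New way: identical inner schema no matter which field references it:
    assert_eq!(inner_schema(timeout_default), inner_schema(retries_default));
}
```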
This added `configurable` to some sinks, but not all - that's coming in future work still?
Soak Test Results (Baseline: ab1d123): Changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ ±8.87% were detected.
@spencergilbert Right, I did it for a single sink to try and bring a bunch of the common configuration types -- encoding, acks, batch settings, etc -- to the surface in terms of instrumenting them and trying to find gaps in what
Good good - just covering the obvious base of "oops I forgot to update 20+ sinks" 😆
Apart from the comments, this PR looks good! Nice work, excited to eventually see these comments end up on the website docs 😄
Soak Test Results (Baseline: e8f32c3): No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ ±8.87%.
Force-pushed from 3718b2c to 9e0ce4d
Soak Test Results (Baseline: b5906e3): No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ ±8.87%.
(part of the work required for #12223)

This PR is the pre-flight work for the initial implementation of the configuration schema framework, specifically for sinks. It does not change how we load configurations, or anything about our use of `inventory`/`typetag`/`SourceOuter`/etc. That work will be the final step after all components have been minimally instrumented/converted to support having their configuration schema generated.

Reviewer Notes

Changes to `Configurable`

The main change in this PR is the dropping of most bounds, and the lifetime parameter, on `Configurable`. In the course of starting to apply `#[configurable_component]` to sink-related types, I hit many issues with types that had their own implementations of `Deserialize`/`Serialize`, as well as generic type parameters. This was due to the tight bounds present on the `Configurable` trait itself, which necessitated an infectious approach to propagating constraints like `where T: Configurable` to the type definitions themselves, which then had to be propagated to all other usages, i.e. adding the bounds to unrelated trait implementations.

As well as removing these bounds and lifetime parameters, we've moved most of the bounds to the generated `Configurable` implementation itself, which lets us shift the burden of types needing to be `Configurable` from the declaration site to the various callsites themselves. We've additionally added the capability to avoid automatically deriving the `serde` traits, as many of our configuration helper types have custom (de)serialization logic.

Changes to `Compression`

This is a fairly inconsequential change, but it represents a big chunk of changed lines. Essentially, related to the above, we had to do a bunch of rigamarole to get `Compression` into a state where we could successfully derive `Configurable`.
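The bounds change described above can be sketched roughly as follows. This is a simplified illustration, not the real `Configurable` trait from the vector codebase; `description` and `Wrapper` are hypothetical names:

```rust
// Sketch of the bounds change (simplified; not vector's actual trait).
//
// Before (conceptually): the trait itself demanded serde capability and a
// lifetime, so a generic type holding `T: Configurable<'de>` had to repeat
// those bounds on its own definition and on every unrelated impl:
//
//   trait Configurable<'de>: serde::Deserialize<'de> + serde::Serialize { .. }

// After: the trait carries no supertrait bounds or lifetime parameter...
trait Configurable {
    fn description() -> Option<&'static str> {
        None
    }
}

// ...so a generic wrapper type needs no bounds on its declaration:
struct Wrapper<T>(T);

// Constraints appear only on the impls that actually need them, keeping
// the `where T: Configurable` requirement out of unrelated code:
impl<T: Configurable> Configurable for Wrapper<T> {}

impl Configurable for u64 {}

fn main() {
    // The bound is checked at the use site, not forced onto `struct Wrapper`:
    assert_eq!(<Wrapper<u64> as Configurable>::description(), None);
}
```

The practical effect is that only code that generates a schema for `Wrapper<T>` needs `T: Configurable`; everything else that merely mentions `Wrapper<T>` compiles without the bound.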