
Feature: Distributor usage trackers #4162

Draft
wants to merge 13 commits into
base: main

Conversation

@mdisibio mdisibio commented Oct 7, 2024

What this PR does:
This is a new feature that allows a tenant to accurately track the amount of ingested traffic by a set of custom labels. It's similar to the existing traces_spanmetrics_size_total metric created by the generators, but improves on it in some key ways.

**Need**
The core need is to export a set of highly accurate metrics on ingested traffic that tenants can use for cost attribution. This means that every ingested byte can be attributed to something, e.g. a team or department, so that tracing costs can be reconciled. Any attribute in the tracing data can be used.

**Reasons for a new feature**:

  1. The existing size metric isn't accurate enough. It doesn't include non-span data (i.e. resources and scopes), which can be significant, typically 15+% of the total payload. This can't simply be fixed in the generators because of the way input data is sharded across the generator rings: each time a batch is split by trace ID, the non-span data is duplicated (the resource-level information is copied to each generator target to ensure an internally consistent payload). Trying to account for it there errs in the other direction and over-counts the non-span data (under-counting at ~85% becomes over-counting at ~115%). Therefore we needed a new approach that is 99+% accurate. The distributor is the only component in Tempo that sees the original payload, so it is the ideal location for this functionality.
  2. The labels for usage tracking need to be configurable separately from span metrics. Span metrics typically include labels such as HTTP URL or status code, span success/failure, and database targets. This level of detail is fine-grained and geared towards operational needs, which are separate from cost attribution and cost reconciliation.

**Important concepts about this new feature**

  1. This lays the foundation in the distributor for generic trackers in the future. The only tracker for now is cost attribution, which is controlled by per-tenant overrides. Other trackers could cover helpful things like tracking the adoption of instrumentation libraries, databases, etc.
  2. The trackers are exposed on a new endpoint /usage_metrics. This keeps them separate from the existing operational /metrics, because they are expected to have much higher cardinality and serve a different purpose.
  3. Significant work went into the algorithm for measuring non-span data correctly and fairly. Tracing payloads are composed of a batch with resource attributes and many spans with their own attributes (simplifying greatly here). A span is always matched to a single category, but the non-span data (the ~15% of data) cannot be, so we split it proportionally based on the assignment of spans. For example, if the batch contains 10 spans matched to "foo" and 5 spans matched to "bar", then the category "foo" gets 67% of the non-span bytes and "bar" gets 33%.
  4. This adds overhead to the write path of the distributor, but it should be minimal. It is effectively a single additional call to proto.Size(), and the series tracking contains buffering that makes it zero-allocation for existing series. Running the benchmark shows:
    BenchmarkUsageTracker 1995282 5904 ns/op 0 B/op 0 allocs/op
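The proportional split described in point 3 can be sketched roughly as follows. This is a simplified illustration of the idea, not Tempo's actual implementation; the function and names are hypothetical:

```go
package main

import "fmt"

// attributeNonSpanBytes splits the non-span (resource/scope) bytes of a batch
// across categories in proportion to how many spans matched each category.
// Simplified sketch: integer division truncates, so totals may be slightly
// under the input; a real implementation would assign the remainder.
func attributeNonSpanBytes(spanCounts map[string]int, nonSpanBytes int) map[string]int {
	total := 0
	for _, n := range spanCounts {
		total += n
	}
	out := make(map[string]int, len(spanCounts))
	if total == 0 {
		return out
	}
	for category, n := range spanCounts {
		out[category] = nonSpanBytes * n / total
	}
	return out
}

func main() {
	// 10 spans matched "foo", 5 matched "bar"; 300 non-span bytes to attribute.
	split := attributeNonSpanBytes(map[string]int{"foo": 10, "bar": 5}, 300)
	fmt.Println(split["foo"], split["bar"]) // foo gets 2/3, bar gets 1/3
}
```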

TODOs
There are a few remaining questions:

  1. Behavior for unconfigured tenants. Currently it's opt-in: if overrides.cost_attribution.dimensions isn't set, the distributor records nothing. But maybe it makes sense to have a default behavior here? Service name may be a good default.
  2. Behavior when hitting max cardinality. Currently, once we already have the max number of series, new data is recorded into the unlabeled series, while existing series keep working. This means it is not possible to assess data quality after max cardinality is reached. Thinking about alternative behaviors here.
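The overflow behavior in point 2 might look roughly like this. A hypothetical sketch only, not the PR's tracker code; the types and constants are invented for illustration:

```go
package main

import "fmt"

const maxCardinality = 3

// unlabeled is the overflow series that absorbs bytes once the
// series limit is reached (hypothetical representation).
const unlabeled = ""

type tracker struct {
	series map[string]uint64 // label set -> observed bytes
}

// observe records bytes against a label set. New label sets that would
// exceed max cardinality fall into the unlabeled overflow series, while
// existing series keep working normally.
func (t *tracker) observe(labels string, bytes uint64) {
	if _, ok := t.series[labels]; !ok && len(t.series) >= maxCardinality {
		labels = unlabeled
	}
	t.series[labels] += bytes
}

func main() {
	t := &tracker{series: map[string]uint64{}}
	t.observe("team=a", 100)
	t.observe("team=b", 100)
	t.observe("team=c", 100)
	t.observe("team=d", 50) // over the limit: goes to the unlabeled series
	t.observe("team=a", 25) // existing series still tracked normally
	fmt.Println(t.series["team=a"], t.series[unlabeled])
}
```

The upside, as noted in the review below, is that the size of the unlabeled series makes the severity of the unknown data visible; the downside is that its composition can't be inspected afterwards.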

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@knylander-grafana
Contributor

Should we create a doc issue for this when it's ready?

@joe-elliott joe-elliott (Member) left a comment


core code looks great. have some Qs but mainly just needs changelog and docs

Review threads (resolved):
modules/distributor/distributor.go
cmd/tempo/app/modules.go
modules/distributor/usage/config.go
modules/distributor/usage/tracker.go
if len(t.series) >= maxCardinality {
// Overflow
// It goes into the unlabeled bucket
// TODO - Do we want to do something else?
Member:

i really like this solution and can't come up with a better one off the top of my head. you can clearly see the severity of the unknown data and choose whether or not to pick a different label set.

if v == "" {
continue
}
for i, d := range dimensions {
Member:

so it looks like span data "overrides" resource level data. should we require scopes on the configured dimensions to enforce level?

I'm imagining a scenario like:

batches: [
  atts: {
    "foo": "A",
  },
  spans: [
    { 
      atts: {
        "foo": "B",
      },
    },
    {
       // no value for foo
    },
  ]
]

will the first span overwrite the foo value to be "B" and that will carry forward to the second span even though it doesn't have a value for "foo"?

Contributor Author:

Definitely considered this, but for this use case we need to check both areas, as different applications can be instrumented differently (service A puts the attribute on the resource, and service B puts the attribute on the span). I was thinking we could evolve into that in the future: parsing as a TraceQL identifier allows us to know whether it's a plain old label (current), scoped (span.foo), or unscoped (.foo).

Another consideration was to keep the configuration simple, and consistent across signals. For example here is the PR for Mimir: grafana/mimir#9392

Ultimately I think at this time I wouldn't want to require scopes yet.

Contributor Author:

Ah sorry, reading closer, yes I think you are right and that is a bug. It looks like "B" would carry forward to the second span.
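The carry-forward bug discussed above can be illustrated with a small sketch (hypothetical code, not the PR's implementation): the fix is to re-resolve each dimension per span, starting from the resource-level value, rather than mutating a shared buffer across spans.

```go
package main

import "fmt"

// resolveDimension returns the value for one configured dimension,
// letting a span-level attribute override the resource-level one.
// Resolving per span (instead of reusing a shared buffer) avoids the
// bug where span 1's "B" carries forward to span 2.
func resolveDimension(resourceAttrs, spanAttrs map[string]string, dim string) string {
	if v, ok := spanAttrs[dim]; ok && v != "" {
		return v
	}
	return resourceAttrs[dim]
}

func main() {
	resource := map[string]string{"foo": "A"}
	spans := []map[string]string{
		{"foo": "B"}, // span-level override
		{},           // no value for foo: falls back to resource-level "A"
	}
	for _, s := range spans {
		fmt.Println(resolveDimension(resource, s, "foo"))
	}
}
```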

dimensions := u.labelsFn(tenant)
if len(dimensions) == 0 {
// Not configured
// TODO - Should we put it all in the unattributed bucket instead?
Member:

I would lean on the side of defaulting to service name, that seems more useful than the unattributed bucket 🤔


Whatever is decided here, it should be the same for all our DBs.

Review threads (resolved):
modules/overrides/config.go
modules/overrides/config_legacy.go
modules/overrides/interface.go
modules/overrides/user_configurable_overrides.go
@mdisibio
Contributor Author

mdisibio commented Oct 9, 2024

> Should we create a doc issue for this when it's ready?

I'm happy to add docs in this PR if we want. At a minimum I will update the config and URL sections.

@mdisibio
Contributor Author

Pushed some changes to configure via map[string]string instead of []string, to handle the case of many-to-one dimensions with relabeling. This allows the flexibility to accommodate gaps across a tenant's data. For example, we could now scan for both labels k8s.namespace.name and app_namespace and combine them into a final namespace label. There are still some outstanding TODO changes.
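A many-to-one mapping with map[string]string might look roughly like this. The config shape and attribute names below are just hypothetical examples to illustrate the idea:

```go
package main

import "fmt"

// dimensions maps source attribute names to a final output label name.
// Hypothetical example: two different attribute conventions are combined
// into a single final "namespace" label.
var dimensions = map[string]string{
	"k8s.namespace.name": "namespace",
	"app_namespace":      "namespace",
}

// relabel resolves the output labels for one set of attributes,
// keeping the first non-empty value found for each output label.
func relabel(attrs map[string]string) map[string]string {
	out := map[string]string{}
	for src, dst := range dimensions {
		if v, ok := attrs[src]; ok && out[dst] == "" {
			out[dst] = v
		}
	}
	return out
}

func main() {
	// Both conventions resolve to the same final label.
	fmt.Println(relabel(map[string]string{"k8s.namespace.name": "prod"}))
	fmt.Println(relabel(map[string]string{"app_namespace": "prod"}))
}
```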
