I unblocked a team who needed to tag DataDog metrics with ownership information. Their naive implementation relied on acquiring stack traces to calculate the metric owner and was estimated to increase site latency by 30x. Metrics emission must be much faster than the code it measures, so I advised against tagging at the source and recommended doing it as a post-processing step instead, but they needed source tagging for integration with analysis tools. I helped them optimise their solution and we arrived at a seemingly satisfactory proof of concept, but I warned about the risks of such a change at scale. The experience helped the team understand those risks and articulate to leadership that it was best to compromise on tool integration rather than on site availability.
Towards the end of last year I assisted a team who needed to tag DataDog metrics with ownership information. What seemed like a simple task turned out to require serious performance consideration. The task was a priority and the team was under pressure to deliver.
They started with a straightforward but naive implementation, decorating the dogstatsd-ruby gem, DataDog's client library, to obtain a stack trace, calculate the code owner of the caller, and inject the tag on metric submission. Their code looked like this:
class AttributedDogstats < Datadog::Statsd
  def increment(metric, opts)
    stack = caller()
    owner = calculate_owner(stack)
    if opts[:tags].is_a?(Hash)
      opts[:tags]["owner"] = owner
    else
      opts[:tags] ||= []
      opts[:tags].reject! { |t| t.start_with?("owner:") }
      opts[:tags] << "owner:#{owner}"
    end
    super
  end
  # ...
end
I saw this patch fly by and immediately intervened.
Metrics emission must be fast. Every microsecond spent there is multiplied by thousands of invocations, adding up to milliseconds of latency for the user. The team wasn't aware of this impact: their naive implementation was estimated to increase average request latency by 30x, pushing most requests to timeout.
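To get a feel for the cost, here's a minimal standalone sketch (not the team's code) comparing a no-op against capturing a stack trace with caller at an assumed depth of 100 frames, using only the stdlib Benchmark module:

```ruby
#!/usr/bin/env ruby
require "benchmark"

# Build an artificial call stack of the given depth, then run the block,
# to approximate capturing `caller` from deep inside a web request.
def deep(n, &blk)
  n.zero? ? blk.call : deep(n - 1, &blk)
end

n = 10_000
noop   = Benchmark.realtime { n.times { deep(100) { nil } } }
traced = Benchmark.realtime { n.times { deep(100) { caller } } }
puts "capturing a 100-frame stack trace is %.0fx slower than a no-op" % (traced / noop)
```

The exact ratio varies by machine and stack depth, but it illustrates why per-call stack capture dwarfs the cost of metric emission itself.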
I explained that acquiring stack traces and calculating the code owner at runtime was a non-starter, as this kind of reflection is orders of magnitude slower than metric emission. The best alternative I could think of was a hash map of metrics to owners calculated once at boot time, but at that point there was little reason to do the tagging at runtime at all, so I advised them to simply post-process the results and avoid touching this sensitive part of the stack.
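Such a boot-time map could be built along these lines. This is a hypothetical sketch: the YAML mapping, the prefix-match strategy, and the names are all assumptions, and the real source of truth (CODEOWNERS, a service registry, ...) would vary:

```ruby
require "yaml"

# Hypothetical owner data; in practice this would be loaded at boot
# from whatever system records metric ownership.
OWNERS_YAML = <<~YAML
  checkout: payments-team
  search: discovery-team
YAML

prefixes = YAML.safe_load(OWNERS_YAML)

# Resolve each metric name once via longest-prefix match, memoising the
# result so subsequent lookups are a single hash access.
METRIC_OWNERS = Hash.new do |h, metric|
  prefix = prefixes.keys.select { |p| metric.start_with?(p) }.max_by(&:length)
  h[metric] = prefix ? prefixes[prefix] : "UNKNOWN"
end

p METRIC_OWNERS["checkout.cart.add"]  # => "payments-team"
p METRIC_OWNERS["unmapped.metric"]    # => "UNKNOWN"
```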
However, it was still desirable to tag metrics at the source for integration with existing analysis tools, so they tried the hash map suggestion. We paired and I showed them how to run benchmarks, profile, and read flame graphs. We identified expensive array operations and unnecessary allocations and made the code much faster, but in the end even adding a mere literal tag was still too expensive.
I created a local benchmark to compare the performance of dogstatsd-ruby emitting metrics with 1 and 2 tags. Then I tried to profile the benchmark, but the output didn't show a clear difference between the two. I'm not 100% sure why, but I suspect it's because Vernier is a sampling profiler and the methods are so fast, on the order of 10 microseconds, so I added another case with 20 tags to exacerbate the differences:
#!/usr/bin/env ruby
require "datadog/statsd"
require "benchmark/ips"
require "vernier"

$dogstats = Datadog::Statsd.new('127.0.0.1', 8125)

def tags1
  $dogstats.increment("metric", tags: {
    k00: "v00",
  })
end

def tags2
  $dogstats.increment("metric", tags: {
    k00: "v00", k01: "v01",
  })
end

def tags20
  $dogstats.increment("metric", tags: {
    k00: "v00", k01: "v01", k02: "v02", k03: "v03", k04: "v04", k05: "v05",
    k06: "v06", k07: "v07", k08: "v08", k09: "v09", k10: "v10", k11: "v11",
    k12: "v12", k13: "v13", k14: "v14", k15: "v15", k16: "v16", k17: "v17",
    k18: "v18", k19: "v19",
  })
end

Vernier.profile(out: "dogtags.json") do
  Benchmark.ips_quick(:tags1, :tags2, :tags20)
end
I set the CPU governor to performance to avoid frequency scaling during the run:
cpupower frequency-set -g performance
and then ran the script, which gave me the following output:
ruby 3.4.6 (2025-09-16 revision dbd83256b1) +PRISM [x86_64-linux]
Warming up --------------------------------------
               tags1    10.294k i/100ms
               tags2     9.298k i/100ms
              tags20     3.450k i/100ms
Calculating -------------------------------------
               tags1    102.344k (± 1.4%) i/s    (9.77 μs/i) -    514.700k in   5.030109s
               tags2     90.594k (± 2.0%) i/s   (11.04 μs/i) -    455.602k in   5.031253s
              tags20     33.369k (± 2.6%) i/s   (29.97 μs/i) -    169.050k in   5.069859s

Comparison:
               tags1:   102344.3 i/s
               tags2:    90593.9 i/s - 1.13x slower
              tags20:    33369.2 i/s - 3.07x slower
and a flame graph which revealed that adding more tags increased time spent in serialisation more than anything else, especially in the TagSerializer.
There was indeed a lot going on in tag serialisation. Just escaping tags required string lookups and substitutions, which is generally expensive and potentially allocates new objects:
def escape_tag_content(tag)
  tag = tag.to_s
  return tag unless tag.include?('|') || tag.include?(',')
  tag.delete('|,')
end
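The allocation behaviour is easy to see in isolation. This standalone sketch reproduces the method above with made-up tag values:

```ruby
# Copy of dogstatsd-ruby's tag escaping, for demonstration.
def escape_tag_content(tag)
  tag = tag.to_s
  return tag unless tag.include?('|') || tag.include?(',')
  tag.delete('|,')
end

clean = "env:prod"
dirty = "env:pr|od,"

# Clean tags come back as-is: String#to_s returns self, so no allocation.
p escape_tag_content(clean).equal?(clean)  # => true
# Tags containing reserved characters allocate a new, stripped string.
p escape_tag_content(dirty)                # => "env:prod"
```

Each dirty tag therefore costs two substring scans plus a fresh string, and even clean tags pay for the two include? scans on every emission.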
Instead of wrapping dogstatsd-ruby and manipulating it from the outside, I moved to monkeypatching it from the inside. My solution was to pre-escape the map of metrics to owners at boot time and override the StatSerializer#format method to inject them, bypassing the usual serialisation. Here's a proof of concept:
#!/usr/bin/env ruby
require "datadog/statsd"
require "benchmark/ips"

# Mapping of metrics to owners; missing metrics fall back to "UNKNOWN"
$map ||= Hash.new("UNKNOWN")
$map["a.metric.for.test"] = "someonenotme"
$map["another.metric"] = "alsonotme"

# Pre-escape at boot as it's expensive at runtime
$map.transform_values! do |tag|
  tag.to_s.delete('|,')
end

# Injects metric attribution tag during serialisation so that it's fast and can
# be done async with the delay_serialization option
module FastAttributionTag
  def format(name, delta, type, tags: [], sample_rate: 1)
    name = formated_name(name)
    if sample_rate != 1
      if tags_list = tag_serializer.format(tags)
        "#{@prefix_str}#{name}:#{delta}|#{type}|@#{sample_rate}|#owner:#{$map[name]},#{tags_list}"
      else
        "#{@prefix_str}#{name}:#{delta}|#{type}|@#{sample_rate}|#owner:#{$map[name]}"
      end
    else
      if tags_list = tag_serializer.format(tags)
        "#{@prefix_str}#{name}:#{delta}|#{type}|#owner:#{$map[name]},#{tags_list}"
      else
        "#{@prefix_str}#{name}:#{delta}|#{type}|#owner:#{$map[name]}"
      end
    end
  end
end

$dogstats_standard = Datadog::Statsd.new("127.0.0.1", 8125)
$dogstats_attributed = Datadog::Statsd.new("127.0.0.1", 8125)

# Override method on instance by prepending module
$dogstats_attributed.instance_eval do
  @serializer.instance_eval do
    @stat_serializer.singleton_class.prepend(FastAttributionTag)
  end
end

def standard
  $dogstats_standard.increment("a.metric.for.test", tags: [])
end

def attributed
  $dogstats_attributed.increment("a.metric.for.test", tags: [])
end

Benchmark.ips_quick(:standard, :attributed)
Its output suggested it could be fast enough, showing negligible performance impact:
ruby 3.4.6 (2025-09-16 revision dbd83256b1) +PRISM [x86_64-linux]
Warming up --------------------------------------
            standard    18.318k i/100ms
          attributed    17.492k i/100ms
Calculating -------------------------------------
            standard    174.778k (± 2.7%) i/s    (5.72 μs/i) -    879.264k in   5.034907s
          attributed    171.389k (± 2.4%) i/s    (5.83 μs/i) -    857.108k in   5.003999s

Comparison:
            standard:   174777.5 i/s
          attributed:   171389.1 i/s - same-ish: difference falls within error
But I didn't deliver it without warning...
My proof of concept seemed sufficient, but was it really a good idea? Our DataDog throughput was on the order of several GiB/s, meaning that any extra byte added to every metric could have dramatic effects on the network and receivers.
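A back-of-the-envelope calculation makes the concern concrete. Every number below is an illustrative assumption, not a measurement:

```ruby
AVG_METRIC_BYTES = 150.0                # assumed average serialised metric size
EXTRA_TAG        = "|#owner:some-team"  # hypothetical owner tag, 17 bytes
THROUGHPUT_GIB_S = 5.0                  # assumed aggregate metric traffic

# How many metrics per second that traffic represents, and how much extra
# bandwidth the owner tag would add across all of them.
metrics_per_sec = THROUGHPUT_GIB_S * 2**30 / AVG_METRIC_BYTES
extra_gib_s     = metrics_per_sec * EXTRA_TAG.bytesize / 2**30

puts "~%.2f GiB/s of extra traffic" % extra_gib_s  # ~0.57 GiB/s
```

Under these assumptions, a single short literal tag adds over half a GiB/s of traffic to the network and the aggregation tier: a roughly 11% increase for one tag.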
This journey produced convincing evidence that such a change was too risky. The team was then able to understand the trade-off and articulate to leadership that it was best to tag as a post-processing step, as previously suggested, compromising on integration with analysis tools rather than on site availability.
Thankfully, this code never saw production. :)