OpenTelemetry Is Broken

Jan 11, 2026

OpenTelemetry set out to solve a real problem: fragmented observability standards, vendor-specific instrumentation, and painful migrations. The promise was compelling—instrument once, send anywhere.

But after years of adoption, production usage, and real-world pain, it’s time to say this clearly:

OpenTelemetry, as it exists today, is broken.

Not “unfinished.” Not “needs polish.” Broken in ways that actively hurt teams trying to build reliable systems.

This isn’t an attack on the people behind the project—many are excellent engineers with good intentions. This is a critique of the outcomes. And the outcomes matter more than the vision.

Below are the core reasons why.

1. No UI or Storage Layer = More Vendor Lock-In, Not Less

OpenTelemetry deliberately stops at telemetry generation and transport. It does not define:

A storage format
A query language
A UI
A canonical data model at rest

In theory, this keeps OpenTelemetry “neutral.”
In practice, it pushes the most important decisions downstream to vendors.

Once your data hits a backend:

It is reshaped, sampled, dropped, aggregated, or enriched in vendor-specific ways
Queries become vendor-specific
Dashboards become vendor-specific
Retention and cost models diverge dramatically

So yes, you can switch exporters more easily—but you cannot switch observability systems without re-learning how your data behaves.

This is not reduced lock-in.
It’s lock-in with an extra hop.

Contrast this with databases or logging systems that define:

Storage semantics
Query primitives
Clear mental models

OpenTelemetry defines none of these—and then acts surprised when users feel trapped.

2. The Artificial Split Between Traces, Metrics, and Logs Is Actively Harmful

OpenTelemetry doubles down on the classic “three pillars of observability”:

Traces
Metrics
Logs

This distinction is conceptual, historical, and outdated.

In real systems:

A log line can be a metric
A metric can be derived from traces
A trace span often exists only to annotate logs
Users think in events, state, and relationships, not pillars

Yet OpenTelemetry:

Forces a different API and SDK surface on you for each signal
Encourages different storage paths
Requires users to decide in advance what category data belongs to

This leads to absurd questions like:

“Should this be a metric or a span attribute?”
“Should I emit a log or add an event?”

These are not observability questions.
They are framework-induced confusion.

A good observability system lets you record facts first and decide how to aggregate, query, and visualize later.

OpenTelemetry does the opposite.
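To make "record facts first" concrete, here is a minimal sketch in plain Python. The event fields (`route`, `status`, `duration_ms`, `request_id`) and the helper names are assumptions made up for illustration, not an OpenTelemetry schema or API. The point is only that one list of recorded facts can answer a metric-style question and a log-style question after the fact.

```python
from collections import Counter

# Hypothetical wide events. Field names are illustrative assumptions.
events = [
    {"route": "/checkout", "status": 200, "duration_ms": 120, "request_id": "a1"},
    {"route": "/checkout", "status": 500, "duration_ms": 950, "request_id": "a2"},
    {"route": "/search", "status": 200, "duration_ms": 40, "request_id": "a3"},
    {"route": "/checkout", "status": 200, "duration_ms": 180, "request_id": "a4"},
]


def error_rate(events, route):
    """Metric-style view: fraction of requests on a route that returned 5xx."""
    matching = [e for e in events if e["route"] == route]
    if not matching:
        return 0.0
    errors = sum(1 for e in matching if e["status"] >= 500)
    return errors / len(matching)


def slowest(events, n=1):
    """Log-style view: the n individual requests with the highest latency."""
    return sorted(events, key=lambda e: e["duration_ms"], reverse=True)[:n]


def requests_per_route(events):
    """Counter-style view: request volume grouped by route."""
    return Counter(e["route"] for e in events)


print(error_rate(events, "/checkout"))
print(slowest(events)[0]["request_id"])
print(requests_per_route(events))
```

Nothing here had to be classified as a metric, a log, or a span at write time. That decision moved to the query.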

3. Traces Are Overused, Noisy, and Low-Value for Most Workloads

Distributed tracing is powerful—for a small subset of problems:

High-latency request paths
Complex service-to-service interactions
Rare edge cases

OpenTelemetry treats tracing as the default signal.
This is a mistake.

In most production systems:

99.9% of traces are boring
Sampling drops most of the interesting ones
Unsampled traces waste CPU, memory, and bandwidth
Engineers stop looking at traces entirely
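A rough back-of-the-envelope sketch of the sampling problem, under loudly stated assumptions: uniform probabilistic head sampling (the keep/drop decision is made before the outcome is known), a 1% sample rate, and one million traces. Those numbers are illustrative, not measurements; only the 0.1% "interesting" share comes from the list above. Tail-based sampling would behave differently.

```python
def expected_interesting_kept(total_traces, interesting_fraction, sample_rate):
    """Expected count of interesting traces that survive uniform head sampling.

    Head sampling decides before the request finishes, so an interesting
    trace is kept with the same probability as a boring one.
    """
    interesting = total_traces * interesting_fraction
    return interesting * sample_rate


def chance_trace_is_lost(sample_rate):
    """Probability that any one specific trace is dropped."""
    return 1.0 - sample_rate


# Assumed workload: 1,000,000 traces, 0.1% interesting, 1% head sampling.
kept = expected_interesting_kept(1_000_000, 0.001, 0.01)
lost = chance_trace_is_lost(0.01)

print(f"interesting traces kept (expected): {kept:.0f}")
print(f"chance the one you need was dropped: {lost:.0%}")
```

Under these assumptions, roughly 10 of the 1,000 interesting traces survive, and the specific trace you go looking for is gone 99% of the time.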

Instead of helping, tracing becomes:

Expensive noise
A false sense of coverage
A debugging crutch that rarely pays off

What engineers actually use day-to-day:

Aggregated metrics
Structured logs
High-level trends and anomalies

Yet OpenTelemetry’s design nudges teams toward:

“Just add more spans”

This is not observability.
It’s instrumentation theater.

4. Clients Are Heavy, Over-Engineered, and Painful to Use

OpenTelemetry SDKs are:

Large
Complex
Opinionated
Inconsistent across languages
Frequently changing

Common problems teams hit:

Huge dependency graphs
Subtle performance regressions
Confusing lifecycle management
Breaking attribute renames shipped as “semantic conventions” updates
Instrumentation that’s harder to remove than to add

Instead of being a thin abstraction, OpenTelemetry clients often feel like:

“An observability framework embedded inside my app”

This is especially damaging for:

High-throughput systems
Resource-constrained workloads
Teams that value simplicity and control

Observability tooling should be boring, predictable, and cheap.

OpenTelemetry SDKs are none of these.

The Core Issue: OpenTelemetry Optimizes for Standards, Not for Users

OpenTelemetry is very good at:

Committees
Specifications
Abstractions
Compatibility matrices

It is much worse at:

Developer experience
Debuggability
Clear mental models
Practical defaults

The project assumes that:

“If we standardize the pipes, everything else will sort itself out.”

It hasn’t.

What we got instead is:

More complexity
More vendors
More cognitive load
And observability that feels harder than before

What Should Replace This?

That’s a longer conversation—but a few principles are clear:

Start from how engineers debug, not from signals
Treat telemetry as events and relationships, not pillars
Make storage, querying, and UI first-class concepts
Keep clients minimal and dumb
Optimize for signal density, not volume
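As one possible reading of "minimal and dumb," here is a sketch of a client that does nothing except write one JSON object per line to a stream. `EventWriter`, `emit`, and the event names are hypothetical, invented for this example; this is not a proposal for a specific replacement, and a real client would still need to think about batching, backpressure, and failure handling.

```python
import io
import json
import time


class EventWriter:
    """Hypothetical minimal client: one JSON object per line, nothing else."""

    def __init__(self, stream):
        # Any object with a write(str) method works: a file, a socket
        # wrapper, or an in-memory buffer for tests.
        self._stream = stream

    def emit(self, name, **fields):
        """Record a single fact with a timestamp and arbitrary fields."""
        record = {"ts": time.time(), "name": name, **fields}
        self._stream.write(json.dumps(record) + "\n")


# Usage: write to an in-memory buffer so the example is self-contained.
buffer = io.StringIO()
writer = EventWriter(buffer)
writer.emit("order.placed", order_id="o-1", amount_cents=1299)
writer.emit("order.failed", order_id="o-2", reason="card_declined")

lines = buffer.getvalue().splitlines()
print(len(lines))
```

The whole client fits on one screen and has no dependencies outside the standard library, so it is easy to read, easy to remove, and cheap to run.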

OpenTelemetry tried to be neutral.
Neutrality turned out to be a design choice—with consequences.

And those consequences are showing up in production systems everywhere.

Moshe's Substack
