Guilherme Nogueira
Observability is more than dashboards

A lot of teams believe they have observability because they have dashboards. They don’t. They have decoration.

This is the first post on my blog, so let me start with an opinion I keep repeating to teams I work with: observability is not about how many graphs you have on a screen — it’s about whether you can answer a question you didn’t know you’d need to ask.

The dashboard trap

Dashboards are great at answering questions you already thought of. CPU is high? There’s a graph. Latency spiked? There’s a graph. Someone built those panels because, at some point, that metric mattered.

The problem is that real incidents almost never look like the dashboards you built in advance. The interesting failures are the ones nobody predicted — a downstream dependency degrading in a way that only shows up as a weird tail latency, a retry storm hidden inside a “healthy” success rate, a single tenant quietly poisoning a shared queue.
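
To make the retry-storm case concrete, here is a minimal sketch with made-up numbers. The `Request` record and its fields are hypothetical, and it assumes a retried request failed on every attempt except its last. It shows how a success rate measured per request can look fine while the rate measured per attempt tells a very different story.

```python
from dataclasses import dataclass


@dataclass
class Request:
    request_id: str
    attempts: int     # total tries, including the first one
    succeeded: bool   # final outcome after all retries


def success_rate(requests):
    """Share of requests that eventually succeeded (what the panel shows)."""
    return sum(r.succeeded for r in requests) / len(requests)


def attempt_failure_rate(requests):
    """Share of individual attempts that failed (what the dependency feels).

    Assumes a successful request failed on every attempt but its last.
    """
    total = sum(r.attempts for r in requests)
    failed = sum(r.attempts - (1 if r.succeeded else 0) for r in requests)
    return failed / total


def amplification(requests):
    """Average number of attempts sent downstream per user request."""
    return sum(r.attempts for r in requests) / len(requests)


# Illustrative data: 60 clean requests, 39 that needed three tries,
# and one that failed even after three tries.
requests = (
    [Request(f"ok-{i}", attempts=1, succeeded=True) for i in range(60)]
    + [Request(f"retried-{i}", attempts=3, succeeded=True) for i in range(39)]
    + [Request("failed-0", attempts=3, succeeded=False)]
)

print(f"success rate:         {success_rate(requests):.2%}")          # 99.00%
print(f"attempt failure rate: {attempt_failure_rate(requests):.2%}")  # 45.00%
print(f"load amplification:   {amplification(requests):.2f}x")        # 1.80x
```

A panel built on the first number stays green. The other two only exist if each request records how many attempts it took.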

When that happens, a wall of pre-built dashboards doesn’t help. You end up staring at green panels while users are clearly suffering. That gap — green dashboards, unhappy users — is the exact moment you discover whether you have observability or just monitoring.

Monitoring vs. observability

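The distinction is simple, even if the tooling industry loves to blur it: monitoring tells you whether something you already decided to watch has gone wrong, while observability lets you ask why the system is behaving the way it is, including when nobody anticipated the failure.

You can’t pre-build a dashboard for every version of that “why”. What you can do is emit data rich enough (high-cardinality, well-structured, correlated) that you can slice it any way you need when the time comes.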
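
As a rough illustration of "slice it any way you need", the sketch below groups the same handful of events by two different dimensions. The event fields (`customer_id`, `region`, `status`) and the `error_rate_by` helper are assumptions made up for this example, not a particular vendor's schema or API.

```python
from collections import Counter

# Hypothetical wide events: one record per request, many dimensions each.
events = [
    {"customer_id": "acme",    "region": "us-east-1", "status": 200},
    {"customer_id": "acme",    "region": "eu-west-1", "status": 200},
    {"customer_id": "globex",  "region": "us-east-1", "status": 500},
    {"customer_id": "globex",  "region": "eu-west-1", "status": 500},
    {"customer_id": "initech", "region": "us-east-1", "status": 200},
    {"customer_id": "initech", "region": "eu-west-1", "status": 200},
]


def error_rate_by(events, dimension):
    """Group events by any field and return the 5xx rate per group."""
    totals, errors = Counter(), Counter()
    for event in events:
        key = event.get(dimension, "<missing>")
        totals[key] += 1
        if event["status"] >= 500:
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}


# Sliced by region, the errors look evenly spread and unremarkable.
by_region = error_rate_by(events, "region")

# Sliced by customer, one tenant clearly accounts for all of them.
by_customer = error_rate_by(events, "customer_id")

print(by_region)
print(by_customer)
```

The function takes the dimension as a parameter, so nobody had to decide in advance that `customer_id` would be the interesting one.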

What actually makes a system observable

In my experience, three things matter far more than the number of dashboards:

  1. High-cardinality, structured events. Wide events with lots of dimensions (customer ID, region, version, route, dependency) beat a pile of low-cardinality counters. You want to be able to group by something you never anticipated.
  2. Correlation across signals. A trace that links to its logs, which link to the metrics for that service, which tie back to a deploy. If you’re manually copy-pasting timestamps between three tools at 3 AM, you don’t have observability — you have homework.
  3. Questions over dashboards. The real test: can an on-call engineer who has never seen this failure before explore their way to the root cause? If the answer depends on someone having built the right panel last quarter, you’re betting your reliability on luck.
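
The correlation point can be sketched in a few lines. This is a toy using only the standard library; `new_context`, `log_line`, `span` and `correlate` are hypothetical helpers, and real tracing libraries propagate these identifiers for you. It assumes every signal carries the same `trace_id` and deploy `version`, so joining them is a filter rather than a timestamp hunt.

```python
import json
import time
import uuid


def new_context(service, version):
    """Identifiers shared by every signal emitted while handling one request."""
    return {"trace_id": uuid.uuid4().hex, "service": service, "version": version}


def log_line(ctx, message, **fields):
    """A structured log record that carries the request context."""
    return json.dumps(
        {"kind": "log", "ts": time.time(), "message": message, **ctx, **fields}
    )


def span(ctx, name, duration_ms, **fields):
    """A simplified span record that carries the same context."""
    return json.dumps(
        {"kind": "span", "name": name, "duration_ms": duration_ms, **ctx, **fields}
    )


def correlate(records, trace_id):
    """Return every record, of any kind, that belongs to one trace."""
    parsed = (json.loads(record) for record in records)
    return [record for record in parsed if record["trace_id"] == trace_id]


# Two requests handled by different deploys of a hypothetical service.
slow = new_context("checkout", version="2024.06.1")
fast = new_context("checkout", version="2024.05.9")

records = [
    span(slow, "charge_card", duration_ms=4200, dependency="payments"),
    log_line(slow, "payment provider timed out, retrying", attempt=2),
    span(fast, "charge_card", duration_ms=85, dependency="payments"),
]

for record in correlate(records, slow["trace_id"]):
    print(record["kind"], record["version"], record.get("message", record.get("name")))
```

Starting from the slow span, the retry log and the deploy version are one lookup away, with no copy-pasting between tools.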

A simple test for your team

Next time you’re in an incident review, ask one question:

“Could we have answered this with the data we already had — without deploying anything new?”

If the answer is “no, we had to add logging and wait for it to happen again,” that’s not a tooling failure. That’s an observability gap. And no amount of extra dashboards will close it.

Closing thought

Dashboards are the output. They’re useful — I build plenty of them. But they’re the answer to yesterday’s questions. Observability is the capability to answer tomorrow’s.

Build for the questions you can’t predict yet. Your future self, three coffees deep into a 3 AM incident, will thank you.


This is the first post on this blog. I’ll be writing about Cloud, SRE, DevOps, AWS, Kubernetes, Terraform and the real-world lessons I run into while building and operating infrastructure. The opinions here are my own.

