Accelerate Database Performance Troubleshooting with Grafana Assistant – A Step-by-Step Guide
Introduction
Your database is slowing down. You see a query's P99 latency spike, wait events like wait/synch/mutex/innodb flaring up, and errors climbing, but what do you actually do about it? Visibility alone isn't enough. You need an actionable diagnosis and clear next steps. That's where the Grafana Assistant integration for Database Observability comes in. It combines generative AI with Grafana Cloud’s observability data, giving you purpose‑built analysis without copying SQL into separate tools. The assistant works from your real Prometheus and Loki data, table schemas, indexes, and execution plans, all scoped to your current time window. This guide walks you through using the assistant to diagnose and resolve common database performance issues step by step.
What You Need
- A Grafana Cloud account with Database Observability enabled.
- Access to the relevant Prometheus and Loki data sources for your database.
- An identified slow or degraded query (shown in the overview with increased duration, error rate, or wait events).
- Familiarity with basic SQL query concepts (joins, scans, indexes, wait events) – though the assistant will explain them for you.
Step‑by‑Step Instructions
Step 1: Locate the Offending Query in the Overview
Open the Database Observability dashboard in Grafana Cloud. In the query overview, look for a query whose duration is spiking or whose error rate is climbing. These are your prime candidates. The assistant works best when you have a specific query in mind. You don’t need to know yet why it’s slow; the next steps uncover that.
Step 2: Dive into the Query Detail View
Click on the query to open its detailed view. You’ll see time‑series data for RED metrics (Rate, Errors, Duration), individual execution samples, wait event breakdowns, table schemas, and visual explain plans. Take a moment to note the time window you’re looking at – the assistant will use this exact window for its analysis.
Step 3: Launch the Grafana Assistant with a Guided Prompt
In the query detail view, locate the Assistant panel. Instead of typing a generic question, click the pre‑built button labeled “Why is this query slow?” This button sends a purpose‑built prompt designed by database engineers rather than a generic AI prompt. The assistant immediately begins querying your real Prometheus and Loki data for the selected time range. It also fetches the actual table schemas, indexes, and execution plans that are already loaded in your session. You don’t need to paste any SQL or describe the schema manually.
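To make "querying for the selected time range" concrete, here is a minimal sketch of a window-scoped Prometheus range query. It only builds the request URL for the standard /api/v1/query_range endpoint and sends nothing. The base URL, the metric name example_query_duration_seconds_bucket, and the digest label are hypothetical placeholders, and the sketch does not describe how the assistant issues its queries internally.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_range_query_url(base_url, promql, start, end, step_seconds):
    """Build a Prometheus /api/v1/query_range URL for a fixed time window."""
    params = {
        "query": promql,
        "start": start.timestamp(),  # Unix seconds
        "end": end.timestamp(),      # Unix seconds
        "step": f"{step_seconds}s",  # resolution of the returned series
    }
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


# Hypothetical one-hour window, matching what you would select in the dashboard.
window_start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
window_end = datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc)

# Hypothetical metric and label names; substitute the ones your setup exposes.
promql = (
    'histogram_quantile(0.99, sum by (le) '
    '(rate(example_query_duration_seconds_bucket{digest="abc123"}[5m])))'
)

url = build_range_query_url(
    "https://prometheus.example.com", promql, window_start, window_end, 60
)
print(url)
```

Every data point the query returns falls inside the start and end bounds, which is why choosing the right window in Step 2 matters for the analysis.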
Step 4: Review the Assistant’s Health Assessment
After a few seconds, the assistant outputs a synthesized health assessment. It combines information from both data sources into plain‑language insights. For example:
- Duration is spiking because the number of rows examined is 50 times the number of rows returned – meaning most work is wasted on filtering.
- The P99 is 12x the median, indicating an intermittent problem rather than a constant one.
- CPU time is healthy, but wait events consume 40% of execution time – pointing to contention or I/O bottlenecks.
The assistant also translates cryptic wait event names like wait/synch/mutex/innodb into plain English, e.g., “During this wait, the database is …” and suggests possible causes.
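The three example findings above each reduce to a simple ratio. The sketch below shows that arithmetic with hypothetical numbers chosen to match the examples. The field names and the thresholds are illustrative assumptions, not values the assistant is known to use.

```python
from dataclasses import dataclass


@dataclass
class QueryWindowStats:
    """Aggregated figures for one query over one time window (hypothetical shape)."""
    rows_examined: int
    rows_returned: int
    p99_ms: float
    median_ms: float
    wait_ms: float
    total_ms: float


def assess(stats, scan_threshold=10.0, tail_threshold=5.0, wait_threshold=0.25):
    """Compute the three ratios and flag those above the (assumed) thresholds.

    Assumes median_ms and total_ms are positive.
    """
    ratios = {
        # Rows read per row returned; high values mean wasted filtering work.
        "scan_ratio": stats.rows_examined / max(stats.rows_returned, 1),
        # Tail latency relative to typical latency; high values mean intermittent slowness.
        "tail_ratio": stats.p99_ms / stats.median_ms,
        # Fraction of execution time spent waiting rather than on CPU.
        "wait_share": stats.wait_ms / stats.total_ms,
    }
    findings = []
    if ratios["scan_ratio"] >= scan_threshold:
        findings.append(f"Examines {ratios['scan_ratio']:.0f}x more rows than it returns")
    if ratios["tail_ratio"] >= tail_threshold:
        findings.append(f"P99 is {ratios['tail_ratio']:.0f}x the median")
    if ratios["wait_share"] >= wait_threshold:
        findings.append(f"Wait events take {ratios['wait_share']:.0%} of execution time")
    return ratios, findings


# Hypothetical sample matching the three example findings.
sample = QueryWindowStats(
    rows_examined=50_000, rows_returned=1_000,
    p99_ms=240.0, median_ms=20.0,
    wait_ms=400.0, total_ms=1_000.0,
)
ratios, findings = assess(sample)
for line in findings:
    print(line)
```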
Step 5: Interpret the Assistant’s Recommendations
Based on the assessment, the assistant provides specific, actionable advice. For the example above, it may recommend:
- Reviewing the query’s WHERE clause to improve filtering selectivity.
- Adding or modifying an index to reduce the number of rows examined.
- Examining the execution plan for a table scan that wasn’t a problem until data grew.
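- Investigating the wait event – for instance, mutex contention typically means many sessions are competing for the same internal InnoDB structure, which often accompanies heavy concurrent access to the same hot data.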
Each recommendation is backed by real data from your database, not generic advice.
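The index recommendation is the easiest one to see in action. The sketch below uses Python's built-in sqlite3 module as a stand-in for your production database, purely so the example runs anywhere. The orders table, its columns, and the index name are hypothetical, and the plan output of MySQL or PostgreSQL will look different, though the shift from a full scan to an index lookup is the same idea.

```python
import sqlite3


def plan_for(conn, sql, params=()):
    """Return the human-readable detail of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail text


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 500, i * 1.5) for i in range(10_000)],
)

query = "SELECT id, total FROM orders WHERE customer_id = ?"

# Without an index, the filter forces a scan of the whole table.
plan_before = plan_for(conn, query, (42,))

# With an index on the filtered column, the engine can seek directly to matching rows.
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
plan_after = plan_for(conn, query, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies by SQLite version, but the first plan reports a scan and the second reports a search that uses the new index.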
Step 6: Apply the Change and Validate
Implement the assistant’s suggestion (e.g., create an index, rewrite the query, or adjust the schema). Then move the time window so that it covers the period after the change, and compare it against the earlier window to confirm the improvement. You can re‑run the assistant to see whether the health assessment changes. Alternatively, use other pre‑built buttons such as “Get recommendations on changes” to explore additional optimizations.
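One way to validate a fix is to compare equal-width windows on either side of the moment the change went live. The sketch below shows that bookkeeping. The change time and the latency figures are hypothetical, and in practice you would read the values from the dashboard for each window.

```python
from datetime import datetime, timedelta, timezone


def comparison_windows(change_time, width):
    """Return (before, after) windows of equal width that meet at the change time."""
    before = (change_time - width, change_time)
    after = (change_time, change_time + width)
    return before, after


def percent_change(before_value, after_value):
    """Relative change in percent; negative means the metric went down."""
    return (after_value - before_value) / before_value * 100.0


# Hypothetical deployment time of the fix.
change_time = datetime(2024, 1, 1, 14, 0, tzinfo=timezone.utc)
before_window, after_window = comparison_windows(change_time, timedelta(hours=1))

# Hypothetical P99 readings taken from each window.
p99_before_ms = 240.0
p99_after_ms = 30.0
p99_delta = percent_change(p99_before_ms, p99_after_ms)

print("before window:", before_window[0], "to", before_window[1])
print("after window: ", after_window[0], "to", after_window[1])
print(f"P99 change: {p99_delta:.1f}%")
```

Keeping the windows the same width and, where possible, at comparable load levels makes the comparison fairer than eyeballing two arbitrary ranges.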
Step 7: Repeat for Other Common Issues
The assistant offers several out‑of‑the‑box prompts beyond “Why is this query slow?”. For example:
- “Why is this query degraded?” – focuses on regressions or resource contention.
- “Recommend schema changes” – suggests index additions or table alterations.
- “Explain this wait event” – dives deeper into a specific bottleneck.
You can still use the free‑form chat to ask your own questions, but the guided prompts are designed to cover the most common troubleshooting scenarios.
Tips for Maximum Effectiveness
- Use the assistant early – before diving into manual analysis. It saves time on context‑gathering and reduces guesswork.
- Trust the data – the assistant only uses your actual Prometheus and Loki data; it never stores query text or schema metadata for model training. Your information stays private.
- Combine with visual explain plans – the assistant’s text output complements the visual plans you already see. Use both to form a complete picture.
- Experiment with different time windows – sometimes a problem only appears during peak hours. Adjust the time range in the dashboard and re‑run the assistant.
- Use free‑form prompts when needed – if the guided prompts don’t cover a niche scenario, type your own question in the assistant chat. It still works against your real data.
- Rerun after changes – after applying a fix, launch the same guided prompt again to verify the health assessment improves. This closes the loop.