You can't fix what you can't see. Most teams discover problems when customers report them — by then, the damage is done. Proper monitoring means you know about issues before users do, you have data to diagnose them quickly, and you can see trends before they become incidents.
Prometheus + Grafana is the most widely used open-source monitoring stack in 2026. Here's how to set it up for a Node.js application.
Expose a /metrics endpoint that Prometheus reads. The key insight: Prometheus pulls metrics from your app (instead of your app pushing to Prometheus). This means Prometheus controls the scrape interval and your app just needs to maintain an accurate metrics endpoint.
npm install prom-client

// lib/metrics.ts
import { Registry, Counter, Histogram, Gauge, collectDefaultMetrics } from 'prom-client'

// Dedicated registry so the /metrics endpoint serves exactly what this
// module registers (instead of relying on prom-client's implicit global).
export const register = new Registry()

// Default Node.js process metrics: CPU, memory, event loop, GC.
collectDefaultMetrics({ register })

// Counter: every HTTP request, split by method / route / status code.
export const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  registers: [register],
  labelNames: ['method', 'route', 'status_code'],
})

// Histogram: request latency in seconds; buckets span 1ms to 5s.
export const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  registers: [register],
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 2, 5],
})

// Gauge: connections currently open against the server.
export const activeConnections = new Gauge({
  name: 'active_connections',
  help: 'Number of active connections',
  registers: [register],
})

// --- Business metrics -------------------------------------------------

// Signups, labeled by signup method (e.g. 'email', 'google').
export const userRegistrationsTotal = new Counter({
  name: 'user_registrations_total',
  help: 'Total number of user registrations',
  registers: [register],
  labelNames: ['method'],
})

// Orders placed, labeled by outcome.
export const ordersTotal = new Counter({
  name: 'orders_total',
  help: 'Total number of orders placed',
  registers: [register],
  labelNames: ['status'],
})
// middleware/metrics.ts
import { Request, Response, NextFunction } from 'express'
import { httpRequestsTotal, httpRequestDuration } from '../lib/metrics'

/**
 * Express middleware that records one counter increment and one duration
 * observation per completed response (hooked on the 'finish' event, so the
 * recorded status code is the one actually sent).
 */
export function metricsMiddleware(req: Request, res: Response, next: NextFunction) {
  const start = Date.now()
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000
    // Normalize route to avoid high cardinality:
    // /api/users/123 → /api/users/:id
    // req.route is only set once a route has matched. Falling back to the raw
    // req.path would create one label value per concrete URL (every 404, every
    // /api/users/<n>) — an unbounded-cardinality leak in Prometheus. Instead,
    // use the route pattern (prefixed with the router mount point) and collapse
    // everything that matched no route into a single bucket.
    const route = req.route ? req.baseUrl + req.route.path : 'unmatched'
    const labels = {
      method: req.method,
      route,
      status_code: String(res.statusCode),
    }
    httpRequestsTotal.inc(labels)
    httpRequestDuration.observe(labels, duration)
  })
  next()
}
// app.ts — add the metrics endpoint and middleware
import { register } from './lib/metrics'
import { metricsMiddleware } from './middleware/metrics'

// Record request metrics for every route registered after this point.
app.use(metricsMiddleware)

// Metrics endpoint — only accessible internally.
// In production, add an IP allowlist or internal-only network rule.
app.get('/metrics', async (req, res) => {
  const body = await register.metrics()
  res.set('Content-Type', register.contentType)
  res.send(body)
})
Now visit http://localhost:3000/metrics and you'll see raw Prometheus metrics.
Technical metrics (CPU, memory, request rate) tell you the system is struggling. Business metrics tell you what's actually happening to your users:
// In your auth controller
import { userRegistrationsTotal } from '../lib/metrics'
export async function register(req, res) {
// ... registration logic
userRegistrationsTotal.inc({ method: 'email' })
// or { method: 'google' } for OAuth
}
// In your orders controller
import { Request, Response } from 'express'
import { ordersTotal } from '../lib/metrics'

// Typed params (req/res were implicit `any`); `catch (error: unknown)`
// matches strict-mode `useUnknownInCatchVariables`.
export async function createOrder(req: Request, res: Response) {
  try {
    const order = await processOrder(req.body)
    ordersTotal.inc({ status: 'success' })
    res.json(order)
  } catch (error: unknown) {
    // Count the failure, then rethrow so the app's error handler still runs.
    ordersTotal.inc({ status: 'failed' })
    throw error
  }
}
# docker-compose.yml
# Indentation restored — the flattened listing above is not valid YAML.
version: '3.8'  # ignored by Compose v2+; kept for older docker-compose clients

services:
  app:
    build: .
    ports:
      - "3000:3000"
    # NOTE(review): these labels only matter with Docker-based service
    # discovery; the static_configs in prometheus.yml below ignores them.
    labels:
      - "prometheus.scrape=true"
      - "prometheus.port=3000"
      - "prometheus.path=/metrics"

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus  # persist the TSDB across restarts
    ports:
      - "9090:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'

  grafana:
    image: grafana/grafana:latest
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=changeme  # change this in production
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana  # persist dashboards and settings
    ports:
      - "3001:3000"
    depends_on:
      - prometheus

volumes:
  prometheus_data:
  grafana_data:
# prometheus.yml
# Indentation restored — the flattened listing above is not valid YAML.
global:
  scrape_interval: 15s      # how often Prometheus pulls each target's /metrics
  evaluation_interval: 15s  # how often alerting/recording rules are evaluated

scrape_configs:
  - job_name: 'nodejs-app'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['app:3000']  # Compose service name resolves on the shared network
Once Prometheus is scraping your app, these queries cover most use cases:
# Request rate per second (last 5 minutes)
rate(http_requests_total[5m])

# Error rate (4xx + 5xx) as a percentage of all traffic.
# Both sides MUST be aggregated with sum(): dividing raw rate() vectors
# matches series label-for-label, so every 4xx/5xx series would just divide
# by itself and report 100%.
sum(rate(http_requests_total{status_code=~"[45].."}[5m]))
  / sum(rate(http_requests_total[5m])) * 100

# 95th percentile latency (aggregate buckets across series, keeping 'le')
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Top 10 routes by request rate (aggregate per route, not per raw series)
topk(10, sum by (route) (rate(http_requests_total[5m])))

# Memory usage (MB)
process_resident_memory_bytes / 1024 / 1024

# Event loop lag (Node.js performance indicator)
nodejs_eventloop_lag_seconds
Open Grafana at http://localhost:3001 (admin/changeme) and add Prometheus as a data source at http://prometheus:9090. A basic dashboard should show:
# alerting_rules.yml — indentation restored (the flattened listing is not valid YAML)
groups:
  - name: app_alerts
    rules:
      - alert: HighErrorRate
        # sum() both sides: dividing unaggregated rate() vectors matches
        # series label-for-label, so every 5xx series divides by itself —
        # the ratio is always 1 and the alert fires on any single 5xx.
        expr: |
          sum(rate(http_requests_total{status_code=~"5.."}[5m]))
          / sum(rate(http_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 5%"
          description: "Error rate is {{ $value | humanizePercentage }}"

      - alert: HighLatency
        # Aggregate histogram buckets across series (keeping 'le') before
        # computing the quantile.
        expr: |
          histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P95 latency above 1 second"
Key takeaway: Prometheus scrapes the /metrics endpoint your app exposes — instrument your app with prom-client.