Monitoring and Alerting

This document describes the monitoring and alerting setup for APIFromAnything.

Overview

APIFromAnything includes a monitoring and alerting system for tracking the performance and health of your API. It is built on the following components:

  • Prometheus: For metrics collection and storage

  • Grafana: For visualization and dashboards

  • AlertManager: For alert management and notifications

Metrics

APIFromAnything exposes the following metrics for Prometheus to collect:

Request Metrics

  • apifrom_request_count: Count of requests received, labeled by method, endpoint, and status code

  • apifrom_request_latency_seconds: Histogram of request latency in seconds, labeled by method and endpoint

  • apifrom_requests_in_progress: Gauge of requests currently being processed, labeled by method and endpoint

  • apifrom_error_count: Count of errors raised, labeled by method, endpoint, and exception type
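Because apifrom_request_latency_seconds is a histogram, percentiles such as the 95th are estimated from cumulative bucket counts rather than read directly. The sketch below illustrates the idea with linear interpolation inside the bucket that contains the target rank, similar in spirit to PromQL's histogram_quantile. The bucket bounds and counts are made-up sample data, and estimate_quantile is a hypothetical helper, not part of APIFromAnything.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in the range (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The rank falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative bucket counts for one endpoint (bounds in seconds).
latency_buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
p95 = estimate_quantile(0.95, latency_buckets)
print(f"estimated p95 latency: {p95:.4f}s")
```

The accuracy of the estimate depends on how closely the bucket bounds bracket the latencies you care about, so bucket boundaries near your alert thresholds give more useful percentiles.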

Database Metrics

  • apifrom_db_query_latency_seconds: Histogram of database query latency in seconds, labeled by operation and table

Cache Metrics

  • apifrom_cache_hit_count: Count of cache hits, labeled by cache name

  • apifrom_cache_miss_count: Count of cache misses, labeled by cache name
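Both cache metrics are counters that only ever increase, so a miss rate is normally computed from the change between two samples rather than from the raw totals. A minimal sketch of that arithmetic follows; the snapshot values are invented, and cache_miss_rate is a hypothetical helper used only for illustration.

```python
def cache_miss_rate(previous, current):
    """Return the miss rate between two (hits, misses) counter snapshots.

    Returns None when the cache saw no traffic in the interval.
    """
    hits = current[0] - previous[0]
    misses = current[1] - previous[1]
    total = hits + misses
    if total <= 0:
        return None
    return misses / total


# Hypothetical snapshots of (apifrom_cache_hit_count, apifrom_cache_miss_count)
earlier = (100, 40)
later = (130, 50)
rate = cache_miss_rate(earlier, later)
print(f"miss rate over the interval: {rate:.0%}")
```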

System Metrics

  • apifrom_system_info: System information, labeled by version and Python version

Monitoring Setup

Docker Compose

The monitoring stack is included in the docker-compose.yml file and consists of the following services:

  • prometheus: Collects and stores metrics

  • grafana: Visualizes metrics and provides dashboards

  • alertmanager: Manages alerts and notifications

Configuration

Prometheus

Prometheus is configured in monitoring/prometheus/prometheus.yml. The configuration includes:

  • Scrape configurations for various services

  • Alert rules (the rules themselves are defined in monitoring/prometheus/rules/alerts.yml)

  • AlertManager configuration

Grafana

Grafana is configured with:

  • Datasources in monitoring/grafana/provisioning/datasources/prometheus.yml

  • Dashboards in monitoring/grafana/provisioning/dashboards/

AlertManager

AlertManager is configured in monitoring/alertmanager/alertmanager.yml. The configuration includes:

  • Notification receivers (email, Slack, PagerDuty)

  • Routing configuration

  • Inhibition rules

Dashboards

APIFromAnything comes with a pre-configured Grafana dashboard that provides insights into the performance and health of your API. The dashboard includes the following panels:

  • Request Rate

  • Request Latency

  • Error Rate

  • Requests In Progress

  • Database Query Latency

  • Cache Hit/Miss Rate
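To make the panels concrete, the sketch below lists PromQL expressions that panels like these could plausibly use, built from the metric names documented above. These are assumptions, not the queries shipped with the dashboard: the 5-minute rate window and the label names endpoint, operation, and table are guesses. Counters created with the Python prometheus_client library are usually exposed with a _total suffix, so the series names in your Prometheus instance may differ.

```python
RATE_WINDOW = "5m"  # assumed rate window; adjust to your scrape interval

_hits = f"sum(rate(apifrom_cache_hit_count[{RATE_WINDOW}]))"
_misses = f"sum(rate(apifrom_cache_miss_count[{RATE_WINDOW}]))"

# Panel title -> illustrative PromQL expression (label names are assumed).
PANEL_QUERIES = {
    "Request Rate": (
        f"sum by (endpoint) (rate(apifrom_request_count[{RATE_WINDOW}]))"
    ),
    "Request Latency": (
        "histogram_quantile(0.95, sum by (le, endpoint) "
        f"(rate(apifrom_request_latency_seconds_bucket[{RATE_WINDOW}])))"
    ),
    "Error Rate": (
        f"sum by (endpoint) (rate(apifrom_error_count[{RATE_WINDOW}])) / "
        f"sum by (endpoint) (rate(apifrom_request_count[{RATE_WINDOW}]))"
    ),
    "Requests In Progress": "sum by (endpoint) (apifrom_requests_in_progress)",
    "Database Query Latency": (
        "histogram_quantile(0.95, sum by (le, operation, table) "
        f"(rate(apifrom_db_query_latency_seconds_bucket[{RATE_WINDOW}])))"
    ),
    "Cache Hit/Miss Rate": f"{_hits} / ({_hits} + {_misses})",
}

for title, expr in PANEL_QUERIES.items():
    print(f"{title}: {expr}")
```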

Alerts

APIFromAnything includes pre-configured alerts that notify you when certain conditions are met. The alerts include:

  • HighRequestLatency: Triggered when the 95th percentile of request latency is above 1s for an endpoint

  • HighErrorRate: Triggered when the error rate is above 5% for an endpoint

  • CriticalErrorRate: Triggered when the error rate is above 20% for an endpoint

  • HighRequestRate: Triggered when the request rate is above 100 requests per second for an endpoint

  • HighDatabaseLatency: Triggered when the 95th percentile of database query latency is above 0.5s

  • HighCacheMissRate: Triggered when the cache miss rate is above 80%

  • InstanceDown: Triggered when an instance is down for more than 1 minute

  • HighMemoryUsage: Triggered when memory usage is above 1 GB
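The threshold-based alerts above can be read as a table of (measurement, limit) pairs. The sketch below models that table in plain Python to show which alerts would fire for a given set of measurements. It is an illustration only: the real rules are evaluated by Prometheus, and the measurement key names here are hypothetical.

```python
# Alert name -> (measurement key, threshold). Thresholds come from the list
# above; the measurement key names are hypothetical.
THRESHOLDS = {
    "HighRequestLatency": ("p95_request_latency_seconds", 1.0),
    "HighErrorRate": ("error_rate", 0.05),
    "CriticalErrorRate": ("error_rate", 0.20),
    "HighRequestRate": ("requests_per_second", 100.0),
    "HighDatabaseLatency": ("p95_db_latency_seconds", 0.5),
    "HighCacheMissRate": ("cache_miss_rate", 0.80),
}


def firing_alerts(measurements):
    """Return the sorted names of alerts whose threshold is exceeded."""
    return sorted(
        name
        for name, (key, limit) in THRESHOLDS.items()
        if measurements.get(key, 0.0) > limit
    )


# Hypothetical measurements for one endpoint.
sample = {"error_rate": 0.25, "p95_request_latency_seconds": 0.4}
alerts = firing_alerts(sample)
print(alerts)
```

Note that an error rate above 20% satisfies both HighErrorRate and CriticalErrorRate. An AlertManager inhibition rule can suppress the lower-severity alert in that case; whether the shipped configuration does so is not stated here.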

Integration with Application Code

Middleware

APIFromAnything includes a Prometheus middleware that automatically collects request metrics. The middleware is configured in apifrom/monitoring.py.
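As an illustration of what such a middleware records, the sketch below wraps a request handler and updates the four request metrics around it. It is not the code in apifrom/monitoring.py: plain dictionaries stand in for the Prometheus metric objects so that the example runs with the standard library alone, and the handler signature is an assumption.

```python
import asyncio
import time
from collections import defaultdict

# Plain dictionaries standing in for the Prometheus metrics (assumption).
request_count = defaultdict(int)   # (method, endpoint, status) -> count
error_count = defaultdict(int)     # (method, endpoint, exception type) -> count
in_progress = defaultdict(int)     # (method, endpoint) -> requests in flight
latencies = defaultdict(list)      # (method, endpoint) -> observed seconds


async def instrumented(method, endpoint, handler):
    """Run `handler` and record request metrics around it."""
    key = (method, endpoint)
    in_progress[key] += 1
    start = time.perf_counter()
    status = 500  # reported if the handler raises
    try:
        status, body = await handler()
        return status, body
    except Exception as exc:
        error_count[(method, endpoint, type(exc).__name__)] += 1
        raise
    finally:
        # Runs on success and on failure, so every request is counted once.
        latencies[key].append(time.perf_counter() - start)
        request_count[(method, endpoint, status)] += 1
        in_progress[key] -= 1


async def ok_handler():
    return 200, "hello"


async def failing_handler():
    raise ValueError("boom")


async def main():
    await instrumented("GET", "/hello", ok_handler)
    try:
        await instrumented("GET", "/fail", failing_handler)
    except ValueError:
        pass  # the error has already been counted


asyncio.run(main())
print(dict(request_count))
```

Doing the bookkeeping in a finally block keeps the in-progress gauge balanced even when a handler raises.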

Custom Metrics

You can add custom metrics to your application by using the apifrom.monitoring module. For example:

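from apifrom.monitoring import DatabaseMetrics, CacheMetrics


async def fetch_users(db, result_from_cache):
    # `db` is your async database handle and `result_from_cache` is a flag
    # set by your own cache lookup; both are supplied by the caller.

    # Track database query time
    async with DatabaseMetrics.track_query_time("select", "users"):
        result = await db.fetch_all("SELECT * FROM users")

    # Record cache hit/miss
    if result_from_cache:
        CacheMetrics.record_hit("user_cache")
    else:
        CacheMetrics.record_miss("user_cache")

    return result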

Accessing Monitoring Tools

When running with Docker Compose, the monitoring tools are available at the following URLs:

  • Prometheus: http://localhost:9090

  • Grafana: http://localhost:3000 (default credentials: admin/admin)

  • AlertManager: http://localhost:9093
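A quick way to confirm that all three services are up is to call their health endpoints. The sketch below assumes the default ports listed above and uses the standard health paths of each tool (/-/healthy for Prometheus and AlertManager, /api/health for Grafana). The helper names are hypothetical.

```python
import urllib.error
import urllib.request

# Health endpoints, assuming the default local ports listed above.
HEALTH_URLS = {
    "Prometheus": "http://localhost:9090/-/healthy",
    "Grafana": "http://localhost:3000/api/health",
    "AlertManager": "http://localhost:9093/-/healthy",
}


def is_healthy(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False


def check_all(urls=HEALTH_URLS):
    """Return a mapping of service name to health status."""
    return {name: is_healthy(url) for name, url in urls.items()}


if __name__ == "__main__":
    for name, healthy in check_all().items():
        print(f"{name}: {'up' if healthy else 'DOWN'}")
```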

Customizing Monitoring

Adding Custom Metrics

You can add custom metrics by modifying the apifrom/monitoring.py file. For example, to add a new counter:

from prometheus_client import Counter

# `registry` is assumed to be the registry object already defined in apifrom/monitoring.py
MY_COUNTER = Counter(
    'apifrom_my_counter',
    'Description of my counter',
    ['label1', 'label2'],
    registry=registry
)

# Increment the counter
MY_COUNTER.labels(label1='value1', label2='value2').inc()

Adding Custom Dashboards

You can add custom Grafana dashboards by adding JSON files to the monitoring/grafana/provisioning/dashboards/ directory.
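If you prefer to generate dashboard JSON rather than export it from the Grafana UI, a small script can produce it. The sketch below builds a minimal dashboard with one timeseries panel per query, laid out two per row on Grafana's 24-column grid. The dashboard title, panel queries, and the build_dashboard helper are all hypothetical, and a real dashboard usually carries more fields than shown here.

```python
import json


def build_dashboard(title, queries):
    """Build a minimal Grafana dashboard dict with one panel per query."""
    panels = []
    for index, (panel_title, expr) in enumerate(queries.items()):
        panels.append({
            "id": index + 1,
            "type": "timeseries",
            "title": panel_title,
            # Two panels per row: each is 12 of 24 columns wide, 8 units tall.
            "gridPos": {
                "h": 8,
                "w": 12,
                "x": 12 * (index % 2),
                "y": 8 * (index // 2),
            },
            "targets": [{"expr": expr, "refId": "A"}],
        })
    return {"title": title, "panels": panels}


# Hypothetical panels built from metrics documented above.
dashboard = build_dashboard(
    "My Custom Dashboard",
    {
        "Requests In Progress": "sum(apifrom_requests_in_progress)",
        "Request Rate": "sum(rate(apifrom_request_count[5m]))",
    },
)
dashboard_json = json.dumps(dashboard, indent=2)
print(dashboard_json)
```

Saving the printed JSON as a file in monitoring/grafana/provisioning/dashboards/ should make it available to Grafana's provisioning, assuming the existing dashboard provider points at that directory.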

Customizing Alerts

You can customize alerts by modifying the monitoring/prometheus/rules/alerts.yml file.

Best Practices

  • Monitor both application-level metrics (request rate, latency) and system-level metrics (CPU, memory)

  • Set up alerts for critical conditions that require immediate attention

  • Use dashboards to visualize trends and identify potential issues before they become critical

  • Regularly review and adjust alert thresholds based on your application’s performance characteristics

  • Implement proper logging alongside metrics for better debugging capabilities