---
title: "Monitoring & alarms"
---
We monitor everything with Prometheus and configure alarms with Alertmanager.
# Public metrics
Our Prometheus instance is publicly available at
[metrics.sr.ht](https://metrics.sr.ht).
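Because the instance is public, anyone can query it through the standard Prometheus HTTP API. The sketch below builds an instant query against `/api/v1/query` and flattens a vector result into a dictionary. It assumes the API is served at the root of metrics.sr.ht; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Prometheus HTTP API is served at the root of this host.
BASE_URL = "https://metrics.sr.ht"


def instant_query_url(expr: str, base_url: str = BASE_URL) -> str:
    """Build a URL for the Prometheus instant-query endpoint."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_vector(payload: dict) -> dict:
    """Map each series' labels (as a sorted tuple) to its float value.

    Expects the JSON shape Prometheus returns for an instant query whose
    result type is "vector"; each value is a [timestamp, "string"] pair.
    """
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unexpected result type {data['resultType']!r}")
    return {
        tuple(sorted(sample["metric"].items())): float(sample["value"][1])
        for sample in data["result"]
    }


def instant_query(expr: str) -> dict:
    """Run an instant query against the live instance (needs network access)."""
    with urllib.request.urlopen(instant_query_url(expr), timeout=10) as resp:
        return parse_vector(json.load(resp))
```

For example, `instant_query("up")` would return the scrape health of every target Prometheus knows about, keyed by label set.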
## Areas for improvement
1. We should make dashboards. They would be pleasant to look at and could be a
   useful tool for root cause analysis. Note that some users who run their own
   Grafana instances have pointed them at our public Prometheus data and made
   some simple dashboards; I would be open to having community ownership over
   this.
# Pushgateway
A Pushgateway is running at push.metrics.sr.ht. It's firewalled to only accept
connections from [our subnet](/ops/topology.md).
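From a host inside the subnet, a batch job can report a result with a single HTTP request. A minimal sketch, assuming the gateway answers plain HTTP on the default port (the scheme and port are assumptions); it uses the Pushgateway's `PUT /metrics/job/<job>` endpoint with a body in the Prometheus text exposition format. The job, label, and metric names are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: plain HTTP on port 80; the real scheme/port may differ.
PUSHGATEWAY = "http://push.metrics.sr.ht"


def exposition(name: str, value: float, help_text: str, kind: str = "gauge") -> str:
    """Render one unlabeled sample in the text exposition format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} {kind}\n"
        f"{name} {value}\n"  # the format requires a trailing newline
    )


def group_url(job: str, base: str = PUSHGATEWAY, **grouping: str) -> str:
    """Build /metrics/job/<job>{/<label>/<value>} for one metrics group.

    Assumes plain label values; values containing "/" need the
    Pushgateway's base64 encoding instead.
    """
    path = "/metrics/job/" + urllib.parse.quote(job, safe="")
    for label, value in grouping.items():
        path += f"/{label}/" + urllib.parse.quote(value, safe="")
    return base + path


def push(job: str, body: str, **grouping: str) -> int:
    """PUT replaces every metric previously pushed to this group."""
    req = urllib.request.Request(
        group_url(job, **grouping),
        data=body.encode(),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

A nightly job might call `push("backup", exposition("backup_last_success_timestamp_seconds", 1700000000, "Unix time of the last successful backup."), instance="db1")`. Requests from outside the subnet would be dropped by the firewall.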
# Aggregation gateway
[prom-aggregation-gateway](https://github.com/weaveworks/prom-aggregation-gateway)
is running at aggr.metrics.sr.ht. It's firewalled to only accept connections
from [our subnet](/ops/topology.md).
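The point of running an aggregating gateway next to the Pushgateway is that pushes of the same series are combined rather than overwritten, which suits counters incremented by many short-lived processes. The sketch below only models that idea in memory, assuming counter values with the same name and labels are summed; it is not the gateway's actual code or API, and the names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical key shape: metric name plus sorted (label, value) pairs.
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def series(name: str, **labels: str) -> SeriesKey:
    """Identify one time series by its name and label set."""
    return (name, tuple(sorted(labels.items())))


class CounterAggregator:
    """In-memory model of summing counter pushes from many clients.

    A later push of the same series to the Pushgateway replaces the old
    value; here it is added to the running total instead.
    """

    def __init__(self) -> None:
        self.totals: Dict[SeriesKey, float] = defaultdict(float)

    def push(self, samples: Dict[SeriesKey, float]) -> None:
        for key, value in samples.items():
            self.totals[key] += value

    def value(self, name: str, **labels: str) -> float:
        # .get() avoids inserting a zero entry for unseen series.
        return self.totals.get(series(name, **labels), 0.0)
```

Two processes that each push `jobs_total{queue="builds"}` with values 3 and 4 leave a total of 7, where the Pushgateway would keep only the last value.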
# Alertmanager
We use Alertmanager to forward [alerts](https://metrics.sr.ht/alerts) to various
sinks:
- **interesting** alerts are forwarded to the IRC channel, #sr.ht.ops
- **important** alerts are sent to the ops mailing list and the IRC channel
- **urgent** alerts page Drew's phone and are sent to the mailing list and the
  IRC channel
Some security-related alarms are sent directly to Drew and are not made public.
Our alerts are configured here:
https://git.sr.ht/~sircmpwn/metrics.sr.ht
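The severity tiers above amount to a lookup from a label to a list of sinks. A sketch of that routing, assuming alerts carry a `severity` label with the three values listed; the sink names are hypothetical, and the authoritative routing lives in the repository linked above.

```python
# Hypothetical sink names mirroring the tiers described above.
ROUTES = {
    "interesting": ["irc"],
    "important": ["mailing-list", "irc"],
    "urgent": ["pager", "mailing-list", "irc"],
}


def sinks_for(alert: dict) -> list:
    """Return the sinks an alert reaches, based on its "severity" label.

    In this sketch an unknown severity reaches no sink; Alertmanager itself
    would fall back to the root route's receiver.
    """
    severity = alert.get("labels", {}).get("severity")
    return list(ROUTES.get(severity, []))
```

Each tier is a superset of the one below it, so escalating an alert never removes a sink it already reached.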
# Areas for improvement
1. It would be nice to have centralized logging. Some of our logs contain
   sensitive information, so this probably can't be made public.