---
title: "Monitoring & alarms"
---

We monitor everything with Prometheus and configure alarms with Alertmanager.

# Public metrics

Our Prometheus instance is publicly available at [metrics.sr.ht](https://metrics.sr.ht).

## Areas for improvement

1. We should make dashboards. They would be pretty to look at and could be a
   useful tool for root cause analysis. Note that some users who run their own
   Grafana instances have pointed them at our public Prometheus data and made
   some simple dashboards; I would be open to community ownership of this.

# Pushgateway

A Pushgateway is running at push.metrics.sr.ht. It's firewalled to only accept
connections from [our subnet](/ops/topology.md).

# Aggregation gateway

[prom-aggregation-gateway](https://github.com/weaveworks/prom-aggregation-gateway)
is running at aggr.metrics.sr.ht. It's firewalled to only accept connections
from [our subnet](/ops/topology.md).

# Alertmanager

We use Alertmanager to forward [alerts](https://metrics.sr.ht/alerts) to
various sinks:

- **interesting** alerts are forwarded to the IRC channel, #sr.ht.ops
- **important** alerts are sent to the ops mailing list and the IRC channel
- **urgent** alerts page Drew's phone and are sent to the mailing list and the
  IRC channel

Some security-related alarms are sent directly to Drew and are not made public.

Our alerts are configured here: https://git.sr.ht/~sircmpwn/metrics.sr.ht

# Areas for improvement

1. It would be nice to have centralized logging. There is sensitive
   information in some of our logs, so this probably can't be made public.
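As an illustration of how a batch job inside our subnet might report to the Pushgateway at push.metrics.sr.ht, here is a minimal sketch that uses only the Python standard library. It assumes the gateway speaks plain HTTP on the Pushgateway's default port (9091). The job name, labels, and metric names in the usage line are hypothetical. The sketch follows the Pushgateway's HTTP API. The URL path is `/metrics/job/<job>` followed by optional `/<label>/<value>` pairs that form the grouping key, and the body is in the Prometheus text exposition format. A `PUT` replaces every metric in that group.

```python
import urllib.request

# Assumption: plain HTTP on the Pushgateway's default port; adjust to the real endpoint.
PUSHGATEWAY = "http://push.metrics.sr.ht:9091"


def grouping_url(base, job, labels=None):
    """Build the /metrics/job/<job>{/<label>/<value>} URL for a grouping key."""
    parts = [base.rstrip("/"), "metrics", "job", job]
    for name, value in (labels or {}).items():
        # This sketch does not implement the gateway's base64 encoding for
        # values that contain slashes, so it rejects them instead.
        if "/" in value:
            raise ValueError(f"label value {value!r} must not contain '/'")
        parts += [name, value]
    return "/".join(parts)


def exposition(metrics):
    """Render {name: value} pairs as gauges in the Prometheus text format."""
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    # The text format requires the final line to end with a newline.
    return "\n".join(lines) + "\n"


def push(metrics, job, labels=None, base=PUSHGATEWAY):
    """PUT the metrics, replacing everything previously pushed under this group."""
    req = urllib.request.Request(
        grouping_url(base, job, labels),
        data=exposition(metrics).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

A hypothetical backup job might call `push({"backup_last_success_timestamp_seconds": 1700000000}, job="backup", labels={"instance": "host1"})`. Using `POST` instead of `PUT` would replace only metrics with the same names in the group. The official `prometheus_client` library also offers `push_to_gateway` if a job already depends on it.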