Diffstat (limited to 'ops/index.md')
-rw-r--r-- | ops/index.md | 48 |
1 files changed, 48 insertions, 0 deletions
diff --git a/ops/index.md b/ops/index.md
new file mode 100644
index 0000000..e0814b0
--- /dev/null
+++ b/ops/index.md
@@ -0,0 +1,48 @@
+---
+title: SourceHut operational manual
+---
+
+This subset of the manual documents our approach to the operations and
+maintenance of the hosted service, sr.ht. You may find this useful for running
+your own hosted sr.ht service, or to evaluate our practices & policies to
+consider if they meet your requirements for availability or robustness. You also
+might just find this stuff interesting, as SourceHut is one of the few largeish
+services which is not hosted in The Cloud™.
+
+- [Backups & redundancy](/ops/backups.md)
+- [Emergency planning](/ops/emergency-planning.md)
+- [High availability](/ops/availability.md)
+- [Monitoring & alarms](/ops/monitoring.md)
+- [Network topology](/ops/topology.md)
+- [Provisioning & allocation](/ops/provisioning.md)
+
+# Operational Resources
+
+## Status page
+
+[status.sr.ht](https://status.sr.ht) is hosted on third-party infrastructure and
+is used to communicate about upcoming planned outages, and to provide updates
+during incident resolution. Planned outages are also posted to
+[sr.ht-announce](https://lists.sr.ht/~sircmpwn/sr.ht-announce) in advance.
+
+The status page is updated by a human being, who is probably busy fixing the
+problem. You may want to check the next resource as well:
+
+## Monitoring & alarms
+
+Our Prometheus instance at [metrics.sr.ht](https://metrics.sr.ht) is available
+to the public for querying our monitoring systems and viewing the state of
+various alarms.
+
+## Mailing list
+
+The [sr.ht-ops](https://lists.sr.ht/~sircmpwn/sr.ht-ops) mailing list is used
+for automated reports from our services, including alarm notifications of
+"important" or "urgent" severity, and automated reports on operational status of
+backups and other systems.
+
+## IRC channel
+
+The `#sr.ht.ops` IRC channel on irc.freenode.net is used for triage and
+coordination during outages, and has a real-time feed of alarms raised by our
+monitoring system.
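
The "Monitoring & alarms" section added in this patch says the Prometheus instance is open to the public for querying. As a rough sketch of what that could look like, the Python below builds an instant query against the standard Prometheus HTTP API (`/api/v1/query`) and pulls alert names out of the response. It assumes that metrics.sr.ht exposes that API unauthenticated and that alarms appear in Prometheus's built-in `ALERTS` series; the patch confirms neither, and the function names are purely illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the manual text. That the standard Prometheus HTTP API
# is reachable here without authentication is an assumption.
BASE_URL = "https://metrics.sr.ht"


def build_query_url(expr, base_url=BASE_URL):
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def firing_alerts(payload):
    """Extract sorted alert names from a decoded instant-query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return sorted(
        sample["metric"].get("alertname", "<unnamed>")
        for sample in payload["data"]["result"]
    )


def fetch_firing_alerts(base_url=BASE_URL, timeout=10):
    """Query the live instance for firing alerts (requires network access)."""
    url = build_query_url('ALERTS{alertstate="firing"}', base_url)
    with urlopen(url, timeout=timeout) as resp:
        return firing_alerts(json.load(resp))
```

Calling `fetch_firing_alerts()` would return a list of alert names if the assumptions hold. The URL building and response parsing are kept in separate functions so they can be exercised without touching the network.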