author Drew DeVault <sir@cmpwn.com> 2020-03-05 17:06:47 -0500
committer Drew DeVault <sir@cmpwn.com> 2020-03-05 17:06:47 -0500
commit 747994f68fa7eb21a68e2aa7e04085369b9c92ab
tree ef93615cd60cd126714c150833b0fafbac933af7 /ops/index.md
parent 8a8161fd6aec91dae49b57bb3337c0f5dafb1590
download sr.ht-docs-747994f68fa7eb21a68e2aa7e04085369b9c92ab.tar.gz
Add operational documentation
Diffstat (limited to 'ops/index.md')
-rw-r--r-- ops/index.md 48
1 files changed, 48 insertions, 0 deletions
diff --git a/ops/index.md b/ops/index.md
new file mode 100644
index 0000000..e0814b0
--- /dev/null
+++ b/ops/index.md
@@ -0,0 +1,48 @@
+---
+title: SourceHut operational manual
+---
+
+This subset of the manual documents our approach to the operations and
+maintenance of the hosted service, sr.ht. You may find this useful for running
+your own hosted sr.ht service, or for evaluating whether our practices &
+policies meet your requirements for availability or robustness. You might also
+just find this stuff interesting, as SourceHut is one of the few largeish
+services which are not hosted in The Cloud™.
+
+- [Backups & redundancy](/ops/backups.md)
+- [Emergency planning](/ops/emergency-planning.md)
+- [High availability](/ops/availability.md)
+- [Monitoring & alarms](/ops/monitoring.md)
+- [Network topology](/ops/topology.md)
+- [Provisioning & allocation](/ops/provisioning.md)
+
+# Operational Resources
+
+## Status page
+
+[status.sr.ht](https://status.sr.ht) is hosted on third-party infrastructure and
+is used to communicate about upcoming planned outages, and to provide updates
+during incident resolution. Planned outages are also posted to
+[sr.ht-announce](https://lists.sr.ht/~sircmpwn/sr.ht-announce) in advance.
+
+The status page is updated by a human being, who is probably busy fixing the
+problem. You may want to check the next resource as well:
+
+## Monitoring & alarms
+
+Our Prometheus instance at [metrics.sr.ht](https://metrics.sr.ht) is available
+to the public for querying our monitoring systems and viewing the state of
+various alarms.
+
+## Mailing list
+
+The [sr.ht-ops](https://lists.sr.ht/~sircmpwn/sr.ht-ops) mailing list is used
+for automated reports from our services, including alarm notifications of
+"important" or "urgent" severity, and automated reports on operational status of
+backups and other systems.
+
+## IRC channel
+
+The `#sr.ht.ops` IRC channel on irc.freenode.net is used for triage and
+coordination during outages, and has a real-time feed of alarms raised by our
+monitoring system.
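
The "Monitoring & alarms" section added in this diff says the Prometheus instance at metrics.sr.ht is open to the public for querying. As a rough sketch of what that could look like from a script, the Python below builds a request for Prometheus's standard HTTP API (`/api/v1/query` and `/api/v1/alerts`) and pulls the names of firing alerts out of a response. It assumes that metrics.sr.ht exposes the stock API paths without authentication, which the documentation does not state. The function names and the `up == 0` expression are illustrative only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Host taken from the documentation; that it serves the stock Prometheus
# HTTP API at these paths is an assumption.
BASE_URL = "https://metrics.sr.ht"


def instant_query_url(expr, base_url=BASE_URL):
    """Build the URL for a Prometheus instant query (/api/v1/query)."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def firing_alerts(payload):
    """Return sorted names of firing alerts from an /api/v1/alerts body."""
    doc = json.loads(payload)
    if doc.get("status") != "success":
        raise ValueError(f"unexpected status: {doc.get('status')!r}")
    return sorted(
        alert["labels"]["alertname"]
        for alert in doc["data"]["alerts"]
        if alert.get("state") == "firing"
    )


def fetch(url, timeout=10):
    """Fetch a URL and return the body as text (performs a network call)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `firing_alerts(fetch(BASE_URL + "/api/v1/alerts"))` would list the alarms currently firing, provided the endpoint is reachable as assumed, and `fetch(instant_query_url("up == 0"))` would return the raw JSON for targets that are down.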