---
title: SourceHut operational manual
---
This subset of the manual documents our approach to the operations and
maintenance of the hosted service, sr.ht. You may find this useful for running
your own hosted sr.ht service, or to evaluate our practices & policies to
consider whether they meet your requirements for availability or robustness. You
might also just find this stuff interesting, as SourceHut is one of the few
largeish services that are not hosted in The Cloud™.

Additional resources:

- [New sysadmin lecture](/ops/new-sysadmin.md)
- [Backups & redundancy](/ops/backups.md)
- [Emergency planning](/ops/emergency-planning.md)
- [High availability](/ops/availability.md)
- [Monitoring & alarms](/ops/monitoring.md)
- [Outage incident response](/ops/incident.md)
- [Network topology](/ops/topology.md)
- [Provisioning & allocation](/ops/provisioning.md)
- [PostgreSQL robustness planning](/ops/robust-psql.md)
- [SourceHut scalability plans](/ops/scale.md)
- [Security incident reports](/ops/security-incidents)
- [Outage incident reports](/ops/outages)

Next available port number: 5016/5116

# Publicly available operational resources
We try to make as much of our operations available to the public as possible.
## Status page
[status.sr.ht](https://status.sr.ht) is hosted on third-party infrastructure and
is used to communicate about upcoming planned outages, and to provide updates
during incident resolution. Planned outages are also posted to
[sr.ht-announce](https://lists.sr.ht/~sircmpwn/sr.ht-announce) in advance.
The status page is updated by a human being, who is probably busy fixing the
problem.
## Monitoring & alarms
Our Prometheus instance at [metrics.sr.ht](https://metrics.sr.ht) is available
to the public for querying our monitoring systems and viewing the state of
various alarms. Some alarms are also fed to the IRC channel and mailing list.
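As an illustration of querying it, here is a minimal sketch that assumes metrics.sr.ht exposes the standard Prometheus HTTP API (`/api/v1/query` for instant queries and `/api/v1/alerts` for alert state). The helper names (`query_url`, `parse_vector`, `firing_alerts`, `fetch`) are hypothetical, and the `up == 0` expression is only an example query for scrape targets that are currently down, not a query taken from our own runbooks.

```python
import json
import urllib.parse
import urllib.request

# Assumption: metrics.sr.ht serves the standard Prometheus HTTP API under /api/v1/.
BASE_URL = "https://metrics.sr.ht"


def query_url(expr: str, base: str = BASE_URL) -> str:
    """Build an instant-query URL for a PromQL expression."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_vector(payload: dict) -> list[tuple[dict, float]]:
    """Extract (labels, value) pairs from an instant-vector query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample's value is [unix_timestamp, "value encoded as a string"].
    return [(r["metric"], float(r["value"][1])) for r in data["result"]]


def firing_alerts(payload: dict) -> list[str]:
    """Return the alertname of every firing alert in an /api/v1/alerts response."""
    alerts = payload["data"]["alerts"]
    return [a["labels"].get("alertname", "?") for a in alerts if a["state"] == "firing"]


def fetch(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Example: list scrape targets that are currently down.
    for labels, value in parse_vector(fetch(query_url("up == 0"))):
        print(labels.get("instance"), value)
    # Example: names of alarms that are currently firing.
    print(firing_alerts(fetch(f"{BASE_URL}/api/v1/alerts")))
```

The parsing is kept separate from the network call so the same helpers work against any Prometheus server, including one you run yourself.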
## Mailing list
The [sr.ht-ops](https://lists.sr.ht/~sircmpwn/sr.ht-ops) mailing list is used
for automated reports from our services, including alarm notifications of
"important" or "urgent" severity, and automated reports on operational status of
backups and other systems.
## IRC channel
The `#sr.ht.ops` IRC channel on irc.libera.chat is used for triage and
coordination during outages, and has a real-time feed of alarms raised by our
monitoring system.