SRE: SLA vs SLO vs SLI
SLA: Service Level Agreement. A contract with customers or other stakeholders that states the promised level of service and what happens if it is not met.
SLI: Service Level Indicator. A measured parameter of the service, such as latency or error rate, that should stay within a given range.
SLO: Service Level Objective. It states how often the SLI must stay within that range, for example 99.9% of the time.
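To make the 99.9% figure concrete, here is a minimal sketch of the arithmetic behind it: the share of time the SLI is allowed to be out of range, often called the error budget. The function name `error_budget` and the 30-day window are assumptions chosen for illustration, not something the definitions above prescribe.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Return how long the SLI may be out of range within the window.

    slo_target is a fraction, e.g. 0.999 for "99.9% of the time".
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    return window * (1.0 - slo_target)


# Hypothetical 30-day window; the window length is an assumption.
budget = error_budget(0.999, timedelta(days=30))
print(f"Allowed time out of range: {budget.total_seconds() / 60:.1f} minutes")
```

With a 99.9% target over 30 days the budget comes to roughly 43 minutes, which is why each extra "nine" is much harder to meet.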
Video by a Google engineer (in Russian): https://youtu.be/avdS6aIs4yI
Another look at that:
An SLI is a service level indicator: a carefully defined quantitative measure of some aspect of the level of service that is provided, e.g. request latency or error rate.
An SLO is a service level objective; a target value or range of values for a service level that is measured by an SLI. e.g. 95% of /homescreen calls will complete in less than 100 ms.
SLAs are service level agreements; an explicit or implicit contract with stakeholders that includes consequences of meeting (or missing) the SLOs they contain.
An SLO is effectively an SLA for the team: an internal target rather than an external contract. For example:
SLO: the API endpoint should answer requests in less than 50 ms 99% of the time (measured weekly).
SLI for this: endpoint response time, p99 over the last week.
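As a rough sketch of how this latency SLO could be checked, the code below computes the p99 of a week's response times and compares it with the 50 ms threshold. The nearest-rank percentile method, the function names, and the sample data are all assumptions for illustration; real monitoring systems often estimate percentiles from histograms instead.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_slo_met(latencies_ms, threshold_ms=50.0, pct=99.0):
    """True if the pct-th percentile latency is strictly below the threshold."""
    return percentile(latencies_ms, pct) < threshold_ms


# Hypothetical week of samples: 990 fast requests and 10 slow ones.
week = [20.0] * 990 + [200.0] * 10
print("p99 (ms):", percentile(week, 99.0))
print("SLO met:", latency_slo_met(week))
```

In this made-up sample exactly 1% of requests are slow, so the p99 still lands on a fast request and the SLO holds; one more slow request per hundred would break it.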
SLO: the API endpoint should have an error rate below 1% for 99% of the time (measured weekly).
SLI for this: endpoint error rate over the last 7 days.
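One possible reading of "error rate below 1% for 99% of the time" is to split the week into short windows, compute the error rate in each, and require that at least 99% of the windows stay below 1%. The sketch below follows that reading; the one-minute window size, the function names, and the sample counts are assumptions, not part of the SLO as stated.

```python
def error_rate(errors: int, total: int) -> float:
    """Fraction of failed requests; a window with no traffic counts as 0.0
    (an assumption: idle windows are treated as healthy)."""
    return errors / total if total else 0.0


def error_rate_slo_met(windows, max_error_rate=0.01, target=0.99):
    """windows is a sequence of (error_count, request_count) pairs,
    one pair per time window in the evaluation period."""
    windows = list(windows)
    if not windows:
        raise ValueError("no windows")
    good = sum(1 for errors, total in windows
               if error_rate(errors, total) < max_error_rate)
    return good / len(windows) >= target


# Hypothetical week of one-minute windows (7 * 24 * 60 = 10080):
# 10000 clean minutes and 80 minutes with a 5% error rate.
week = [(0, 100)] * 10000 + [(5, 100)] * 80
print("SLO met:", error_rate_slo_met(week))
```

A simpler alternative is a single error rate over all requests in the 7 days; which reading applies depends on how the team defines the SLI.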
[Picture from Atlassian]