Tag: SRE

Official Blog Security SRE Sept. 17, 2018

Trust through transparency: incident response in Google Cloud - A white paper that explains how Google Cloud manages incidents.

Official Blog SRE Aug. 6, 2018

Repairing network hardware at scale with SRE principles - Google’s SRE principles to guide developers and operations teams toward better systems reliability.

Official Blog SRE July 23, 2018

SRE fundamentals: SLIs, SLAs and SLOs - Learn about SRE fundamentals: SLIs, SLAs and SLOs.

SRE July 2, 2018

Understanding error budget overspend - part one - CRE life lessons - Questions to consider when deciding whether to recalibrate your error budget because your applications' downtime exceeds what your service level objectives allow.

SRE July 2, 2018

Good housekeeping for error budgets - part two - CRE life lessons - Fixing the root causes of error budget overspend.

SRE July 2, 2018

Kubernetes podcast - #9 SRE, with Tina Zhang and Fred van den Driessche.

Official Blog SRE June 4, 2018

Troubleshooting tips: Help your cloud provider help you - Tips for communicating with your cloud provider's support team.

Official Blog SRE June 4, 2018

Troubleshooting tips: How to talk so your cloud provider will listen (and understand) - Practical tips on communicating with cloud providers, since the cloud presents a new way of working for IT teams shifting away from legacy systems.

Official Blog SRE May 14, 2018

Defining SLOs for services with dependencies - CRE life lessons - How to define and manage SLOs for services with dependencies.

DevOps Official Blog SRE May 14, 2018

SRE vs. DevOps: competing standards or close friends? - What exactly is SRE and how does it relate to DevOps?

SRE March 19, 2018

Risk and Error Budgets - How the SRE discipline reduces tension over velocity/stability between product teams and system operators by quantifying risk and employing error budgets.

Official Blog SRE Feb. 12, 2018

Applying the Escalation Policy — CRE life lessons - Explores some scenarios for applying the escalation policy.

SRE Jan. 22, 2018

An example escalation policy — CRE life lessons - This post presents a lightly edited SLO escalation policy and its associated rationales from a Google SRE team to illustrate the trade-offs that particular teams make to maintain a high development velocity.

SRE Jan. 8, 2018

Consequences of SLO violations — CRE life lessons - Explains the importance of creating a policy to handle Service Level Objective (SLO) violations, the role of Site Reliability Engineers (SREs) and developers in responding to them, and the structure of such a policy.

SRE Dec. 11, 2017

Getting the most out of shared postmortems — CRE life lessons - This post considers how to review a postmortem with your affected customers to get better actionable data and to help those customers improve their systems and practices.

SRE Oct. 30, 2017

Building good SLOs - CRE life lessons - Practical tips on how to formulate Service Level Objectives for Service Level Indicators.

SRE Aug. 14, 2017

CRE life lessons: The practicalities of dark launching - How to deal with some circumstances that can come up with dark launching.

SRE Aug. 7, 2017

CRE life lessons: What is a dark launch, and what does it do for me? - Dark launch sends a copy of real user-generated traffic to your new service, and discards the result from the new service before it's returned to the user.

SRE July 10, 2017

Making the most of an SRE service takeover - CRE life lessons - In Part 2 of this blog post we explained what an SRE team would want to learn about a service angling for SRE support, and what kind of improvements they want to see in the service before considering it for take-over. And in Part 1, we looked at why an SRE team would or wouldn’t choose to onboard a new application. Now, let’s look at what happens once the SREs agree to take on the pager.

SRE June 26, 2017

Why should your app get SRE support? - CRE life lessons - Practical tips on how to organize a Site Reliability Engineering team.

SRE May 29, 2017

Know thy enemy: how to prioritize and communicate risks - CRE life lessons - How to identify and mitigate risks in your system.

SRE April 3, 2017

How release canaries can save your bacon - CRE life lessons - A description of the canary (gradual) release process from a Site Reliability Engineering team.

SRE March 27, 2017

Reliable releases and rollbacks - CRE life lessons - Life lessons from SREs (Site Reliability Engineers) on what to do when a new release is deployed but something goes wrong.

SRE March 6, 2017

Incident management at Google — adventures in SRE-land - How engineers at Google handle incidents in their data centres.


Zdenko Hrček
Třebanická 183
Prague, Czech Republic
Phone: +420 777 283 075
Email: zdenko@gcpweekly.com