The website you are viewing is deployed by, and runs on, a stack built entirely by Ethan Lieske. The stack is Kubernetes running on VMware VMs, which in turn run on a network of Dell switches and FortiGate firewalls. Services are deployed via Argo CD, while images are built with GitLab CI on a local GitLab installation. This stack is meant as an example of how proper CI/CD can and should be done with existing open source tools.
The great challenge facing tech organizations today is their clients' insatiable need for both features and reliability. In many situations reliability is sacrificed on the altar of feature requests. While this provides short-term gain, it leads to long-term issues. Beyond the obvious impact of unstable software, feature-only development leads to talent retention issues and, ironically, feature delivery delays. This is where an empowered SRE team can balance the needs of the product against the need for maintainable software and systems.
A core tenet of the SRE philosophy is that of platforms: deployments, telemetry, monitoring, logging, and similar concerns should all be provided as a platform. Developers should not waste cycles reinventing the wheel in these categories.
Error budgets are another core tenet. This is less about implementing a by-the-book error budget and more about how you prioritize work. Development work should be prioritized in part by the current stability and maintainability of the service. A service that is constantly underperforming should not receive features until it is stabilized. Codifying error budgets into company processes also reduces the constant struggle between tech and product organizations over the priority of work.
It is important that SRE has a place at the leadership table. Otherwise it quickly devolves into a legacy operations team that struggles to retain talent and provide value to the business.
Error budget: The maximum amount of time that a technical system can fail without contractual consequences. Typically measured in hours per quarter.
Service level objective (SLO): An agreement or statement on the minimal availability that a service will have over a given time frame. Typically measured as a percentage of availability over that time frame.
Service level indicator (SLI): The actual computed metrics that will be compared to the SLOs.
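As a rough illustration of how these three terms relate, the sketch below turns an availability target into an error budget in hours and compares measured downtime against it. Everything in it is an assumption made for the example rather than part of the stack described here: the `ErrorBudget` class and its method names are hypothetical, a quarter is taken to be 90 days, and "features are allowed while budget remains" is one simple way to express the prioritization rule.

```python
from dataclasses import dataclass

# Assumption for this sketch: a quarter is 90 days long.
HOURS_PER_QUARTER = 90 * 24


@dataclass
class ErrorBudget:
    """Hypothetical helper relating an SLO, an SLI, and an error budget."""

    slo_percent: float                      # availability target, e.g. 99.9
    window_hours: float = HOURS_PER_QUARTER

    @property
    def allowed_downtime_hours(self) -> float:
        # The error budget: the share of the window the SLO permits to fail.
        return self.window_hours * (1 - self.slo_percent / 100)

    def sli_percent(self, downtime_hours: float) -> float:
        # The SLI: availability actually measured over the window.
        return 100 * (1 - downtime_hours / self.window_hours)

    def remaining_hours(self, downtime_hours: float) -> float:
        # Budget left after subtracting the downtime already observed.
        return self.allowed_downtime_hours - downtime_hours

    def features_allowed(self, downtime_hours: float) -> bool:
        # Simplified policy: ship features only while budget remains.
        return self.remaining_hours(downtime_hours) > 0


budget = ErrorBudget(slo_percent=99.9)
print(f"Budget per quarter: {budget.allowed_downtime_hours:.2f} hours")
print(f"Features allowed after 1 hour down: {budget.features_allowed(1.0)}")
print(f"Features allowed after 3 hours down: {budget.features_allowed(3.0)}")
```

With a 99.9% target over a 90-day quarter, the budget works out to about 2.16 hours, so one hour of downtime leaves room for feature work while three hours does not. A real policy would likely be more graduated than a single yes/no check.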
Unified platforms, deployments, frameworks, and telemetry allow for faster and more efficient software development.
When every application is deployed, built, and monitored the same way, you can move development resources between projects with minimal disruption.
SRE principles allow developers to spend more time on innovation. This lowers the cost of developing patterns and services.
15+ years of building systems and leading teams to operational excellence.
All referenced systems below include enterprise deployment and management.
1749 S Columbian Way, Seattle, WA 98108
(586)-569-9086