Site Reliability Engineer
Pulumi is reinventing how people build modern cloud applications and services, through a unique platform that combines deep systems and infrastructure innovation with elegant programming models and developer tools.
Our team is a diverse and talented group of individuals, with backgrounds in distributed cloud systems, programming languages, developer tools, and operating systems, drawn from companies in all corners of the software industry. Our culture is one of technical excellence, passion for teamwork, and customer obsession.
Site Reliability Engineering (SRE) is a discipline that applies engineering process and rigor to building and running large-scale, distributed, and fault-tolerant cloud systems. SRE is a mindset and a set of engineering approaches to running better production systems, leveraging creativity and invention paired with discipline and focus. Google pioneered the SRE role, and we here at Pulumi fully subscribe to the philosophical improvements the SRE approach brings to classical DevOps and Ops.
As an SRE, you will help Pulumi build, manage, and optimize our most complex infrastructure and systems. You will be responsible for the big picture of how our systems relate to each other and you will work closely with the entire engineering organization to coach and mentor the team in all aspects of building reliable distributed systems. These systems span a wide range from low-level networking infrastructure, to storage and databases, to containers and orchestrators, all the way up to serverless and event-driven programs. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives, and embraces the intellectual curiosity and learning culture that is inherently part of the SRE role.
Because Pulumi's entire mission is to help our customers build robust cloud systems themselves, you will also have a unique role in helping Pulumi to drive feedback and continuous improvement back into the core products, platforms, and tools that we are offering to customers. In fact, many of the tools and techniques you develop as an SRE will become part of our offerings.
- Engage in and improve the whole lifecycle of services -- from inception and design, through deployment, operation, and refinement
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
- Practice sustainable incident response and blameless postmortems
- Experience with algorithms, data structures, and software architecture
- Experience with one or more of the following systems: Docker, Kubernetes, Chef, Puppet
- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
- Ability to debug and optimize code and automate routine tasks
- BS degree in Computer Science or equivalent practical experience
Interested? Email us.
At Pulumi, we don't just accept difference, we celebrate it, support it, and thrive on it for the benefit of our employees, our products, and our customers. Pulumi is proud to be an equal opportunity workplace and is committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or veteran status. If you have a disability or special need that requires accommodation, please let us know.