Mozilla

Job Overview:

Mozilla is a category of one: a global technology not-for-profit super-powered by a worldwide community of volunteers, with a mission to keep the internet a healthy public resource for all. By building great products, creating innovative technologies, and engaging people to take action, we create an outsized impact in the world. We always place people ahead of profit.

Mozilla wants you to help fight for an Internet that’s open and accessible to everyone. We fulfill that mission as both a corporation and a non-profit organization, blending technology with advocacy, policy and education.

Site Reliability Engineering treats operations as a software problem. In SRE, we flip between the fine-grained detail of application debugging and the big picture of capacity across a range of systems with a user population measured in hundreds of millions. We are responsible for our products in production. We drive reliability and performance by mastering the full depth of the stack. We see no reason for system downtime. You will have the opportunity to take on complex problems of scale while using your expertise in coding, algorithms, complexity analysis and large-scale system design. And your career will take big steps forward working with some of the best developers in the industry.

Your independence, curiosity, and willingness to try new things will be an asset, not a liability.

Sound exciting? Send us a brief cover letter and resume highlighting how you fit the following:

Job Responsibilities:

Design, develop, document and deliver software to improve the availability, scalability, latency and efficiency of Mozilla’s services and infrastructure.
Solve problems relating to critical services and build automation to prevent their recurrence, with the goal of automating the response to all non-exceptional service conditions.
Engage in service capacity analysis, demand forecasting, software performance analysis and system tuning.
Provide occasional after-hours and weekend support as part of an on-call rotation for critical Mozilla services.

Job Requirements:

BS degree in Computer Science or a related technical field, or 5 years of prior relevant experience.
Experience with Python, preferably in a web and/or infrastructure automation setting
Experience in designing, analyzing and running large-scale distributed systems
Experience hosting and solving problems with public-facing services securely in AWS or GCP
Experience designing and delivering deployment automation
Experience automating infrastructure with tools such as Terraform, Ansible, Chef, or Puppet
1+ years working remotely with distributed teams
Experience with Kubernetes or an eagerness to learn Kubernetes as a platform
Familiarity with Linux container engines like Docker
Systematic problem solving approach, coupled with a strong sense of ownership and drive. You’re willing to dive into a problem from any level including application code, database, networking, and content caching to identify performance or availability issues.