Job Overview
- Job Title: Senior Site Reliability Engineer, Wikimedia Cloud Services
- Hiring Organization: Wikimedia Foundation
- Company Website: https://www.wikimedia.org/
- Remote Locations: Worldwide
- Job Type: Remote, Full-Time
The Wikimedia Foundation is looking for a Senior Site Reliability Engineer to join our team, reporting to the Engineering Manager, Cloud Services. Cloud Services curates environments that host tools and services used across Wikimedia projects. A significant portion of edit traffic on Wikipedia, for example, comes from community-developed tools we host!
Our team maintains Infrastructure as a Service, Platform as a Service, and Data as a Service products. The team works in partnership with the larger Wikimedia volunteer community to manage these environments (our Puppet repo is public, and yes, you can contribute to it!). Candidates should be comfortable communicating in public and asynchronous ways with volunteers and developers from around the world.
You’ll work remotely with a full-time distributed team whose members are spread between Europe and North America, and your working hours will need to overlap with the team’s (UTC-5 to UTC+1). Some examples of the type of work you’ll be doing include:
- Expanding the capabilities of our Toolforge platform
- Expanding and refining our storage offerings, backed by Ceph and NFS
- Scaling our team via automation
- Providing a curated Jupyter notebook environment for data analysis and queries of Wikimedia data
- Upgrading and customizing OpenStack, and adding new services to it, such as Terraform support and Database as a Service
- Developing new web services for our technical community, like Quarry and PAWS
And the backlog has even more details!
Job Responsibilities
- Helping to create a repeatable OpenStack cloud deployment
- Implementing a network topology using Open vSwitch, providing per-tenant networking, load balancing, and IPv6
- Performing day-to-day operational tasks on Wikimedia’s Cloud Services infrastructure (deployment, maintenance, configuration, troubleshooting), and developing and supporting automation tools and processes for these tasks
- Participating in the on-call rotation and providing support in a 24×7 environment
Job Requirements
- Comfortable working and thriving within a Linux ecosystem
- An understanding of networking in the physical domain of switches and servers
- Software development skills in at least one of the following languages: Python, Go, JavaScript, or Ruby
- B.S. or M.S. in Computer Science or a related field, or equivalent related work experience
Qualities that are important to us:
- Share our values, appreciate our code of conduct, support our team norms, and work in accordance with all three
- Strong English language skills and the ability to work both independently and as an effective part of a globally distributed team
- Support of our users (volunteer and staff developers) using our service offerings
- Passionate about the value of learning and growing together
Additionally, we’d love it if you have:
- Used configuration management tools such as Puppet, Ansible, Chef, or SaltStack
- Used Kubernetes, Docker Swarm, Mesos, or similar container orchestration platforms
- Operated an elastic computing environment such as OpenStack or CloudStack
- Operated a multi-tenant-capable software-defined network (SDN)
- Experience in serverless computing environments
- Linux systems troubleshooting and debugging skills
- Interest in open source software projects and communities
How To Apply
Click “Apply” below to fill in the application form!
More Information
- Remote Job Location: Anywhere
- Salary Offer: to be discussed
- Experience Level: Senior Level
- Education Level: Bachelor's Degree, Master's Degree
- Working Hours: to be arranged (full-time)
- Job Application: Via Custom Application Page