Site Reliability Engineer
As a Site Reliability Engineer, you will work to improve the reliability and performance of premier digital properties. You will be deeply hands-on with infrastructure, systems, automation, monitoring and system telemetry, and operational processes. You will understand the challenges of rapidly creating, scaling and managing distributed applications, and will collaborate with talented engineers across multiple disciplines to address those challenges.
· Troubleshoot issues across the entire stack (hardware, software, application and network) in both physical and cloud-based environments
· Drive standardization efforts across multiple disciplines and services
· Identify and drive opportunities to improve automation for the company
· Manage timely resolution of all critical and/or complex problems in line with SLA requirements
· Participate in a 24×7 on-call rotation
· Ability to communicate effectively with all levels of management and all stakeholders
· Develop, configure and optimize service and application monitoring and telemetry
· Proficiency with TCP/IP, HTTP and web application security, plus experience supporting multi-tier web application architectures
· Ability to actively participate in infrastructure design and implementation
· Solid knowledge of shell scripting and at least one scripting language (Python strongly preferred)
· Must be adaptable and able to focus on the simplest, most efficient & reliable solutions
· Track record of practical problem solving, with excellent written communication, interpersonal and documentation skills
· Practical knowledge of various aspects of service design, including messaging protocols & behavior, caching strategies and software design practices
· Must work well with, and be able to influence, a wide range of personalities at all levels