Director, Site Reliability Engineer

Department: Engineering
Location: New York, New York, United States
Updated on: July 22, 2021

MediaMath helps the world's top brands deliver personalized digital advertising across all connected touchpoints. Over 9,500 marketers in 42 countries use our demand-side platform every day to launch, analyze, and optimize their digital advertising campaigns across display, native, mobile, video, audio, digital out of home, and advanced TV formats.

MediaMath initiated an industry-wide effort to create a 100% accountable, addressable, and aligned supply chain through the SOURCE ecosystem. SOURCE by MediaMath is a technical and commercial framework for agencies, brands, tech companies, and content owners, designed to provide long-term, sustainable solutions for a clean digital media supply chain with brand-safe, viewable inventory. MediaMath has offices in 15 cities worldwide and is headquartered in New York City.

Key Responsibilities

We are seeking a Director for MediaMath’s Site Reliability Engineering team. The purpose of this role is to define, oversee, and execute against MediaMath’s Site Reliability Engineering charter and supporting team goals. The primary focus areas include people management and providing development and product management teams with creative solutions for increasing the reliability of our core and client-facing services.

Your work will require supporting your team’s intellectual curiosity and critical attention to detail. Our SRE environment is always open to change and embraces new technologies while maintaining a strong alliance with other engineers. Ideally, we are looking for a candidate who has spent a significant amount of time as a Site Reliability Engineer and has a record of success in an engineering people leadership role.

You will:

  • Manage the day-to-day operation of MediaMath’s Site Reliability Engineering practice, overseeing a team of roughly 12 engineers
  • Participate in new service definitions and SRE resource assignments as needed
  • Provide feature and service design consultation to Product Management and development departments
  • Verify that critical services have proper SLO monitoring and alerting, and ensure that incident response processes are followed and supported
  • Set up and lead one-on-one meetings with Site Reliability Engineers
  • Align and monitor individual and team goals for MediaMath’s Site Reliability Engineering team
  • Develop and oversee career paths for MediaMath’s Site Reliability Engineers
  • Help with incident and problem post-mortems as needed
  • Assist with analysis, ongoing capacity planning, and budget forecasting for critical service lines

You have:

  • 7+ years of experience as a Site Reliability Engineer prior to stepping into a Site Reliability Engineer management position
  • 3+ years of experience managing a Site Reliability Engineering team within a high-volume, distributed systems environment
  • Experience optimizing infrastructure systems and processes
  • Experience performing ongoing resource planning, capacity planning, and budget forecasting as needed
  • Superior understanding of, and experience implementing, SLI and SLO monitoring and alerting for service improvement and sustainability initiatives
  • Experience analyzing service demands and developing on-call rotations
  • Experience collaborating with support organizations and product management to interpret service trends and influence product roadmaps for sustainability improvements
  • Deep understanding of platform and infrastructure interdependencies, sufficient to provide operational and scaling guidance on system internals (filesystems, syscalls, cgroups, etc.)
  • Networking familiarity (routing, SDNs, network topologies)
  • Familiarity with one or more of the following is desirable: Perl, Ruby, Go, C, C++, Scala, Java
  • Familiarity with one or more of the following applications is desirable: Foreman, Apache, HAProxy, Prometheus, Kubernetes, Graphite, Kafka, Redis, Cassandra
  • Familiarity with public cloud infrastructure, such as AWS

Why We Work at MediaMath

We are restless innovators, smart, passionate and kind. At the heart of our culture are six values that provide a framework for how we approach our work and the world: Teams Win, Scale + Innovation, Obsess Over Learning & Growth, Align then Execute, Do Good Better and Embrace the Journey. These values inform how we energize one another and engage with our clients. They get us amped to come to work.

Founded in 2007 as a pioneer in "programmatic" advertising, MediaMath is recognized as a Leader in the Gartner 2020 Magic Quadrant for Ad Tech and has won Best Account Support by a Technology Company for two years in a row in the AdExchanger Awards.

MediaMath is committed to equal employment opportunity. It is a fundamental principle at MediaMath not to discriminate against employees or applicants for employment on any legally recognized basis including, but not limited to: age, race, creed, color, religion, national origin, sexual orientation, sex, disability, predisposing genetic characteristics, genetic information, military or veteran status, marital status, gender identity/transgender status, pregnancy, childbirth or related medical condition, and any other protected characteristic as established by law.