Site Reliability Engineer II / III

Department: Engineering
Location: US - Durham
Updated on: February 15, 2021


MediaMath helps the world's top brands deliver personalized digital advertising across all connected touchpoints. Over 9,500 marketers in 42 countries use our demand-side platform every day to launch, analyze, and optimize their digital advertising campaigns across display, native, mobile, video, audio, digital out of home, and advanced TV formats.

MediaMath initiated an industry-wide effort to create a 100% accountable, addressable, and aligned supply chain through its SOURCE ecosystem. SOURCE by MediaMath is a technical and commercial framework for agencies, brands, tech companies, and content owners, designed to provide long-term, sustainable solutions for a clean digital media supply chain with brand-safe, viewable inventory. MediaMath has offices in 15 cities worldwide and is headquartered in New York City.

We receive up to 10 million advertising opportunities per second. We run each one through several internal and partner enrichments to extract targetable properties, match those properties against hundreds of thousands of advertising strategies, and choose the best ad, all in less than a tenth of a second.

Key Responsibilities
We are seeking a Site Reliability Engineer II / III who is well versed in large-scale distributed systems. You will own the reliability and performance of those systems, ensuring that our customers benefit from highly available and extremely effective products. You will do this by creating a bridge between development and operations, applying your software engineering mindset to areas including system administration, observability, reliability, and performance. You will use your deep experience to simplify processes through automation while developing production software that continuously improves reliability and performance.

We work with many languages and technologies critical to the success of our platform, including Golang, Scala, Clojure, C++, Chef, AWS, ScyllaDB, Kafka, Prometheus, Kubernetes, and many more. We expect you to have experience with most of these and a passion for becoming proficient with many more.

You will:

  • Use data from our observability stack and incident trends to prioritize reliability improvements
  • Provide architectural guidance on our critical customer-facing services
  • Contribute to sprint development, executing on availability and performance topics within our product roadmap
  • Mentor and consult with product, development, and operations to drive reliability best practices
  • Work with Product Management and Engineering teams to prioritize and address reliability fixes
  • Define SLIs, SLOs, and error budgets
  • Improve observability across all services
  • Participate in on-call rotations shared with development teams
  • Automate deployment capabilities and implement auto-healing practices
  • Collaborate with development teams on best practices, infrastructure setup, and planning activities with a focus on stability and performance

You have:

  • 3+ years of professional software development experience
  • Significant experience with standard Site Reliability Engineering practices
  • Firm understanding of SLOs, SLIs, and error budgets
  • Demonstrated experience developing non-trivial applications in languages such as Golang, Scala, Clojure, or C++
  • Broad experience building distributed, high-throughput systems
  • Proven ability to understand commercial context when working with product managers in a SaaS environment
  • Previous experience in the AdTech industry (a plus)

You are:

  • Curious and capable of learning new codebases and systems quickly
  • Passionate about reliability, monitoring, automation, and continuous improvement
  • Willing to fail, fix, and retry
  • Someone who likes to solve problems with code
  • Someone with a desire to constantly learn and grow
  • Someone who seeks out cultures that embrace diversity and inclusion

Why We Work at MediaMath
We are restless innovators: smart, passionate, and kind. At the heart of our culture are six values that provide a framework for how we approach our work and the world: Teams Win, Scale + Innovation, Obsess Over Learning & Growth, Align then Execute, Do Good Better, and Embrace the Journey. These values inform how we energize one another and engage with our clients. They get us amped to come to work.

Founded in 2007 as a pioneer in "programmatic" advertising, MediaMath is recognized as a Leader in the Gartner 2020 Magic Quadrant for Ad Tech and has won Best Account Support by a Technology Company for two years in a row in the AdExchanger Awards.

MediaMath is committed to equal employment opportunity. It is a fundamental principle at MediaMath not to discriminate against employees or applicants for employment on any legally recognized basis including, but not limited to: age, race, creed, color, religion, national origin, sexual orientation, sex, disability, predisposing genetic characteristics, genetic information, military or veteran status, marital status, gender identity/transgender status, pregnancy, childbirth or related medical condition, and other protected characteristics as established by law.