This job can be performed remotely, anywhere in the US.
MediaMath’s Platform Operations team handles all the challenges of a real-time advertising stack. We run a hybrid environment, with multiple globally placed on-prem datacenters and the AWS cloud, supporting a broad range of services - from low latency bidding processes handling millions of transactions per second, through to big data storage and analytics, and client-facing UI and reporting solutions. Each has its own unique operational challenges, and our team is a key partner in ensuring these workloads are managed in stable, scalable ways.
As part of the Platform Operations team, the Systems Engineer (data management) will be responsible for a range of globally distributed production pipelines and datastores, handling our most business-critical datasets. This includes a network of Kafka clusters and multiple large NoSQL database clusters, as well as the support of other data solutions in partnership with existing SMEs. This engineer will be hands-on at all levels, from day-to-day care and feeding, to fortifying and modernizing existing offerings, building solutions for management and monitoring, and helping guide overall organizational strategy.
ESSENTIAL DUTIES AND RESPONSIBILITIES
- Build, maintain and administer production Kafka clusters and related services (AWS and on-prem)
- Monitor and regularly assess capacity needs for the global Kafka footprint, with consideration to upcoming roadmap items.
- Monitor the health of Kafka clusters and associated services, and implement appropriate alerting.
- Migrate existing tooling for Kafka management into company-standard toolsets (Chef, Ansible, CircleCI)
- Support application teams with Kafka development efforts.
- Back-up support of production Hadoop clusters, as requested by SME.
- Build, maintain and administer business-critical database clusters (Scylla, Cassandra, Aerospike)
- Work with development teams to ensure cluster architectures support business needs, including replication strategy, hardware/site redundancy, failover testing, etc.
- Monitor health and regularly assess capacity needs for production databases
- Monitor database cluster health and implement appropriate alerting
- Support hardware strategy for database services, including specification, testing, maintenance and tuning.
- Consult on the design of new database offerings, and provide recommendations on strategy.
- Back-up support of production PostgreSQL clusters, as requested by SME
- Conduct training sessions to share knowledge with peers and development groups
- Act as in-team SME for data pipelines & storage – providing guidance and oversight to others in-team, and across the development community.
- Communicate current status of all projects, problems, and issues to the department management team
- Support audit and compliance efforts, and initiate corrective action when appropriate for remediation
- Participate in on-call rota as part of Platform Operations team.
- Proficiency with Linux system administration (Debian, Ubuntu, CentOS)
- Proficiency with basic AWS administration (IAM, EC2, Networking, cost analysis)
- Proficiency with scripting and basic coding (Python, Ruby, Go)
- Understanding of networking fundamentals, including application layer protocols (HTTP, SSH, SSL), load balancing solutions (lvs, nginx), and DNS
Role-specific experience:
- Extensive production experience with at least one distributed data processing platform (Kafka strongly preferred, Hadoop valuable)
- Extensive production experience with at least one NoSQL database (Scylla preferred)
- Experience with relational database software (PostgreSQL preferred)
- Experience supporting low-latency, globally distributed services at scale.
- Experience working with private datacenter infrastructure (“on-prem” servers)
- Experience leveraging config management toolsets (Chef, Salt, Ansible)
- Experience leveraging common deployment toolsets (CircleCI, Jenkins, Artifactory)
- Experience collecting and analyzing metrics for service level monitoring using Prometheus and Grafana
- Working knowledge of Kubernetes-based deployments
- Practical approach to real-world problems, with a “hands-on” approach to solutions.
- Ability to think strategically, understand business context, and make collaborative decisions
- Willing to gather information & recommend courses of action clearly and confidently.
- Fosters open communication, speaks with impact, listens to others, and writes effectively
- Comfortable partnering with internal clients in development teams to address issues, consult on solutions and plan for future needs.
- Ability to communicate sometimes complex ideas to non-technical stakeholders, including product and support teams.
- Desire to mentor and provide guidance to junior engineers, both technically and professionally.
- Willingness to adhere to, streamline and help improve team processes for work tracking, knowledge sharing, incident response, and cross-org communication.
Why We Work at MediaMath
We are restless innovators, smart, passionate and kind. At the heart of our culture are three values that provide a framework for how we approach our work and the world: Win Together, Obsess Over Growth, and Do Good, Better. These values inform how we energize one another and engage with our clients. They get us amped to come to work.
Founded in 2007 as a pioneer in "programmatic" advertising, MediaMath is recognized as a Leader in the Gartner 2020 Magic Quadrant for Ad Tech and has won Best Account Support by a Technology Company for two years in a row in the AdExchanger Awards.
MediaMath is committed to equal employment opportunity. It is a fundamental principle at MediaMath not to discriminate against employees or applicants for employment on any legally-recognized basis including, but not limited to: age, race, creed, color, religion, national origin, sexual orientation, sex, disability, predisposing genetic characteristics, genetic information, military or veteran status, marital status, gender identity/transgender status, pregnancy, childbirth or related medical condition, and any other protected characteristic as established by law.