Position Overview:
The Site Reliability Engineering (SRE) team is the first line of defense for a smoothly running site. The team monitors site metrics on a 24x7x365 basis and ensures that any issues are tracked to resolution. The DevSecOps team (a peer team) is responsible for building out the dashboards that this team uses, so you will act as the product manager for those dashboards and may assist in their implementation. The Development teams will have increased responsibility for responding to pager duty alerts, so this team will ensure that the right teams are engaged and actively working issues. The team will continue to perform manual corrective actions on the databases as needed while working with Product ownership to close the product gaps that lead to excessive manual work.
Responsibilities:
- Establish a three-shift follow-the-sun model for site operations, with two shifts based in our offices in Hyderabad and a third in our headquarters in Plantation, FL.
- Establish standard policies and procedures for site operations including incident management and problem tracking.
- Ensure cross-training across all NationsBenefits site operations teams.
- Work with engineering teams to continually improve visibility and reliability.
- Identify operational issues and work across stakeholders to address them.
- Report to senior leadership on site incidents and recovery operations.
- Work with DevSecOps and Dev Teams to implement automatic management capabilities such as auto-scaling.
- Identify product and implementation gaps (including gaps in instrumentation) that impede the ability of the site to be operated at scale, including those that require regular manual work.
- During incidents, work with Operational Governance and Engineering teams to ensure that correct engineering resources are engaged and tracking to problem resolution.
- Post-incident, work with teams to ensure RCAs are complete and correct.
- Establish baseline metrics for all services and monitor over time.
- Establish Green/Yellow/Red performance levels and escalate problems as metrics degrade release to release.
What We Offer:
- Competitive salary and benefits package.
- Opportunity to work on a groundbreaking FinTech application with a high degree of impact.
- A collaborative and inclusive work environment that fosters innovation and growth.
- Career development opportunities, including leadership training and mentorship.