Sr. Cloud Engineer
How is this team contributing to the vision of Providence?
Healthcare Intelligence is the pillar that focuses on creating intelligent products within Providence and provides unique opportunities in Product Development, Product Design, Data Engineering, Operations, Data Science, BI Reporting, and Data Analytics on the cloud stack. We are a group of professionals who work towards enabling decisions that improve patient and caregiver experience.
Now, as we face a new frontier, a changing health care landscape, we are looking for pioneering and compassionate individuals to plan for the next century and to work on re-imagining the future of care with cutting-edge technologies such as big data, machine learning, artificial intelligence, IoT, and blockchain that enhance patient outcomes and experiences and, more importantly, drive a lasting social impact.
What will you be responsible for?
As a Cloud Platform Engineer, you are responsible for the Cloud platform, i.e., Azure administration, to sustain the availability and operational efficiency of the environment. We host critical applications on Azure infrastructure using multiple Azure services, e.g., Azure SQL, ADF, AKS, and Azure VMs, so platform availability and reliability are of utmost priority. You will be responsible for the availability, reliability, and performance of the Azure Cloud Infrastructure.
As a Sr. Cloud Engineer, you will:
- Deploy and manage Azure IaaS services (Virtual Machines, Networking, Storage, Key Vault, AKS).
- Provision and configure Azure PaaS services (App Services, Azure SQL, API Management, Data Factory).
- Implement Infrastructure as Code (IaC) using Terraform and ARM templates for automated deployments.
- Design and maintain secure cloud architectures, applying best practices for identity, RBAC, and compliance.
- Configure and manage networking components including VNets, NSGs, Firewalls, and Private Endpoints.
- Maintain and optimize CI/CD pipelines using Azure DevOps for continuous delivery and automation.
- Collaborate extensively with Product teams to enable them on the Azure platform and resolve any issues with AKS, Azure VMs, Azure Storage Accounts, ADF, Azure SQL, and Azure Infrastructure services.
- Work with Azure AI and Cognitive Services such as Azure OpenAI, AI Foundry, Speech, Vision, and Language APIs.
- Ensure end-to-end observability using Azure Monitor, Log Analytics, and Datadog for proactive monitoring and alerting.
- Create dashboards and telemetry for performance tracking and incident resolution.
- Track resource utilization and cost optimization through governance policies, tagging, and scaling strategies.
- Collaborate with cross-functional teams for incident management and root cause analysis.
- Manage on-call rotations across geo-locations, using a follow-the-sun model.
- Apply sound troubleshooting skills and participate in severity issues and CODE RED calls.
What would your day look like?
- Monitor and address all incidents and user requests associated with Azure Infrastructure and its services.
- Discuss application architecture with Product teams, provide solutions, and address operational and performance issues.
- Collaborate with the Enterprise Infrastructure, Network, and Infosec/CYBR teams to implement enterprise-level policies and remediate any security violations.
- Work with MSFT support on severity issues and escalations to achieve resolution.
Who are we looking for?
- Bachelor's degree or equivalent in Engineering.
- 3 to 6 years of experience in Cloud Infrastructure administration, with a minimum of 4 years in Azure administration.
- Strong experience with Azure Core Services: Compute, Storage, Networking, Security.
- Hands-on expertise in Terraform and ARM templates for IaC.
- Proficiency in PowerShell and Python scripting for automation.
- Knowledge of Azure Monitor, Log Analytics, and Datadog for observability.
- Familiarity with Azure DevOps, Git/GitHub, and CI/CD practices.
- Experience with containerization using Kubernetes (AKS) and Docker.
- Strong grasp of cloud security principles, RBAC, Key Vault, and encryption strategies.
- Ability to optimize resource utilization and cost management in Azure environments.
- Working experience with system reliability: designing, implementing, and maintaining systems to ensure high availability and reliability of services and platforms.
- Experience with source code control systems such as Git/GitHub & ADO.
- Experience with agile methodologies and tools such as Azure DevOps and Jira.
- Ability to develop and manage monitoring, dashboards, and alerts to proactively identify and address issues using Datadog or Azure telemetry tools.
- Experience with Azure AI Services, APIM, OpenAI, etc. is preferred.
- Experience participating in incident management, root cause analysis, and optimization of application health.
- Proven track record of working both independently and collaboratively as part of a multi-disciplined team.
- Experience in implementing and integrating with IaaS, PaaS, and SaaS data platforms and other Cloud infrastructure services such as Azure AD.
- Strong critical thinking skills, and the ability to think on your feet.
- Ability to adapt quickly and maintain a positive attitude.
- Excellent verbal and written communication skills.
- Ability to take ownership of issues, work independently or escalate as needed, and find creative ways to resolve problems.
- Good collaboration skills for working with local and global teams; a strong team player who helps create a one-team, one-company culture.