Principal Data Engineer
Overview of Providence:
At Providence, we use our voice to advocate for vulnerable populations and needed reforms in health care. We pursue innovative ways to transform health care by keeping people healthy, and making our services more convenient, accessible and affordable for all. In an increasingly uncertain world, we are committed to high-quality, compassionate health care for everyone—regardless of coverage or ability to pay. We help people and communities benefit from the best health care model for the future—today.
Together, our 119,000-plus caregivers/employees serve in 51 hospitals and more than 1,000 clinics, providing a comprehensive range of health and social services across Alaska, California, Montana, New Mexico, Oregon, Texas and Washington in the United States.
Providence Global Center recently launched in Hyderabad, India, as the Global Capability Center for Providence, looking to leverage talent in India to help meet our global vision and scale our Information Services and products to the cloud.
What will you be responsible for?
- Define and implement scalable, efficient, and secure data architectures to meet organizational needs.
- Design & develop the architecture for data pipelines, data lakes, and data warehouses.
- Ensure alignment with business objectives and technology roadmaps.
- Design and build robust ETL/ELT pipelines to ingest, transform, and process large datasets from diverse sources.
- Design and model data lake solutions and data ingestion pipelines by collaborating with our customers and understanding their digital transformation requirements.
- Implement real-time data streaming and batch processing solutions using modern tools and frameworks.
- Optimize pipelines for performance, scalability, and cost efficiency.
- Partner with data scientists to support AI/ML model development and deployment.
- Facilitate feature engineering, data preparation, and data versioning for AI/ML workflows.
- Implement MLOps practices, including model deployment, monitoring, and lifecycle management.
- Enforce data governance policies, including data quality, lineage, and security standards.
- Ensure compliance with data privacy regulations and organizational policies.
- Establish best practices for metadata management and data cataloging.
- Provide technical leadership and mentorship to data engineers and cross-functional teams.
- Advocate for engineering excellence and continuous improvement in data engineering practices.
- Drive innovation by exploring emerging technologies and methodologies.
- Analyze and optimize data systems for performance, reliability, and cost-effectiveness.
- Establish monitoring and alerting systems to ensure data pipeline health and integrity.
- Troubleshoot complex data engineering challenges and resolve bottlenecks.
Who are we looking for?
- 8–12 years of experience designing enterprise-scale data solutions, including cloud platforms, with a strong focus on enterprise data strategy, architecture governance, and advanced analytics solutions.
- Strong proficiency in building and maintaining large-scale data pipelines using tools like Apache Spark, Kafka, Airflow, or similar technologies.
- Expertise in SQL and programming languages such as Python, Scala, or Java.
- Hands-on experience with distributed data storage systems (e.g., Hadoop, Snowflake, Redshift, BigQuery).
- Deep understanding of cloud-based data platforms and services (AWS, Azure, GCP).
- Experience with cloud-native tools such as Azure Data Factory or Google Cloud Dataflow.
- Proficient in designing data architectures that align with governance and compliance standards.
- Experience with data modeling, schema design, and data lineage.
- Proficiency in CI/CD pipelines and infrastructure-as-code tools like Terraform or CloudFormation.
- Knowledge of containerization and orchestration tools (Docker, Kubernetes).
- Strong ability to diagnose complex data engineering issues and implement effective solutions.
- Analytical mindset for optimizing performance and scalability.
- Familiarity with AI/ML workflows, including feature engineering, model training, and deployment.
- Knowledge of MLOps frameworks (MLflow, Kubeflow).
- Hands-on experience with deep learning models, NLP, or computer vision.
- Familiarity with advanced MLOps practices, such as automated retraining and continuous monitoring.
- In-depth knowledge of big data frameworks such as Apache Hadoop and Hive.
- Experience with graph databases or real-time analytics tools.
- Understanding of business processes and the ability to translate business needs into technical solutions.
- Experience working in domain-specific contexts such as healthcare, finance, or retail.
- Interest in exploring and integrating cutting-edge technologies such as generative AI, real-time analytics, and active metadata management.