Principal Data Analytics Engineer
About the role:
The Principal Data Analytics Engineer is responsible for architecting, designing and modelling Data Lake solutions and data ingestion pipelines, collaborating with our customers to understand their digital transformation requirements. The role calls for someone who can navigate ambiguous situations and adapt to a rapidly changing environment. The drive to collaborate, gather feedback, solve problems, and tackle challenges through a test-and-learn approach is highly valuable in this position. You will work closely with Software Engineers, Data Scientists, and Azure and Network Administrator teams to build a scalable, compliant data system and production-ready ML pipelines. You will also work with different stakeholders as the SME for Data Engineering, providing guidance on data readiness for AI initiatives and architectural support for Azure-native AI services.
Responsibilities:
Develop data engineering and analytics solutions within Healthcare Intelligence. Ensure the continuity of data processes and the associated batch jobs.
Partner with leadership, engineers, program managers, data analysts and scientists to understand data needs. Identify, design and implement (or coordinate the implementation of) scalable processes and infrastructure that ensure good governance of automated data processes.
Manage the end-to-end data solutions of our customers: from raw data analysis to data flow and predictive framework configurations.
Manage the performance, capacity, availability, security and compliance of the data platform and data solutions. Work on problems independently and prepare client-ready deliverables with minimal supervision.
Elicit, analyze, and validate customer data from ingestion to production.
Monitor all data update processes and outputs to ensure predictive quality.
Communicate with customers to discuss any issues with received data and help them identify and fix data issues.
Solve day-to-day data problems and customer challenges.
Own the automation, deployment and operation of data pipelines on MS Azure.
Build tools and mechanisms to monitor and optimize different parts of the system.
Build custom integrations between cloud-based systems using APIs.
Develop automated data quality profiling and cleansing routines to ensure "AI-ready" datasets that meet the strict accuracy requirements of clinical intelligence.
Proactively evaluate and integrate emerging Azure technologies and industry trends to modernize the existing data stack and reduce technical debt.
Monitor and manage Azure consumption and compute costs, implementing strategies to optimize resource utilization without compromising performance.
Build and oversee comprehensive telemetry, alerting, and monitoring dashboards to proactively identify data drift or pipeline latency before it impacts downstream customers.
Lead technical design reviews and provide mentorship to junior engineers, fostering a culture of continuous learning and operational excellence.
Skills and qualifications:
Working experience with Azure data technologies such as Azure Data Lake Storage Gen2 (ADLS Gen2), Azure Databricks, Azure Data Factory, Azure SQL and Azure Synapse. Proven track record in advanced analytics using Python, SQL or Gen AI, along with visual analytics skills.
Required:
Experience creating data orchestration pipelines using Azure Data Factory (ADF) and optimizing them through regular monitoring.
Exposure to the Snowflake Cloud Data Platform.
Familiarity with Linux/Unix scripting, Python, SQL queries and database concepts required.
Exposure to onboarding legacy data sets to the cloud.
Experience working with large datasets and building data processing platforms.
Fluent and fast with SQL, query analysis and optimization.
Deep skills in modeling data warehouses.
Exposure to and understanding of Agile processes and DevOps.
Must have experience in batch scheduling and rationalization through Control-M.
Excellent interpersonal, verbal and written communication skills.
Excellent analytical and critical thinking capabilities and problem-solving skills.
Exploratory Data Analysis (EDA): Conduct EDA to gain deep insights into data patterns, structures, and characteristics. Use visualizations to communicate effectively.
Statistical Analysis and Validation: Engage in statistical analysis and hypothesis testing to validate assumptions.
Machine Learning Implementation (Must have): Develop and deploy advanced machine learning models to address specific business challenges effectively. Evaluate and fine-tune algorithms for improved accuracy and performance.
Presentation and Communication: Present findings and insights to stakeholders and non-technical users through clear, understandable data visualizations and reports.
Documentation and Knowledge Sharing: Document the data analysis process and methodologies for future reference and knowledge sharing.
Cultural Enrichment: Foster a culture of innovation, knowledge-sharing, and continuous improvement within the data science team.