I lead the data infrastructure, AIOps, and cloud engineering function at Aga Khan University, with 12 years of experience building and running production data platforms across AWS, Azure, and GCP for distributed teams in Kenya and Pakistan. My MedGemma-based clinical decision support work, grounded in Kenyan Ministry of Health guidelines, was recognized with the Google GenAI Accelerator Award.
My platform work increasingly sits at the intersection of data engineering and AI. I have built feature engineering pipelines that reduced model-ready dataset preparation time by 30% for data science teams, implemented statistical anomaly detection for proactive data quality monitoring, and architected data infrastructure supporting LLM and intelligent automation workflows. I build the data foundation that makes AI and analytics trustworthy, scalable, and production-ready.
A practical breakdown of database fundamentals beyond the textbook definitions — consistency models, isolation levels, and how indexes actually behave under load.
In progress
What I Learned Building RAG Systems for Healthcare in East Africa · Draft
Lessons from designing retrieval-augmented generation for clinical decision support in resource-constrained environments.
Cutting Streaming Costs Without Cutting Corners · Draft
Practical patterns for reducing Kafka and Pub/Sub costs without sacrificing reliability.
Lakehouse Patterns for Multi-Country Research Data · Draft
Lakehouse architecture for research collaboration across Kenya, South Africa, and the US.
Building a Real-Time Fitbit Streaming Pipeline · Draft
Streaming physiological data for 600+ research participants across multiple countries.
Drafts are unpublished and intentionally unlinked. Published pieces appear under Published above as they go live.
Projects
Open source work across data infrastructure, healthcare AI, ML systems, and cloud-native platforms.
Conformal prediction for clinical AI under covariate shift.
Gaussian Processes
Healthcare AI in low-resource African contexts.
Compute Governance
Access barriers for African AI innovators.
Work Experience
2025–present
Aga Khan University — Nairobi, Kenya
Manager, Data Infrastructure, AIOps, MLOps & Cloud Engineering
~ Own data and AI platform delivery for AKU Global Data & Innovation, reporting to the Chief Data Officer. Lead distributed engineers across Kenya and Pakistan. Set SLAs for data uptime, quality, and freshness across the AKU Hospital data platform, the NIH-funded Uzima-DS Consortium, and internal research teams.
~ Took Afya Gemma from prototype to production. The MedGemma-based clinical decision support system now operates daily for resident and intern doctors at AKU Hospital (Google GenAI Accelerator Award). Designed the two-stage retrieval architecture (Gemini 2.5 Flash classifier before MedGemma generation) on a ChromaDB vector store.
~ Architected and scaled real-time and batch data platforms supporting research, analytics, and clinical operations across Kenya, Pakistan, and external consortium partners. Designed the secure Conversational RAG platform on Azure OpenAI, PgVector, and Azure AI Search with full data sovereignty.
~ Began the early architecture and prototyping work on Afya Gemma.
~ Built foundational data pipelines and platform components. Built the university's first enterprise clinical data repository (records 2008–present). Designed multi-country ingestion pipelines including real-time Fitbit streaming for 615 healthcare workers in Kenya.
Kafka · Flink · Spark · Python · Azure Data Factory · Delta Lake · Airflow
2021–2022
Copia Global — Nairobi, Kenya
Data Platform Manager (AWS)
~ Architected AWS-based data platforms, including Lambda pipelines processing 250K+ rows/day and MLOps systems that reduced referral program costs by 60%. Built the data engineering function from the ground up.
~ Owned the design and evolution of AWS-based data platforms supporting analytics, operations, and ML use cases. Delivered production ML systems including recommendation engines.
~ Redesigned streaming pipelines and data contracts to reduce cloud costs by ~25%. Built automated ML infrastructure on GCP, covering architecture, implementation, and DataOps practices.