Large-scale data systems for
high-stakes organizations
Senior Data Engineer and Technical Consultant specializing in PySpark, Python, and cloud infrastructure for Government, Healthcare, and Financial Services. Based in Ottawa, Canada.
I'm Neel Shah, a Senior Data Engineer and Technical Consultant with 10+ years of experience building large-scale data systems for organizations that handle sensitive, high-volume data. My core expertise is PySpark, Python, and cloud infrastructure, which I use to deliver reliable, compliant, and scalable pipelines across Government, Healthcare, and Financial Services.
As Tech Lead at CIHI (Canadian Institute for Health Information), I lead large-scale PySpark pipelines processing over 1 billion Canadian health data points (national registry, diagnosis, and pharma datasets) for the federal government, provincial governments, and NPO clients. I manage client relationships, lead the engineering team from architecture to delivery, and ensure strict compliance with PIPEDA and provincial health privacy legislation. As a result, PII governance, audit trails, and data security are part of every technical decision I make.
Before CIHI, I worked for EXL Service on-site at Goldman Sachs, where I built PySpark-based credit risk management platforms handling 1 million financial transactions per hour. These platforms powered risk decisioning for the Apple Card, Walmart Card, and GM Card with full regulatory compliance. I also architected cloud-based systems at Canopy Growth, Manulife, and SITA, building high-availability infrastructure on Azure and AWS for millions of users.
I use modern AI tools (Claude, GPT, and local LLMs) as productivity accelerators: faster code reviews, automated documentation, data validation pipelines, and intelligent development workflows. AI makes my engineering output faster and higher quality, but it's a tool, not the product.
I also created emot, an early open source contribution that grew to 3M+ downloads. It's a reminder that the best tools solve one thing really well.
Originally from Vadodara, India, where I graduated at the top of my engineering class, I moved to Canada for graduate studies and have been contributing to the Ottawa tech community ever since.
- 📍 Ottawa, Ontario, Canada
- 🏢 CIHI (current)
- 🎓 Lakehead University
- 💻 10+ years experience
- 📄 3 research papers · 89+ citations
- 📦 3M+ open source downloads
- 🌍 5 languages
  - English: Native
  - Hindi: Native
  - Gujarati: Native
  - French: Elementary
  - Sanskrit: Limited
Technical Skills
PySpark & Big Data
PII & Compliance
Cloud & Infrastructure
AI-Accelerated Development
Additional Skills
Experience
Lead a large-scale PySpark ETL pipeline that processes up to 24M records with 200+ parameters in under 60 minutes, ingesting 1B+ Canadian health data points (registry, diagnosis, pharma) from every hospital in the country. Serve federal and provincial government and NPO clients, lead a cross-functional engineering team across the full SDLC, manage client relationships, and enforce compliance with PIPEDA and provincial health privacy legislation.
Built a PySpark-based credit risk management platform for the Apple Card, Walmart Card, and GM Card portfolios, processing 1M transactions/hour. Resolved P0 production incidents, preventing $10M+ in risk exposure. Built a Python test automation framework that reduced end-to-end test time by 60%.
Architected and led a full waterfall-to-agile transformation across 42 company websites. Designed a Python/FastAPI/Docker/AWS microservice handling 100K requests/hour. Delivered $5M in annual cost savings through AEM virtualization. Built WCAG accessibility compliance tooling across the full digital estate.
Built and maintained Azure cloud infrastructure of 1,800+ servers (Windows and Linux) with a 99.99% uptime SLA. Developed real-time Power BI dashboards for infrastructure monitoring. Automated the CI/CD pipeline with Python and Docker, reducing debugging time by 45 minutes.
Designed a real-time airport analytics system integrating LiDAR and camera hardware using Python and reactive programming. Led the Python 2→3 migration of large-scale airport systems. Transformed a monolithic legacy architecture into cloud-based microservices on Azure.
Published 3 peer-reviewed papers (89+ citations, NSERC-funded). Built a 20-node Elasticsearch cluster searching 330M tweets/second for real-time public health analytics. Developed a Random Forest NLP model achieving 93.4% accuracy for population-level health classification.
Built an asynchronous chatbot analytics API handling thousands of requests/second. Developed 5+ real-time AWS dashboards for semantic analysis and topic extraction. Designed a clustering algorithm for chat-based decision support.
Built a real-time data analysis system for product cost and logistics using SAP and Python. Developed a time-series sales forecasting model achieving 71% prediction efficiency. Designed an ETL and report-generation pipeline for sales, cost, and inventory data.
What colleagues say
Neel stands out as a profoundly skilled Python developer whose proficiency in handling large-scale ETL processes has consistently enhanced our data management capabilities.
His last assignment was very challenging, but he overcame the obstacles and achieved the impossible. I highly recommend Neel for any future opportunities.
It's rare that you come across standout talent like Neel. He has a unique talent to think outside the box and grasp concepts that most find difficult.
He is one of the most dedicated professionals I worked with — always staying patient and focused on resolving the issue, even during late nights and weekends.
Neel's ability to juggle multiple projects is unlike any I've seen before and made a dramatic difference in the productivity level of our team.
Education
Beyond Work
Cycling
Ottawa's cycling paths are underrated. Long rides clear the head after a week of debugging PySpark DAGs.
Walking
Walking is where problems get solved. Some of the best architecture decisions happen away from the screen.
Gym
Consistency at the gym mirrors consistency in engineering. Show up, do the work, trust the compound effect.