Experience
Timeline of ownership, platform delivery, and leadership
Role-by-role execution story across multi-cloud, data engineering, DevOps, and AI delivery programs.

Experience Snapshot
High-ownership roles spanning delivery leadership and platform engineering.
Roles: 4 · Bullet Points: 72 · Impact Metrics: 24
Practice Lead - Multi Cloud Managed Services & Data Engineering
Confidential
July 2025 - Present
18+ engineers · 200+ TB · 60% · 5+ TB/day · 100+ DAGs · 99.9% · 55% · 40%
What I led / owned
- Led cloud-to-cloud replatforming programs (AWS to Azure, AWS to GCP, and Azure to AWS) for analytics and DevOps stacks, with zero critical data-loss incidents during migration windows.
- Partnered with data science teams to operationalize ML/GenAI on SageMaker, Azure ML, and Vertex AI with feature pipelines, model deployment, and monitoring.
- Mentored and performance-managed 8+ direct engineers and technical leads, building a strong internal hiring and capability-development pipeline.
How I built it
- Built the Multi-Cloud Data & AI Engineering function from 0 to 8 engineers in the first two quarters and scaled to 18+ engineers across platform, data, and reliability tracks.
- Architected production lakehouse platforms using S3 + Apache Iceberg + Glue Catalog + EMR + Athena, and mapped equivalent patterns on ADLS Gen2/Synapse and GCS/BigQuery.
- Designed and delivered 40+ Spark/PySpark pipelines across EMR, Glue, Dataproc, and Dataflow, processing 5+ TB/day with 60% faster batch completion.
- Implemented orchestration standards across MWAA (Airflow), Azure Data Factory, and Cloud Composer, governing 100+ DAGs/pipelines with SLA-aware alerting.
- Automated landing zones and environment provisioning with Terraform and GitLab CI/CD for VPC/VNet, EKS/AKS/GKE, IAM/RBAC, networking, and policy baselines.
- Implemented data quality gates with Great Expectations and custom PySpark checks, maintaining 99.9% data accuracy SLAs across business-critical datasets.
- Built real-time ingestion and event pipelines using Kinesis/MSK, Event Hubs, and Pub/Sub/Kafka for low-latency analytics and operational AI use cases.
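To illustrate the data-quality-gate pattern referenced above, here is a minimal sketch in plain Python. The production checks used Great Expectations and PySpark; the rule names, columns, and the accuracy threshold below are hypothetical stand-ins, not the actual suites.

```python
# Minimal data-quality gate: validate a batch of records against
# declarative checks before promoting it downstream.
# Illustrative only -- rules and the threshold are placeholders.

def expect_not_null(records, column):
    """Fraction of rows where `column` is present and non-null."""
    ok = sum(1 for r in records if r.get(column) is not None)
    return ok / len(records)

def expect_in_range(records, column, lo, hi):
    """Fraction of rows where `column` falls within [lo, hi]."""
    ok = sum(1 for r in records
             if r.get(column) is not None and lo <= r[column] <= hi)
    return ok / len(records)

def quality_gate(records, threshold=0.999):
    """Run all checks; fail the batch if any check drops below the SLA."""
    checks = {
        "order_id_not_null": expect_not_null(records, "order_id"),
        "amount_in_range": expect_in_range(records, "amount", 0, 1_000_000),
    }
    failed = {name: s for name, s in checks.items() if s < threshold}
    return {"passed": not failed, "scores": checks, "failed": failed}

batch = [{"order_id": i, "amount": 10.0 * i} for i in range(1, 1001)]
result = quality_gate(batch)
```

Failing batches would be quarantined rather than promoted, which is how a 99.9% accuracy SLA is enforced at the gate rather than discovered downstream.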
Impact
- Owned 12+ migration waves from on-prem (Oracle, SQL Server, Hadoop, legacy ETL) to AWS/Azure/GCP, migrating 200+ TB and 1,000+ production jobs with controlled cutovers.
- Introduced DataOps engineering practices (unit/integration/E2E tests, contract tests, release templates), improving pipeline reliability and reducing production incidents by 55%.
- Executed FinOps optimization plans (spot compute, autoscaling, tiered storage, query tuning), reducing monthly platform costs by 40%.
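The FinOps levers above can be reasoned about with a simple cost model; a hedged sketch follows, where every rate, node-hour figure, and utilization split is an illustrative placeholder rather than an actual platform number.

```python
# Back-of-envelope FinOps model: estimate savings from shifting batch
# workloads to spot capacity and cold data to a cheaper storage tier.
# All inputs are hypothetical placeholders.

def monthly_compute_cost(node_hours, on_demand_rate, spot_fraction=0.0,
                         spot_discount=0.7):
    """Spot hours are billed at (1 - spot_discount) of the on-demand rate."""
    spot_hours = node_hours * spot_fraction
    on_demand_hours = node_hours - spot_hours
    return (on_demand_hours * on_demand_rate
            + spot_hours * on_demand_rate * (1 - spot_discount))

def monthly_storage_cost(hot_tb, cold_tb, hot_rate=23.0, cold_rate=4.0):
    """Rates in $/TB-month; tiering moves infrequently read data cold."""
    return hot_tb * hot_rate + cold_tb * cold_rate

before = monthly_compute_cost(50_000, 0.40) + monthly_storage_cost(200, 0)
after = (monthly_compute_cost(50_000, 0.40, spot_fraction=0.8)
         + monthly_storage_cost(60, 140))
savings_pct = 100 * (before - after) / before
```

Models like this make the levers (spot fraction, tiering split) explicit, so cost targets can be negotiated per workload instead of applied blindly.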
Engineering Manager, DevOps & Cloud Services
Quantium Analytics Pvt Ltd
July 2021 - July 2025
10+ engineers · 35% · 50% · 10+ TB/day · 50+ DAGs · 30% · 500+ users · 99.99% · MTTR 2 hours → under 20 minutes · 25% · 60% · 22%
What I led / owned
- Defined and executed enterprise DevOps, cloud, and automation strategy to modernize SaaS delivery and platform reliability.
- Coached platform and application teams on Agile, DevOps, and reliability practices in retail/e-commerce programs.
- Led enterprise AI programs across retail and fintech, integrating Claude/OpenAI-powered assistants to reduce manual query handling by 60%.
- Partnered with MLOps teams to deploy AI services on AKS/EKS/GKE with CI/CD model release, monitoring, and rollback strategies.
- Led innovation sprints on secure and responsible AI adoption, including guardrails, evaluation pipelines, and compliance-first deployment patterns.
How I built it
- Designed and operated 70+ CI/CD pipelines using Jenkins, GitHub Actions, Azure DevOps, and ArgoCD with standardized quality/security gates.
- Built cloud-native data engineering foundations on AWS Glue, EMR, and S3 supporting 10+ TB/day for retail and e-commerce analytics workloads.
- Operationalized Airflow on MWAA and hybrid schedulers, managing 50+ DAGs with automated retry, dependency controls, and incident routing.
- Implemented S3 data lake zoning, lifecycle, and archival controls to improve governance and reduce storage cost by 30%.
- Enabled self-service analytics via Glue Catalog + Athena with 200+ governed tables and automated schema/partition management.
- Architected GitOps-based delivery with ArgoCD and Helm for predictable environment promotion and rollback safety.
- Implemented full-stack observability (Prometheus, Grafana, Datadog, Splunk) with actionable alerting and runbook automation.
- Implemented disaster recovery and high-availability patterns across compute, data, and artifact systems to improve resilience.
- Built a developer-first operating model with automation playbooks, diagnostics, and release guidance.
- Architected RAG pipelines using LangChain + vector databases over product and transactional data, improving recommendation conversion by 22%.
- Built multi-step orchestration pipelines using LangChain and tool-calling APIs for contextual query understanding and escalation handling.
- Implemented MCP-based AI services to inject policy, compliance, and business context at runtime, reducing hallucination risk in regulated workloads.
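The RAG shape behind the pipelines above (retrieve relevant context, then ground the generation on it) can be sketched without the LangChain specifics. The minimal pure-Python version below uses bag-of-words cosine similarity as a stand-in for a real embedding model and vector database; the catalog contents and prompt format are hypothetical.

```python
import math
from collections import Counter

# Toy retrieval step of a RAG pipeline: embed documents and query,
# rank by cosine similarity, and assemble a grounded prompt.
# A production system uses a real embedding model and vector store.

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, docs):
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

catalog = [
    "Free returns are accepted within 30 days of purchase.",
    "Gift cards are delivered by email within minutes.",
    "Loyalty points expire after 12 months of inactivity.",
]
prompt = build_prompt("how long do I have for returns", catalog)
```

Constraining generation to retrieved context is also the main lever behind the hallucination-risk reduction mentioned for the MCP-based services.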
Impact
- Started with a lean DevOps pod and scaled it to 10+ engineers supporting multi-cloud platforms, release engineering, and SRE operations.
- Reduced deployment lead time by 35% and rollback time by 50% through pipeline optimization, release templates, and progressive delivery controls.
- Established dimensional modeling standards (Star/Snowflake/SCD2) that improved BI query performance for 500+ users.
- Maintained 99.99% uptime using Prometheus, Azure Monitor, Grafana, and SLO/SLA dashboards with proactive remediation automation.
- Drove MTTR improvements from 2 hours to under 20 minutes by addressing build failures, resource contention, and deployment bottlenecks.
- Integrated Artifactory and dependency caching into build systems, improving artifact retrieval time by 25%.
- Delivered stable build automation platforms with 99.99% uptime across Azure and GCP environments.
- Engineered agentic AI workflows with planner, retriever, verifier, and action agents, improving automation accuracy by 35% in data workflows.
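A minimal sketch of the planner/retriever/verifier/action loop named in the last bullet, with each agent reduced to a plain function. The step names follow the bullet; the fact store, routing rule, and escalation behavior are hypothetical illustrations.

```python
# Skeleton of an agentic workflow: a planner decomposes the request,
# a retriever gathers a fact per step, a verifier gates each result,
# and an action agent acts only on verified facts. Illustrative only.

FACTS = {"refund_policy": "30-day window", "order_42_status": "shipped"}

def planner(request):
    """Decompose a request into retrieval steps (hypothetical rule)."""
    if "refund" in request:
        return ["refund_policy", "order_42_status"]
    return []

def retriever(step):
    return FACTS.get(step)

def verifier(step, fact):
    """Reject missing/empty facts so the action agent never acts on them."""
    return fact is not None and fact != ""

def action_agent(verified):
    return f"Proceed: {', '.join(f'{k}={v}' for k, v in verified.items())}"

def run(request):
    verified = {}
    for step in planner(request):
        fact = retriever(step)
        if verifier(step, fact):
            verified[step] = fact
    return action_agent(verified) if verified else "Escalate to human"

outcome = run("customer asks about refund for order 42")
```

The verifier gate is the design choice that drives accuracy: unverifiable steps escalate to a human instead of producing a confident wrong action.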
DevOps Engineering Manager
Tata Consultancy Services Limited | Client - Apple
Jan 2010 - June 2021
40% · 99.99%
What I led / owned
- Led large-scale DevOps transformation programs for global enterprises, improving delivery velocity, reliability, and cloud adoption outcomes.
- Built and scaled engineering teams from 0 to 30+ DevOps practitioners across release engineering, platform operations, and tooling.
- Partnered with PMs, SDEs, and TPMs to deliver quarterly releases on time across 7+ globally distributed engineering teams.
How I built it
- Automated provisioning and configuration management with Terraform and Ansible, reducing manual build effort and configuration drift.
- Implemented Kubernetes and GitOps operating models across EKS/AKS and private/on-prem clusters for scalable deployments.
- Developed CI/CD frameworks with Jenkins, GitHub Actions, ArgoCD, and Helm for repeatable, auditable software delivery.
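The CI/CD frameworks above shared one shape regardless of tool: ordered stages, fail-fast quality gates, and an automatic rollback path once a deploy has happened. A tool-agnostic sketch of that control flow (stage names hypothetical; the real pipelines ran on Jenkins/GitHub Actions/ArgoCD):

```python
# Tool-agnostic skeleton of a gated delivery pipeline: run stages in
# order, stop at the first failure, and roll back if the failure
# occurred after a successful deploy. Illustrative control flow only.

def run_pipeline(stages, rollback):
    """stages: ordered (name, callable) pairs; callables return True on success."""
    completed = []
    for name, step in stages:
        if not step():
            if "deploy" in completed:
                rollback()  # deploy already landed; undo it
            return {"status": "failed", "at": name, "completed": completed}
        completed.append(name)
    return {"status": "succeeded", "completed": completed}

events = []
stages = [
    ("build", lambda: events.append("build") or True),
    ("test", lambda: events.append("test") or True),
    ("security_scan", lambda: events.append("scan") or True),
    ("deploy", lambda: events.append("deploy") or True),
    ("smoke_test", lambda: events.append("smoke") or False),  # gate fails
]
result = run_pipeline(stages, rollback=lambda: events.append("rollback"))
```

Encoding the rollback decision in the framework, rather than in each team's scripts, is what makes delivery auditable and repeatable across pipelines.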
Impact
- Improved incident response time by 40% through proactive monitoring, SRE alerting standards, and automation-assisted triage.
- Delivered build automation services with 99.99% uptime across AWS and on-prem hybrid environments.
Linux Engineer
Tata Consultancy Services Limited | Client - Electronic Arts
Nov 2007 - Dec 2009
What I led / owned
- Managed and optimized RHEL and OEL-based enterprise infrastructure, improving security and performance.
How I built it
- Automated backup & recovery processes, ensuring disaster resilience.
Impact
- Provided Linux system administration and on-call support for critical production environments.
- Configured and maintained Apache virtual hosting, LVM, and security controls.
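The backup automation mentioned for this role followed the standard archive-and-rotate pattern. A minimal Python sketch of that pattern is below; the original tooling was shell-based on RHEL/OEL, and the paths and retention count here are hypothetical.

```python
import os
import tarfile
import tempfile
import time

# Minimal backup-and-rotate sketch: archive a directory, then prune
# the oldest archives beyond a retention count. Illustrative only.

def backup(src_dir, dest_dir, retention=7):
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = os.path.join(dest_dir, f"backup-{stamp}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src_dir, arcname=os.path.basename(src_dir))
    # Rotate: timestamped names sort chronologically, so keep the
    # newest `retention` archives and remove the rest.
    archives = sorted(
        f for f in os.listdir(dest_dir)
        if f.startswith("backup-") and f.endswith(".tar.gz")
    )
    for old in archives[:-retention]:
        os.remove(os.path.join(dest_dir, old))
    return archive

# Demo against temporary directories.
src = tempfile.mkdtemp()
dest = tempfile.mkdtemp()
with open(os.path.join(src, "data.txt"), "w") as f:
    f.write("hello")
archive_path = backup(src, dest)
```

A recovery drill (restoring the archive to a scratch location and diffing) is what turns a backup script into disaster resilience.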