Experience
Timeline of ownership, platform delivery, and leadership
Role-by-role execution story across multi-cloud, data engineering, DevOps, and AI delivery programs.

Experience Snapshot
High-ownership roles spanning delivery leadership and platform engineering.
Roles: 4 · Bullet Points: 72 · Impact Metrics: 24
Practice Lead - Multi Cloud Managed Services & Data Engineering
Confidential
July 2025 - Present
18+ engineers · 200+ TB · 60% · 5+ TB/day · 100+ DAGs · 99.9% · 55% · 40%
What I led / owned
- Led cloud-to-cloud replatforming programs (AWS to Azure, AWS to GCP, and Azure to AWS) for analytics and DevOps stacks, with zero critical data-loss incidents during migration windows.
- Partnered with data science teams to operationalize ML/GenAI on SageMaker, Azure ML, and Vertex AI with feature pipelines, model deployment, and monitoring.
- Mentored and performance-managed 8+ direct engineers and technical leads, building a strong internal hiring and capability-development pipeline.
How I built it
- Built the Multi-Cloud Data & AI Engineering function from 0 to 8 engineers in the first two quarters and scaled to 18+ engineers across platform, data, and reliability tracks.
- Architected production lakehouse platforms using S3 + Apache Iceberg + Glue Catalog + EMR + Athena, and mapped equivalent patterns on ADLS Gen2/Synapse and GCS/BigQuery.
- Designed and delivered 40+ Spark/PySpark pipelines across EMR, Glue, Dataproc, and Dataflow, processing 5+ TB/day with 60% faster batch completion.
- Implemented orchestration standards across MWAA (Airflow), Azure Data Factory, and Cloud Composer, governing 100+ DAGs/pipelines with SLA-aware alerting.
- Automated landing zones and environment provisioning with Terraform and GitLab CI/CD for VPC/VNet, EKS/AKS/GKE, IAM/RBAC, networking, and policy baselines.
- Implemented data quality gates with Great Expectations and custom PySpark checks, maintaining 99.9% data accuracy SLAs across business-critical datasets.
- Built real-time ingestion and event pipelines using Kinesis/MSK, Event Hubs, and Pub/Sub/Kafka for low-latency analytics and operational AI use cases.
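To illustrate the data-quality-gate pattern referenced above, here is a minimal sketch in plain Python. The production checks used Great Expectations and PySpark; the rule names, columns, and the accuracy threshold below are hypothetical stand-ins, not the actual suites.

```python
# Minimal data-quality gate: validate a batch of records against
# declarative checks before promoting it downstream.
# Illustrative only -- rules and the threshold are placeholders.

def expect_not_null(records, column):
    """Fraction of rows where `column` is present and non-null."""
    ok = sum(1 for r in records if r.get(column) is not None)
    return ok / len(records)

def expect_in_range(records, column, lo, hi):
    """Fraction of rows where `column` falls within [lo, hi]."""
    ok = sum(1 for r in records
             if r.get(column) is not None and lo <= r[column] <= hi)
    return ok / len(records)

def quality_gate(records, threshold=0.999):
    """Run all checks; fail the batch if any check drops below the SLA."""
    checks = {
        "order_id_not_null": expect_not_null(records, "order_id"),
        "amount_in_range": expect_in_range(records, "amount", 0, 1_000_000),
    }
    failed = {name: s for name, s in checks.items() if s < threshold}
    return {"passed": not failed, "scores": checks, "failed": failed}

batch = [{"order_id": i, "amount": 10.0 * i} for i in range(1, 1001)]
result = quality_gate(batch)
```

Failing batches would be quarantined rather than promoted, which is how a 99.9% accuracy SLA is enforced at the gate rather than discovered downstream.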
Impact
- Owned 12+ migration waves from on-prem (Oracle, SQL Server, Hadoop, legacy ETL) to AWS/Azure/GCP, migrating 200+ TB and 1,000+ production jobs with controlled cutovers.
- Introduced DataOps engineering practices (unit/integration/E2E tests, contract tests, release templates), improving pipeline reliability and reducing production incidents by 55%.
- Executed FinOps optimization plans (spot compute, autoscaling, tiered storage, query tuning), reducing monthly platform costs by 40%.
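The FinOps levers above can be reasoned about with a simple cost model; a hedged sketch follows, where every rate, node-hour figure, and utilization split is an illustrative placeholder rather than an actual platform number.

```python
# Back-of-envelope FinOps model: estimate savings from shifting batch
# workloads to spot capacity and cold data to a cheaper storage tier.
# All inputs are hypothetical placeholders.

def monthly_compute_cost(node_hours, on_demand_rate, spot_fraction=0.0,
                         spot_discount=0.7):
    """Spot hours are billed at (1 - spot_discount) of the on-demand rate."""
    spot_hours = node_hours * spot_fraction
    on_demand_hours = node_hours - spot_hours
    return (on_demand_hours * on_demand_rate
            + spot_hours * on_demand_rate * (1 - spot_discount))

def monthly_storage_cost(hot_tb, cold_tb, hot_rate=23.0, cold_rate=4.0):
    """Rates in $/TB-month; tiering moves infrequently read data cold."""
    return hot_tb * hot_rate + cold_tb * cold_rate

before = monthly_compute_cost(50_000, 0.40) + monthly_storage_cost(200, 0)
after = (monthly_compute_cost(50_000, 0.40, spot_fraction=0.8)
         + monthly_storage_cost(60, 140))
savings_pct = 100 * (before - after) / before
```

Models like this make the levers (spot fraction, tiering split) explicit, so cost targets can be negotiated per workload instead of applied blindly.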
Engineering Manager, DevOps & Cloud Services
Quantium Analytics Pvt Ltd
July 2021 - July 2025
10+ engineers · 35% · 50% · 10+ TB/day · 50+ DAGs · 30% · 500+ users · 99.99% · MTTR 2 hours → under 20 minutes · 25% · 60% · 22%
What I led / owned
- Defined and executed enterprise DevOps, cloud, and automation strategy to modernize SaaS delivery and platform reliability.
- Coached platform and application teams on Agile, DevOps, and reliability practices in retail/e-commerce programs.
- Led enterprise AI programs across retail and fintech, integrating Claude/OpenAI-powered assistants to reduce manual query handling by 60%.
- Partnered with MLOps teams to deploy AI services on AKS/EKS/GKE with CI/CD model release, monitoring, and rollback strategies.
- Led innovation sprints on secure and responsible AI adoption, including guardrails, evaluation pipelines, and compliance-first deployment patterns.
How I built it
- Designed and operated 70+ CI/CD pipelines using Jenkins, GitHub Actions, Azure DevOps, and ArgoCD with standardized quality/security gates.
- Built cloud-native data engineering foundations on AWS Glue, EMR, and S3 supporting 10+ TB/day for retail and e-commerce analytics workloads.
- Operationalized Airflow on MWAA and hybrid schedulers, managing 50+ DAGs with automated retry, dependency controls, and incident routing.
- Implemented S3 data lake zoning, lifecycle, and archival controls to improve governance and reduce storage cost by 30%.
- Enabled self-service analytics via Glue Catalog + Athena with 200+ governed tables and automated schema/partition management.
- Architected GitOps-based delivery with ArgoCD and Helm for predictable environment promotion and rollback safety.
- Implemented full-stack observability (Prometheus, Grafana, Datadog, Splunk) with actionable alerting and runbook automation.
- Implemented disaster recovery and high-availability patterns across compute, data, and artifact systems to improve resilience.
- Built a developer-first operating model with automation playbooks, diagnostics, and release guidance.
- Architected RAG pipelines using LangChain + vector databases over product and transactional data, improving recommendation conversion by 22%.
- Built multi-step orchestration pipelines using LangChain and tool-calling APIs for contextual query understanding and escalation handling.
- Implemented MCP-based AI services to inject policy, compliance, and business context at runtime, reducing hallucination risk in regulated workloads.
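The RAG shape behind the pipelines above (retrieve relevant context, then ground the generation on it) can be sketched without the LangChain specifics. The minimal pure-Python version below uses bag-of-words cosine similarity as a stand-in for a real embedding model and vector database; the catalog contents and prompt format are hypothetical.

```python
import math
from collections import Counter

# Toy retrieval step of a RAG pipeline: embed documents and query,
# rank by cosine similarity, and assemble a grounded prompt.
# A production system uses a real embedding model and vector store.

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, docs):
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

catalog = [
    "Free returns are accepted within 30 days of purchase.",
    "Gift cards are delivered by email within minutes.",
    "Loyalty points expire after 12 months of inactivity.",
]
prompt = build_prompt("how long do I have for returns", catalog)
```

Constraining generation to retrieved context is also the main lever behind the hallucination-risk reduction mentioned for the MCP-based services.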
Impact
- Started with a lean DevOps pod and scaled it to 10+ engineers supporting multi-cloud platforms, release engineering, and SRE operations.
- Reduced deployment lead time by 35% and rollback time by 50% through pipeline optimization, release templates, and progressive delivery controls.
- Established dimensional modeling standards (Star/Snowflake/SCD2) that improved BI query performance for 500+ users.
- Maintained 99.99% uptime using Prometheus, Azure Monitor, Grafana, and SLO/SLA dashboards with proactive remediation automation.
- Drove MTTR improvements from 2 hours to under 20 minutes by addressing build failures, resource contention, and deployment bottlenecks.
- Integrated Artifactory and dependency caching into build systems, improving artifact retrieval time by 25%.
- Delivered stable build automation platforms with 99.99% uptime across Azure and GCP environments.
- Engineered agentic AI workflows with planner, retriever, verifier, and action agents, improving automation accuracy by 35% in data workflows.
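A minimal sketch of the planner/retriever/verifier/action loop named in the last bullet, with each agent reduced to a plain function. The step names follow the bullet; the fact store, routing rule, and escalation behavior are hypothetical illustrations.

```python
# Skeleton of an agentic workflow: a planner decomposes the request,
# a retriever gathers a fact per step, a verifier gates each result,
# and an action agent acts only on verified facts. Illustrative only.

FACTS = {"refund_policy": "30-day window", "order_42_status": "shipped"}

def planner(request):
    """Decompose a request into retrieval steps (hypothetical rule)."""
    if "refund" in request:
        return ["refund_policy", "order_42_status"]
    return []

def retriever(step):
    return FACTS.get(step)

def verifier(step, fact):
    """Reject missing/empty facts so the action agent never acts on them."""
    return fact is not None and fact != ""

def action_agent(verified):
    return f"Proceed: {', '.join(f'{k}={v}' for k, v in verified.items())}"

def run(request):
    verified = {}
    for step in planner(request):
        fact = retriever(step)
        if verifier(step, fact):
            verified[step] = fact
    return action_agent(verified) if verified else "Escalate to human"

outcome = run("customer asks about refund for order 42")
```

The verifier gate is the design choice that drives accuracy: unverifiable steps escalate to a human instead of producing a confident wrong action.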
DevOps Engineering Manager
Tata Consultancy Services Limited | Client - Apple
Jan 2010 - June 2021
40% · 99.99%
What I led / owned
- Led large-scale DevOps transformation programs for global enterprises, improving delivery velocity, reliability, and cloud adoption outcomes.
- Built and scaled engineering teams from 0 to 30+ DevOps practitioners across release engineering, platform operations, and tooling.
- Partnered with PMs, SDEs, and TPMs to deliver quarterly releases on time across 7+ globally distributed engineering teams.
How I built it
- Automated provisioning and configuration management with Terraform and Ansible, reducing manual build effort and configuration drift.
- Implemented Kubernetes and GitOps operating models across EKS/AKS and private/on-prem clusters for scalable deployments.
- Developed CI/CD frameworks with Jenkins, GitHub Actions, ArgoCD, and Helm for repeatable, auditable software delivery.
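The CI/CD frameworks above shared one shape regardless of tool: ordered stages, fail-fast quality gates, and an automatic rollback path once a deploy has happened. A tool-agnostic sketch of that control flow (stage names hypothetical; the real pipelines ran on Jenkins/GitHub Actions/ArgoCD):

```python
# Tool-agnostic skeleton of a gated delivery pipeline: run stages in
# order, stop at the first failure, and roll back if the failure
# occurred after a successful deploy. Illustrative control flow only.

def run_pipeline(stages, rollback):
    """stages: ordered (name, callable) pairs; callables return True on success."""
    completed = []
    for name, step in stages:
        if not step():
            if "deploy" in completed:
                rollback()  # deploy already landed; undo it
            return {"status": "failed", "at": name, "completed": completed}
        completed.append(name)
    return {"status": "succeeded", "completed": completed}

events = []
stages = [
    ("build", lambda: events.append("build") or True),
    ("test", lambda: events.append("test") or True),
    ("security_scan", lambda: events.append("scan") or True),
    ("deploy", lambda: events.append("deploy") or True),
    ("smoke_test", lambda: events.append("smoke") or False),  # gate fails
]
result = run_pipeline(stages, rollback=lambda: events.append("rollback"))
```

Encoding the rollback decision in the framework, rather than in each team's scripts, is what makes delivery auditable and repeatable across pipelines.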
Impact
- Improved incident response time by 40% through proactive monitoring, SRE alerting standards, and automation-assisted triage.
- Delivered build automation services with 99.99% uptime across AWS and on-prem hybrid environments.
Linux Engineer
Tata Consultancy Services Limited | Client - Electronic Arts
Nov 2007 - Dec 2009
What I led / owned
- Managed and optimized RHEL and OEL-based enterprise infrastructure, improving security and performance.
How I built it
- Automated backup & recovery processes, ensuring disaster resilience.
Impact
- Provided Linux system administration and on-call support for critical production environments.
- Configured and maintained Apache virtual hosting, LVM, and security controls.
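The backup automation mentioned for this role followed the standard archive-and-rotate pattern. A minimal Python sketch of that pattern is below; the original tooling was shell-based on RHEL/OEL, and the paths and retention count here are hypothetical.

```python
import os
import tarfile
import tempfile
import time

# Minimal backup-and-rotate sketch: archive a directory, then prune
# the oldest archives beyond a retention count. Illustrative only.

def backup(src_dir, dest_dir, retention=7):
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = os.path.join(dest_dir, f"backup-{stamp}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src_dir, arcname=os.path.basename(src_dir))
    # Rotate: timestamped names sort chronologically, so keep the
    # newest `retention` archives and remove the rest.
    archives = sorted(
        f for f in os.listdir(dest_dir)
        if f.startswith("backup-") and f.endswith(".tar.gz")
    )
    for old in archives[:-retention]:
        os.remove(os.path.join(dest_dir, old))
    return archive

# Demo against temporary directories.
src = tempfile.mkdtemp()
dest = tempfile.mkdtemp()
with open(os.path.join(src, "data.txt"), "w") as f:
    f.write("hello")
archive_path = backup(src, dest)
```

A recovery drill (restoring the archive to a scratch location and diffing) is what turns a backup script into disaster resilience.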