Results-driven Data Engineer with over 4 years of experience designing and implementing scalable data pipelines across banking and healthcare domains. Proficient in Python, PySpark, and SQL for processing large-scale structured and semi-structured data, with hands-on expertise in Apache Spark, Kafka, Airflow, and Flume. Skilled in building end-to-end ETL workflows using AWS Glue, Azure Data Factory, and Databricks, and in managing data lakes on AWS S3 and Azure Data Lake Storage. Experienced in modeling data using Star and Snowflake schemas and in optimizing performance in Snowflake, Redshift, Synapse, PostgreSQL, and SQL Server. Adept at real-time data processing using Kafka Streams and Spark Structured Streaming for use cases such as fraud detection. Strong knowledge of data quality checks with Great Expectations, metadata management via Apache Atlas and Glue Data Catalog, and CI/CD pipelines with GitHub Actions and Jenkins. Developed REST APIs using Flask and FastAPI, and deployed scalable jobs on EMR and Kubernetes. Ensures enterprise-grade data governance with HIPAA/SOX compliance, robust monitoring using CloudWatch and Datadog, and collaboration with architects, analysts, and data scientists on ML feature pipelines. Passionate about clean architecture, reusable design, and mentoring peers in data engineering best practices.
Programming Languages: Python, SQL, Scala (basic), Shell Scripting
Big Data Technologies: Apache Spark, PySpark, Hadoop, Delta Lake, Spark Structured Streaming
Data Integration: Apache Airflow, AWS Glue, Azure Data Factory, Informatica (basic)
Cloud Platforms: AWS (S3, Redshift, Glue, EMR, KMS, Athena), Azure (ADLS, Synapse, Key Vault, ADF)
Data Warehouses: Snowflake, Redshift, Azure Synapse Analytics, SQL Server, PostgreSQL, Oracle
Data Modeling: Star Schema, Snowflake Schema, Dimensional Modeling
Streaming & Messaging: Apache Kafka, Kafka Streams, Flume
Data Catalog/Lineage: AWS Glue Catalog, Apache Atlas, Azure Purview
Workflow Orchestration: Apache Airflow, Oozie
DevOps & CI/CD: Git, GitHub Actions, Bitbucket, Jenkins, Azure DevOps
Monitoring & Logging: CloudWatch, Datadog, Prometheus, Azure Monitor, Log Analytics
Testing & Quality: Great Expectations, PyTest, UnitTest, Data Validation Scripts
Visualization Tools: Power BI, Tableau (for data outputs only)
Security & Compliance: IAM, KMS, HIPAA, SOX, Role-Based Access Control (RBAC)
File Formats: Parquet, ORC, CSV, JSON, Avro, XML
Documentation & Tools: Confluence, JIRA, Postman, Swagger, MS Excel
Predicting House Prices Using Machine Learning (Python, ML, Scikit-Learn) Jan 2020 - Jun 2020
Achieved an R² score of 0.92, meaning the model explained 92% of the variance in house prices. This high level of accuracy allowed real estate agents to make more data-driven pricing decisions, leading to more competitive pricing strategies and faster property sales.
Reduced prediction error by 25% compared to traditional manual methods, enabling real estate professionals to avoid overpricing or underpricing properties. This resulted in a 15% improvement in the average time to sell properties, optimizing the sales cycle.

Real-time Analytics and Optimization of E-commerce Platform using Big Data (Spark, Hive, Power BI) Jan 2023 - May 2023
Reduced data processing time by 40% through the implementation of Apache Spark for distributed computing, allowing real-time user behavior analysis and quick decision-making across marketing teams.
Increased e-commerce sales by 15% within 6 months by deploying personalized product recommendations based on real-time analysis of user activity, boosting conversion rates and customer retention.

Economic Forecasting and Policy Impact Analysis for Consumer Spending Behavior Aug 2022 - Dec 2022
Improved model prediction accuracy by 20% through model optimization and the incorporation of external factors such as government stimulus programs and interest rate changes, resulting in a more reliable consumer spending forecast.
Quantified the potential impact of monetary policy changes on consumer spending, providing actionable insights for policymakers. The analysis showed that a 1% reduction in interest rates could lead to a 3% increase in consumer spending, influencing future fiscal policy decisions.

Optimizing Sales Forecasting & Market Analysis for Conagra (Python, PySpark, Tableau) Jan 2024 - May 2024
Developed and deployed a linear regression model in Python to predict sales trends, achieving a 35% improvement in forecast accuracy over previous methods.
Leveraged Tableau for real-time data visualization, enabling a 25% improvement in strategic decision-making through better consumer trend analysis.