
Keyur Vaidya

Boston

Summary

Accomplished Data Scientist with a proven track record at Bermuda Monetary Authority, specializing in machine learning and data pipelines. Enhanced model accuracy by 15% through innovative forecasting techniques. Skilled in data visualization and agile project management, delivering impactful insights that drive financial decision-making and regulatory compliance.

Overview

5 years of professional experience
1 Certification

Work History

Data Scientist

Bermuda Monetary Authority
06.2024 - Current
  • Fine-tuned BERT-based models for macroeconomic trend analysis, leveraging domain-specific pretraining to improve text-based financial risk assessment
  • Enhanced model interpretability and reduced forecasting errors, leading to more reliable insights for market stability analysis
  • Developed an end-to-end ETL pipeline using DLT Hub and Apache Spark to efficiently process macroeconomic data from the Nasdaq API, ensuring high-throughput data ingestion, transformation, and storage for regulatory risk assessments
  • Designed Power BI dashboards integrated with LSTM-based time-series forecasting models, enabling dynamic visualization of macroeconomic indicators and delivering real-time predictive insights to enhance financial decision-making
  • Built and fine-tuned a document classification pipeline combining BERT embeddings, TF-IDF vectorization, and RoBERTa to automate regulatory compliance document processing
  • Achieved a 40% reduction in manual review time, a 12% increase in classification F1-score, and significantly improved document retrieval efficiency for audit workflows
  • Implemented A/B testing and causal impact analysis using Double Machine Learning and Bayesian Structural Time Series, combined with XGBoost-based uplift modeling, to optimize macroeconomic forecasting
  • Improved predictive accuracy by 15%, enabling better detection of causal relationships between economic indicators and financial market stability, driving data-driven policy interventions
  • Developed an AI-driven fraud detection system to mitigate financial crime risks in regulatory markets, identifying anomalous transaction patterns and high-risk activities using Isolation Forest
  • Optimized hyperparameters (contamination factor, estimators, and max samples) to improve fraud detection precision and reduce false positives
  • Implemented cryptographic security measures by integrating RSA encryption (2048-bit) with anomaly detection pipelines, ensuring secure transaction flagging and tamper-proof audit trails to enhance regulatory compliance and fraud prevention
  • Applied feature selection techniques (RFE, PCA, correlation analysis) to refine fraud detection models, reducing computational overhead and improving risk assessment efficiency for financial market monitoring
  • Strengthened financial risk oversight by embedding fraud detection insights into market surveillance frameworks, enabling proactive intervention strategies to mitigate fraudulent activities in macroeconomic data streams

Data Scientist Intern

Boston University
05.2023 - 06.2023
  • Implemented neural style transfer using VGG19 pre-trained on ImageNet, extracting multi-layer feature representations to compute content loss (mean squared error) and style loss (Gram matrix-based) separately
  • Optimized model performance by leveraging L-BFGS optimizer for efficient convergence and fine-tuning content-to-style weight ratios, reducing artifacts and enhancing stylization quality
  • Pre-processed and normalized images using OpenCV and NumPy, applying per-channel mean subtraction and re-scaling to ensure compatibility with the neural network
  • Integrated dynamic hyperparameter tuning, enabling users to adjust iterations, learning rate, and regularization weights (total variation loss) to balance stylization detail and computational efficiency

Data Scientist Intern

J.B. Boda Insurance & Reinsurance Brokers
06.2022 - 11.2022
  • Enhanced underwriting risk assessment by leveraging Generalized Linear Models (GLMs) and Cox Proportional Hazards Models to analyze insurance policies
  • Applied logistic regression for claim probability estimation, decision trees for rule-based risk segmentation, and Monte Carlo simulations to identify key risk factors, optimize policy structuring, and improve loss ratio management
  • Developed high-performance data extraction scripts utilizing vectorized operations in Pandas and NumPy, optimizing structured and unstructured claims data processing
  • Implemented parallelized batch processing and memory-efficient data handling to enhance data throughput and computational efficiency
  • Applied Isolation Forest for anomaly detection in fraudulent claims and DBSCAN for clustering claim patterns, improving predictive model accuracy and reducing claim settlement times by 30% through feature engineering and outlier detection techniques
  • Developed a logistic regression model achieving 85% accuracy for automated insurance claim approvals, leveraging L1 regularization for feature selection and SMOTE for handling class imbalance
  • Improved prediction reliability, reduced manual review time, and enhanced underwriting efficiency by streamlining risk assessment and decision-making
  • Conducted a 5-year time series analysis using ARIMA and Prophet, incorporating trend decomposition and seasonality adjustments to enhance actuarial risk modeling
  • Improved forecasting accuracy by 12%, enabling more precise underwriting decisions, better capital reserve planning, and proactive risk mitigation strategies

Data Scientist

Deep Agency
05.2020 - 04.2022
  • Developed a predictive text analytics system using TF-IDF vectorization, dependency parsing, and named entity recognition (NER) to classify customer inquiries and detect priority cases
  • Integrated the system with CRM databases and workflow automation, enabling real-time case routing, monitoring, and sentiment-based prioritization
  • Deployed via a database-integrated API, reducing resolution time by 25%, improving response prioritization, and enhancing customer satisfaction in policy servicing and claims management
  • Implemented an anomaly detection framework to identify suspicious claims and transactions, using Z-score analysis and Isolation Forest for outlier detection and K-Means clustering with anomaly score thresholds
  • Integrated real-time monitoring with SQL triggers and rule-based alerts, reducing false positives by 15% and detecting 20% more fraudulent claims
  • Strengthened fraud prevention measures, minimized unwarranted payouts, and enhanced regulatory compliance in underwriting and claims processing
  • Engineered a high-performance logistic regression model for binary classification of insurance claim approvals, leveraging L2 (ridge) regularization to mitigate overfitting and enhance generalizability
  • Conducted feature selection using logistic regression coefficients and variance thresholding, isolating critical predictors to improve model interpretability and decision-making transparency in claim evaluations
  • Accelerated model training and inference by implementing batch processing and NumPy-based vectorized computations, significantly reducing computational overhead and enabling real-time, high-throughput predictions
  • Integrated model tracking and versioning directly within a Streamlit-based interactive interface, ensuring seamless deployment, real-time monitoring, and reproducibility for continuous model optimization
  • Delivered a robust, scalable, and production-ready solution that enhances operational efficiency, reduces manual processing delays, and provides a data-driven approach to claim approval decisions

Education

Master's - Applied Data Science

Boston University
05.2024

Bachelor of Science - Statistics

Ramnarain Ruia Autonomous College
04.2022

Skills

  • Machine Learning Algorithms
  • Deep Learning & LLMs
  • Time Series Forecasting
  • Statistical & Probabilistic Modeling
  • Programming Languages
  • Data Preprocessing & Cleaning
  • Data Pipelines & ETL
  • Big Data Technologies
  • Data Visualization
  • Agile & Project Management

Certification

  • Fine-Tuning Large Language Models - DeepLearning.AI
  • Neural Networks and Deep Learning - DeepLearning.AI
  • Python for Data Science, AI & Development - IBM
  • Machine Learning Specialization - DeepLearning.AI

Publications

  • Comparative study on Llama 3.2 3B vs. DeepSeek V3, https://medium.com/@vaidya.keyur2/llama-3-2-3b-vs-deepseek-v3-performance-cost-and-use-case-comparison-522e072ad3c9
  • Implementing different models using NumPy and PyTorch, https://medium.com/@vaidya.keyur2/machine-learning-implementing-different-models-using-numpy-and-pytorch-626dc78e1090
