Accomplished Data Scientist with a proven track record at Bermuda Monetary Authority, specializing in machine learning and data pipelines. Enhanced model accuracy by 15% through innovative forecasting techniques. Skilled in data visualization and agile project management, delivering impactful insights that drive financial decision-making and regulatory compliance.
Overview
5 years of professional experience
1 Certification
Work History
Data Scientist
Bermuda Monetary Authority
06.2024 - Current
Fine-tuned BERT-based models for macroeconomic trend analysis, leveraging domain-specific pretraining to improve text-based financial risk assessment
Enhanced model interpretability and reduced forecasting errors, leading to more reliable insights for market stability analysis
Developed an end-to-end ETL pipeline using DLT Hub and Apache Spark to efficiently process macroeconomic data from the Nasdaq API, ensuring high-throughput data ingestion, transformation, and storage for regulatory risk assessments
Designed Power BI dashboards integrated with LSTM-based time-series forecasting models, enabling dynamic visualization of macroeconomic indicators and delivering real-time predictive insights to enhance financial decision-making
Built and fine-tuned a document classification pipeline using BERT embeddings, TF-IDF vectorization, and RoBERTa, automating regulatory compliance document processing
Achieved a 40% reduction in manual review time, a 12% increase in classification F1-score, and significantly improved document retrieval efficiency for audit workflows
Applied causal inference methods (Double Machine Learning and Bayesian Structural Time Series), combined with uplift modeling (XGBoost), to optimize macroeconomic forecasting
Improved predictive accuracy by 15%, enabling better detection of causal relationships between economic indicators and financial market stability, driving data-driven policy interventions
Developed an AI-driven fraud detection system to mitigate financial crime risks in regulatory markets, identifying anomalous transaction patterns and high-risk activities using Isolation Forest
Optimized hyperparameters (contamination factor, estimators, and max samples) to improve fraud detection precision and reduce false positives
Implemented cryptographic security measures by integrating RSA encryption (2048-bit) with anomaly detection pipelines, ensuring secure transaction flagging and tamper-proof audit trails to enhance regulatory compliance and fraud prevention
Applied feature selection techniques (RFE, PCA, correlation analysis) to refine fraud detection models, reducing computational overhead and improving risk assessment efficiency for financial market monitoring
Strengthened financial risk oversight by embedding fraud detection insights into market surveillance frameworks, enabling proactive intervention strategies to mitigate fraudulent activities in macroeconomic data streams
Data Scientist Intern
Boston University
05.2023 - 06.2023
Implemented neural style transfer using VGG19 pre-trained on ImageNet, extracting multi-layer feature representations to separately compute content loss (Mean Squared Error) and style loss (Gram Matrix-based optimization)
Optimized model performance by leveraging L-BFGS optimizer for efficient convergence and fine-tuning content-to-style weight ratios, reducing artifacts and enhancing stylization quality
Pre-processed and normalized images using OpenCV and NumPy, applying per-channel mean subtraction and re-scaling to ensure compatibility with the neural network
Integrated dynamic hyperparameter tuning, enabling users to adjust iterations, learning rate, and regularization weights (total variation loss) to balance stylization detail and computational efficiency
Data Scientist Intern
J.B. Boda Insurance & Reinsurance Brokers
06.2022 - 11.2022
Enhanced underwriting risk assessment by leveraging Generalized Linear Models (GLMs) and Cox Proportional Hazards Models to analyze insurance policies
Applied logistic regression for claim probability estimation and decision trees for rule-based risk segmentation to identify key risk factors, and used Monte Carlo simulations to optimize policy structuring and improve loss ratio management
Developed high-performance data extraction scripts utilizing vectorized operations in Pandas and NumPy, optimizing structured and unstructured claims data processing
Implemented parallelized batch processing and memory-efficient data handling to enhance data throughput and computational efficiency
Applied Isolation Forest for anomaly detection in fraudulent claims and DBSCAN for clustering claim patterns, improving predictive model accuracy and reducing claim settlement times by 30% through feature engineering and outlier detection techniques
Developed a logistic regression model achieving 85% accuracy for automated insurance claim approvals, leveraging L1 regularization for feature selection and SMOTE for handling class imbalance
This improved prediction reliability, reduced manual review time, and enhanced underwriting efficiency by streamlining risk assessment and decision-making
Conducted a 5-year time series analysis using ARIMA and Prophet, incorporating trend decomposition and seasonality adjustments to enhance actuarial risk modeling
Improved forecasting accuracy by 12%, enabling more precise underwriting decisions, better capital reserve planning, and proactive risk mitigation strategies
Data Scientist
Deep Agency
05.2020 - 04.2022
Implemented an anomaly detection framework combining Z-score analysis and Isolation Forest for outlier detection with K-Means clustering and anomaly score thresholds to identify suspicious claims and transactions
Integrated real-time monitoring with SQL triggers and rule-based alerts, reducing false positives by 15% and detecting 20% more fraudulent claims
Strengthened fraud prevention measures, minimized unwarranted payouts, and enhanced regulatory compliance in underwriting and claims processing
Developed a predictive text analytics system using TF-IDF vectorization, dependency parsing, and named entity recognition (NER) to classify customer inquiries and detect priority cases
Integrated the system with CRM databases and workflow automation, enabling real-time case routing and sentiment-based prioritization
Deployed via a database-integrated API, reducing resolution time by 25%, improving response efficiency, and enhancing customer satisfaction in policy servicing and claims management
Engineered a high-performance logistic regression model for binary classification of insurance claim approvals, leveraging L2 (ridge) regularization to mitigate overfitting and enhance generalizability
Conducted feature selection using logistic regression coefficients and variance thresholding, isolating critical predictors to improve model interpretability and decision-making transparency in claim evaluations
Accelerated model training and inference by implementing batch processing and NumPy-based vectorized computations, significantly reducing computational overhead and enabling real-time, high-throughput predictions
Integrated model tracking and versioning directly within a Streamlit-based interactive interface, ensuring seamless deployment, real-time monitoring, and reproducibility for continuous model optimization
Delivered a robust, scalable, and production-ready solution that enhances operational efficiency, reduces manual processing delays, and provides a data-driven approach to claim approval decisions
Certifications
Fine-Tuning Large Language Models - DeepLearning.AI
Neural Networks and Deep Learning - DeepLearning.AI
Python for Data Science, AI & Development - IBM
Machine Learning Specialization - DeepLearning.AI
Publications
Comparative study on Llama 3.2 3B vs. DeepSeek V3, https://medium.com/@vaidya.keyur2/llama-3-2-3b-vs-deepseek-v3-performance-cost-and-use-case-comparison-522e072ad3c9
Implementing different models using NumPy and PyTorch, https://medium.com/@vaidya.keyur2/machine-learning-implementing-different-models-using-numpy-and-pytorch-626dc78e1090