SHUBHAM JAIN
Senior Data Scientist
Summary
An accomplished Senior Data Scientist with 7 years of experience. My expertise lies in applying techniques such as neural networks and machine learning to build predictive models and generate valuable insights. I thrive on solving complex data problems and creating actionable strategies that improve business performance.
Experience
Lead Data Scientist
ALBIS AI
06/2024 - Present
- Developed multiple domain-specific chatbots (Medical, City Analysis, Event Management) utilizing RAG for multi-source data retrieval and LoRA-based fine-tuning on Azure.
- Researched multiple image generation models to produce 2D representations of electrical and mechanical circuits.
Data Science Manager
Makro PRO
03/2023 - 06/2024
- Led data science initiatives at Makro PRO, Thailand's #1 B2B grocery e-commerce platform (39.5% market share in 2024), processing over 20,000 orders daily with an average of 500K+ interactions.
- Created a comprehensive recommendation system serving both B2B and B2C customers using ALS and BERT4Rec in PyTorch, improving carousel click-through rates by 40%.
- Developed an intelligent targeted-campaign system combining recommendation algorithms and churn prediction models, built with SQL and advanced analytics, tripling participation rates and winning back 20% of churned customers.
- Designed and implemented a lifecycle segmentation model leveraging Transformers, PySpark, and PyTorch to classify customers into segments based on transaction patterns and textual data, facilitating tailored business strategies.
Data Science Manager
Capri Global Capital
08/2022 - 03/2023
- Led a team to develop a credit risk analysis system using models such as GBM, Random Forest Classifier (RFC), and neural networks in TensorFlow, enabling accurate risk identification and streamlining the process pipeline.
- Contributed to automating document verification by implementing text and photo matching using DeepFace and OCR models in TensorFlow.
Senior Data Scientist
Fresh Gravity
08/2021 - 08/2022
- Collaborated on clinical data extraction and performed statistical analyses in Scala, including null detection and anomaly detection, to define exclusion criteria.
- Collaborated on the Medical Overdose project, detecting side effects by extracting data through named entity recognition (NER) with spaCy and BERT models.
- Tracked user actions and website usage to identify and suggest prospective clients for the business, increasing the retention rate by 40% using Pandas and PyTorch.
Data Scientist
Vidooly Media
07/2018 - 08/2021
- Member of the core analytics and modeling team handling comprehensive content analysis, including brand safety (NSFW, violence, and adult-scene prevention in metadata and video), sentiment analysis of comments, and demographic impact assessment.
- Developed and deployed content safety models using ResNet, DenseNet, and LSTM for video thumbnail and metadata analysis, with estimated potential savings of $5M-$12M from avoided brand safety violations.
- Co-developed a custom category prediction model for different social media platforms, based on video text details, using LSTM in PyTorch.
- Built tag generation and tag suggestion using keyword matching, NER, RAKE, TextRank, noun chunking, and knowledge graphs, among other techniques, increasing content viewership by 20%.
- Worked on a weapon identification model using YOLO and RetinaNet to strengthen content safety.
- Built a model using GBM and SVC to predict the expected ratio of ad views to organic views for a YouTube video, informing monetization strategy.
Education
Bachelor of Technology
Jaypee University of Engineering and Technology
08/2014 - 05/2018
Skills
Technologies
- Data Science
- NLP
- Computer Vision
- Neural Networks
- Recommender Systems
- Machine Learning
- Big Data
- Statistical Analysis
- Clustering
- Regression
- Gen AI
Language and Tools
- Python
- SQL
- C++
- GCP
- AWS
- AWS SageMaker
- AutoML
- Airflow
Algorithms
- RoBERTa
- T5
- GPT
- ResNet
- CNN
- LSTM
- SVM
- LM
- Random Forest
- BERT
- XGBoost
- ALS
Libraries
- TensorFlow
- PyTorch
- Hugging Face
- LangChain
- Stable Diffusion
- Scikit-Learn
- PySpark
- Pandas
- NLTK
- Matplotlib
- spaCy
- Scrapy
Projects
Website Answering Bot
- Developed a bot that answers user queries by extracting data from specified URLs.
- The bot returns answers with citations to the source documents. Built using OpenAI, RAG, Chroma, and LangChain.
Multi-Lingual Sentiment Classification
- Classified the sentiment of comments in 100+ languages using BERT and XLM models in PyTorch, trained only on English comments.
Jigsaw Toxicity Classification
- Classified toxic comments, covering race, gender, and disability discrimination as well as abusive language, using fastText embeddings with a bidirectional LSTM in PyTorch.
Question Paraphraser
- Paraphrased each input question into multiple questions using a T5 model trained on the Quora Question Pairs dataset in TensorFlow.
Emotion Based Counselor Bot
- Created an AI chatbot that mimics human characteristics and interprets users' emotions.
- Generated responses based on the user's emotional engagement.