Hi, I'm Vinayak Shrishail Anadinni
Data Science Professional | Master's in Data Science at Illinois Institute of Technology
Building scalable ML pipelines and data-driven solutions. Passionate about transforming data into actionable insights.

About Me
Currently pursuing my Master's in Data Science at Illinois Institute of Technology, I focus on bridging the gap between raw data and measurable business value. I don't just analyze data; I design scalable ML pipelines that transform complex information into actionable strategies.
With a strong foundation in real-time data processing and production-ready systems, I’m specializing in MLOps to ensure models are not just built, but continuously monitored, scalable, and reliable in real-world environments.
Education
Master of Data Science
Jan '24 – Present
GPA: 3.72/4.0
Bachelor of Engineering
Aug '17 – Aug '21
GPA: 3.4/4.0
Technical Skills
Core Data Analytics & Visualization
Data Engineering & Data Platforms
Machine Learning & Generative AI
MLOps, Cloud & Secure Systems
Professional Experience
Graduate Teaching Assistant
Illinois Institute of Technology
- Orchestrated academic support for 100+ students across Coursera-delivered database courses, triaging and resolving 150+ technical tickets per semester via Salesforce and Asana workflows.
- Reduced average student resolution time by 45% through systematic ticket categorization, priority-based routing, and clear escalation paths.
- Applied and taught data modeling best practices including normalized schema design (1NF–3NF), indexing strategies, stored procedure architecture, and query optimization across 30+ database projects.
- Documented specifications and design decisions that improved project delivery rates by 35% and standardized grading and feedback workflows.
- Mentored 25–30 students on end-to-end project tech-stack selection, guiding ORM choices, partitioning strategies, and multi-table join optimization while eliminating unresolved blockers through structured review cycles.
Data Scientist
Labelmaster
- Increased CRM record completeness from 52% to 89% across 440K records, enabling a 30% improvement in account executive targeting accuracy for 50+ Chicago-based sales users.
- Implemented XGBoost-based imputation and Sentence-BERT feature engineering to replace manual audit processes with an automated data quality scoring pipeline.
- Architected a holdout-based A/B testing framework with 95% statistical confidence gates to prevent underperforming model versions from reaching 50+ active sales users.
- Improved lead-scoring model precision by 15% across 270K CRM records while halving inference time and cutting memory usage by 40% via GPU-batched Sentence-BERT pipelines stored in a vector index.
- Eliminated 560+ hours per year of manual data review by tuning XGBoost classifiers with Optuna and SMOTE, achieving a 0.89 F1 score with drift detection and model monitoring for production retraining.
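The holdout-based confidence gate above can be sketched as a one-sided two-proportion z-test: a candidate model version is blocked when its holdout success rate is significantly worse than the control's. This is a minimal, illustrative version, not the production code; the function name and the 95% critical value are my own assumptions.

```python
from math import sqrt

def passes_confidence_gate(control_hits: int, control_n: int,
                           candidate_hits: int, candidate_n: int) -> bool:
    """Holdout release gate: block the candidate model version when its
    holdout success rate is significantly worse than the control's
    (one-sided two-proportion z-test at 95% confidence, critical z ~ -1.645)."""
    p1 = control_hits / control_n
    p2 = candidate_hits / candidate_n
    pooled = (control_hits + candidate_hits) / (control_n + candidate_n)
    se = sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / candidate_n))
    if se == 0:  # degenerate holdout (all hits or all misses in both arms)
        return p2 >= p1
    z = (p2 - p1) / se  # negative when the candidate underperforms
    return z > -1.645
```

A small, statistically insignificant dip passes the gate; a large drop is blocked before reaching sales users.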
Data Consultant
Tata Consultancy Services (TCS)
- Delivered a credit risk segmentation report for a Canadian regional Equifax portfolio by training an XGBoost classifier on masked historical credit records with engineered features for payment velocity, utilization ratio, delinquency recency, and balance trajectory.
- Surfaced three-tier risk classifications and default probability scores in an executive Looker Studio dashboard to inform portfolio strategy.
- Accelerated cross-functional discrepancy detection by 30% and reduced reporting cycle time by 20% across Finance, Marketing, BI, and Sales teams by productionizing MigrationWatch dashboards in Power BI and Looker Studio with unified data definitions and lineage documentation.
- Designed and owned the MF-GCP Validation Engine, verifying 1,200+ field mappings across Equifax Mainframe-to-GCP migration batches and improving correction match rates from 60% to 99.6% while enforcing PII compliance with Virtru encryption.
- Cut validation runtime by 67%, cloud compute costs by 35%, and manual intervention by 80% across 1TB+ daily files by re-engineering 50+ Python validation scripts into partitioned and clustered BigQuery-native SQL with modular stored procedures and automated anomaly alerting.
- Compressed quarterly compliance audits from 5 days to 2 days and improved data quality degradation detection by 30% by implementing a production drift monitoring pipeline using Apache Airflow on Google Cloud Composer to surface Population Stability Index alerts and metadata lineage signals.
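The Population Stability Index check at the heart of such a drift monitor is straightforward to sketch. This is an illustrative stand-alone version (bin counts and the conventional 0.1/0.25 thresholds are assumptions, not details of the production pipeline, which ran as Airflow tasks):

```python
from math import log

def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned distributions (lists of counts per bin).
    Common rule of thumb: < 0.1 stable; 0.1-0.25 moderate shift;
    > 0.25 significant drift worth an alert."""
    e_total, a_total = sum(expected), sum(actual)
    psi = 0.0
    for e, a in zip(expected, actual):
        e_pct = max(e / e_total, eps)  # clamp to avoid log(0)
        a_pct = max(a / a_total, eps)
        psi += (a_pct - e_pct) * log(a_pct / e_pct)
    return psi
```

Identical distributions score 0; the further the live population drifts from the training baseline, the larger the PSI.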
Projects
Data Scientist
Hallucination Hunter — RAG Monitoring & LLMOps
Since Dec '25
Building a Retrieval-Augmented Generation (RAG) chatbot and a monitoring suite to track hallucinations and response quality.
- Orchestrate RAG using LangChain or LlamaIndex over PDFs/technical docs.
- Track response quality and potential hallucinations using Arize Phoenix or MLflow.
- CI/CD: vector DB updates trigger regression-style evals to catch quality regressions before they reach users.
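The regression-style eval gate could look like this in outline. This is a hypothetical helper, assuming each eval run yields a per-question quality score in [0, 1]; the name and tolerance are illustrative:

```python
def evals_regressed(baseline: dict, candidate: dict,
                    tolerance: float = 0.02) -> bool:
    """Compare per-question quality scores from the eval suite run before
    and after a vector DB update; flag a regression (failing the CI check)
    when the mean score over shared questions drops by more than `tolerance`."""
    common = baseline.keys() & candidate.keys()
    base_mean = sum(baseline[q] for q in common) / len(common)
    cand_mean = sum(candidate[q] for q in common) / len(common)
    return cand_mean < base_mean - tolerance
```

In CI, a `True` result would fail the pipeline and block the vector DB update from shipping.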
Predictive Modeling for Customer Churn in Telecom
Mar '25 – Apr '25
Built a churn prediction model using H2O.ai and TensorFlow, achieving 85% AUC and 82% accuracy, enabling proactive customer retention strategies.
- Deployed on AWS SageMaker with automated MLOps workflows (monitoring, retraining, performance tracking) to ensure scalable, production-ready deployment across large datasets.
MLOps Engineer
Reproducible Data Pipeline with DVC & MLflow
Dec '25
Engineered a fault-tolerant machine learning pipeline with complete data lineage and model consistency across environments.
- Orchestrated data and model versioning using DVC to eliminate "works on my machine" issues.
- Implemented MLflow tracking to log hyperparameters and metrics for optimized Random Forest performance.
- Decoupled pipeline stages (preprocess → train → evaluate) for modular scalability.
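The decoupled stages map naturally onto a `dvc.yaml` pipeline definition. A sketch with illustrative stage names and script paths (not the project's actual layout):

```yaml
stages:
  preprocess:
    cmd: python src/preprocess.py
    deps: [data/raw, src/preprocess.py]
    outs: [data/processed]
  train:
    cmd: python src/train.py
    deps: [data/processed, src/train.py]
    outs: [models/model.pkl]
  evaluate:
    cmd: python src/evaluate.py
    deps: [models/model.pkl, src/evaluate.py]
    metrics:
      - metrics.json: {cache: false}
```

`dvc repro` then re-runs only the stages whose dependencies changed, which is what makes the pipeline reproducible across machines.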
Wine Quality Prediction — End-to-End MLOps on AWS
Dec '25
Architected a collaborative MLflow tracking server on AWS to centralize model management and experiment logging.
- Deployed a remote tracking server on AWS EC2 backed by RDS (PostgreSQL) for robust metadata management.
- Integrated AWS S3 as an artifact store to securely version control model binaries and plots.
- Centralized experiment logging to enable team-based model iteration and comparison.
Data Engineer
Real-Time Data Streaming Platform
Planned
Create a real-time streaming analytics platform that ingests events, performs windowed processing, and powers live dashboards.
- Kafka for high-throughput message queuing and ingestion.
- Spark Streaming for windowing, aggregation, and near real-time processing.
- Elasticsearch + Kibana for live analytics and visualization.
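The windowed processing step can be illustrated in plain Python: a toy tumbling-window count over timestamped events. In the planned platform, Spark Structured Streaming would perform this aggregation continuously over a Kafka topic; this sketch only shows the windowing logic itself.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Group (timestamp, key) events into fixed, non-overlapping windows
    and count events per key per window -- the same aggregation a
    streaming job would emit incrementally for live dashboards."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - (ts % window_seconds)  # align to window boundary
        counts[(window_start, key)] += 1
    return dict(counts)
```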
Scalable Data Lake Architecture
Planned
Design a scalable data lake to store structured and unstructured data with zone-based organization and governance.
- Organize zones (raw, cleaned, curated) with partitioning strategies for TB-scale storage.
- Add metadata catalog for discoverability and data lineage.
- Enable querying via Presto/Athena and implement access controls.
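Hive-style partition prefixes are one common way to implement the zone partitioning described above, since engines like Presto/Athena can prune partitions instead of scanning the whole lake. A small illustrative helper (zone, dataset, and partition keys are hypothetical):

```python
from datetime import date

def partition_path(zone: str, dataset: str, dt: date, region: str) -> str:
    """Build a Hive-style partition prefix (year=/month=/day=/region=)
    under a lake zone, so query engines can prune by partition column."""
    return f"{zone}/{dataset}/year={dt:%Y}/month={dt:%m}/day={dt:%d}/region={region}/"
```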
Data Analyst
CTA Transit Analytics & Bus Management System
Nov '24 – Dec '24
Developed a Streamlit + MySQL application to manage and visualize transit data, supporting route tracking and passenger flows.
- Designed a relational schema and implemented analytics queries for trips and routes.
- Optimized SQL queries and built scalable UI components for faster exploration.
Sales Insights Dashboard: Tableau & MySQL Project
Oct '24 – Nov '24
Built an interactive Tableau dashboard analyzing 150K+ sales transactions, visualizing revenue trends, top products, and customers.
- Integrated with MySQL for real-time updates and enabled one-click regional filtering, reducing manual reporting effort and accelerating business decisions.