Hi, I'm Vinayak Shrishail Anadinni
Data Science Professional | Master's in Data Science at Illinois Institute of Technology
Building scalable ML pipelines and data-driven solutions. Passionate about transforming data into actionable insights.

About Me
Currently pursuing my Master's in Data Science at Illinois Institute of Technology, I focus on bridging the gap between raw data and measurable business value. I don't just analyze data; I design scalable ML pipelines that transform complex information into actionable strategies.
With a strong foundation in real-time data processing and production-ready systems, I’m specializing in MLOps to ensure models are not just built, but continuously monitored, scalable, and reliable in real-world environments.
Education
Master of Data Science
Jan '24 – Present
GPA: 3.72/4.0
Bachelor of Engineering
Aug '17 – Aug '21
GPA: 3.4/4.0
Technical Skills
Core Data Analytics & Visualization
Data Engineering & Data Platforms
Machine Learning & Generative AI
MLOps, Cloud & Secure Systems
Professional Experience
Graduate Teaching Assistant
Illinois Institute of Technology
- Orchestrated academic support for 100+ students across Coursera-delivered database courses, triaging and resolving 150+ technical tickets per semester via Salesforce and Asana workflows.
- Reduced average student resolution time by 45% through systematic ticket categorization, priority-based routing, and clear escalation paths.
- Applied and taught data modeling best practices including normalized schema design (1NF–3NF), indexing strategies, stored procedure architecture, and query optimization across 30+ database projects.
- Documented specifications and design decisions that improved project delivery rates by 35% and standardized grading and feedback workflows.
- Mentored 25–30 students on end-to-end project tech-stack selection, guiding ORM choices, partitioning strategies, and multi-table join optimization while eliminating unresolved blockers through structured review cycles.
Data Scientist
Labelmaster
- Increased CRM record completeness from 52% to 89% across 440K records, enabling a 30% improvement in account executive targeting accuracy for 50+ Chicago-based sales users.
- Implemented XGBoost-based imputation and Sentence-BERT feature engineering to replace manual audit processes with an automated data quality scoring pipeline.
- Architected a holdout-based A/B testing framework with 95% statistical confidence gates to prevent underperforming model versions from reaching 50+ active sales users.
- Improved lead-scoring model precision by 15% across 270K CRM records while halving inference time and cutting memory usage by 40% via GPU-batched Sentence-BERT pipelines stored in a vector index.
- Eliminated 560+ hours per year of manual data review by tuning XGBoost classifiers with Optuna and SMOTE, achieving a 0.89 F1 score with drift detection and model monitoring for production retraining.
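The holdout-based confidence gate above can be sketched as a one-sided two-proportion z-test: a candidate model version is blocked when its holdout success rate is significantly worse than the control's. This is a minimal, illustrative version, not the production code; the function name and the 95% critical value are my own assumptions.

```python
from math import sqrt

def passes_confidence_gate(control_hits: int, control_n: int,
                           candidate_hits: int, candidate_n: int) -> bool:
    """Holdout release gate: block the candidate model version when its
    holdout success rate is significantly worse than the control's
    (one-sided two-proportion z-test at 95% confidence, critical z ~ -1.645)."""
    p1 = control_hits / control_n
    p2 = candidate_hits / candidate_n
    pooled = (control_hits + candidate_hits) / (control_n + candidate_n)
    se = sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / candidate_n))
    if se == 0:  # degenerate holdout (all hits or all misses in both arms)
        return p2 >= p1
    z = (p2 - p1) / se  # negative when the candidate underperforms
    return z > -1.645
```

A small, statistically insignificant dip passes the gate; a large drop is blocked before reaching sales users.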
Data Consultant
Tata Consultancy Services (TCS)
- Delivered a credit risk segmentation report for a Canadian regional Equifax portfolio by training an XGBoost classifier on masked historical credit records with engineered features for payment velocity, utilization ratio, delinquency recency, and balance trajectory.
- Surfaced three-tier risk classifications and default probability scores in an executive Looker Studio dashboard to inform portfolio strategy.
- Accelerated cross-functional discrepancy detection by 30% and reduced reporting cycle time by 20% across Finance, Marketing, BI, and Sales teams by productionizing MigrationWatch dashboards in Power BI and Looker Studio with unified data definitions and lineage documentation.
- Designed and owned the MF-GCP Validation Engine, verifying 1,200+ field mappings across Equifax Mainframe-to-GCP migration batches and improving correction match rates from 60% to 99.6% while enforcing PII compliance with Virtru encryption.
- Cut validation runtime by 67%, cloud compute costs by 35%, and manual intervention by 80% across 1TB+ daily files by re-engineering 50+ Python validation scripts into partitioned and clustered BigQuery-native SQL with modular stored procedures and automated anomaly alerting.
- Compressed quarterly compliance audits from 5 days to 2 days and improved data quality degradation detection by 30% by implementing a production drift monitoring pipeline using Apache Airflow on Google Cloud Composer to surface Population Stability Index alerts and metadata lineage signals.
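The Population Stability Index check at the heart of such a drift monitor is straightforward to sketch. This is an illustrative stand-alone version (bin counts and the conventional 0.1/0.25 thresholds are assumptions, not details of the production pipeline, which ran as Airflow tasks):

```python
from math import log

def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned distributions (lists of counts per bin).
    Common rule of thumb: < 0.1 stable; 0.1-0.25 moderate shift;
    > 0.25 significant drift worth an alert."""
    e_total, a_total = sum(expected), sum(actual)
    psi = 0.0
    for e, a in zip(expected, actual):
        e_pct = max(e / e_total, eps)  # clamp to avoid log(0)
        a_pct = max(a / a_total, eps)
        psi += (a_pct - e_pct) * log(a_pct / e_pct)
    return psi
```

Identical distributions score 0; the further the live population drifts from the training baseline, the larger the PSI.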
Projects
Data Scientist
Hallucination Hunter — RAG Monitoring & LLMOps
Since Dec '25
Building a Retrieval-Augmented Generation (RAG) chatbot and a monitoring suite to track hallucinations and response quality.
- Orchestrate RAG using LangChain or LlamaIndex over PDFs/technical docs.
- Track response quality and potential hallucinations using Arize Phoenix or MLflow.
- CI/CD: vector DB updates trigger regression-style evals to catch quality regressions before they reach users.
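The regression-style eval gate could look like this in outline. This is a hypothetical helper, assuming each eval run yields a per-question quality score in [0, 1]; the name and tolerance are illustrative:

```python
def evals_regressed(baseline: dict, candidate: dict,
                    tolerance: float = 0.02) -> bool:
    """Compare per-question quality scores from the eval suite run before
    and after a vector DB update; flag a regression (failing the CI check)
    when the mean score over shared questions drops by more than `tolerance`."""
    common = baseline.keys() & candidate.keys()
    base_mean = sum(baseline[q] for q in common) / len(common)
    cand_mean = sum(candidate[q] for q in common) / len(common)
    return cand_mean < base_mean - tolerance
```

In CI, a `True` result would fail the pipeline and block the vector DB update from shipping.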
Predictive Modeling for Customer Churn in Telecom
Mar '25 – Apr '25
Built a churn prediction model using H2O.ai and TensorFlow, achieving 85% AUC and 82% accuracy, enabling proactive customer retention strategies.
- Deployed on AWS SageMaker with automated MLOps workflows (monitoring, retraining, performance tracking) to ensure scalable, production-ready deployment across large datasets.
MLOps Engineer
Reproducible Data Pipeline with DVC & MLflow
Dec '25
Engineered a fault-tolerant machine learning pipeline with complete data lineage and model consistency across environments.
- Orchestrated data and model versioning using DVC to eliminate "works on my machine" issues.
- Implemented MLflow tracking to log hyperparameters and metrics for optimized Random Forest performance.
- Decoupled pipeline stages (preprocess → train → evaluate) for modular scalability.
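The decoupled stages map naturally onto a `dvc.yaml` pipeline definition. A sketch with illustrative stage names and script paths (not the project's actual layout):

```yaml
stages:
  preprocess:
    cmd: python src/preprocess.py
    deps: [data/raw, src/preprocess.py]
    outs: [data/processed]
  train:
    cmd: python src/train.py
    deps: [data/processed, src/train.py]
    outs: [models/model.pkl]
  evaluate:
    cmd: python src/evaluate.py
    deps: [models/model.pkl, src/evaluate.py]
    metrics:
      - metrics.json: {cache: false}
```

`dvc repro` then re-runs only the stages whose dependencies changed, which is what makes the pipeline reproducible across machines.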
Wine Quality Prediction — End-to-End MLOps on AWS
Dec '25
Architected a collaborative MLflow tracking server on AWS to centralize model management and experiment logging.
- Deployed a remote tracking server on AWS EC2 backed by RDS (PostgreSQL) for robust metadata management.
- Integrated AWS S3 as an artifact store to securely version control model binaries and plots.
- Centralized experiment logging to enable team-based model iteration and comparison.
Data Engineer
Real-Time Data Streaming Platform
Planned
Create a real-time streaming analytics platform that ingests events, performs windowed processing, and powers live dashboards.
- Kafka for high-throughput message queuing and ingestion.
- Spark Streaming for windowing, aggregation, and near real-time processing.
- Elasticsearch + Kibana for live analytics and visualization.
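The windowed processing step can be illustrated in plain Python: a toy tumbling-window count over timestamped events. In the planned platform, Spark Structured Streaming would perform this aggregation continuously over a Kafka topic; this sketch only shows the windowing logic itself.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Group (timestamp, key) events into fixed, non-overlapping windows
    and count events per key per window -- the same aggregation a
    streaming job would emit incrementally for live dashboards."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - (ts % window_seconds)  # align to window boundary
        counts[(window_start, key)] += 1
    return dict(counts)
```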
Scalable Data Lake Architecture
Planned
Design a scalable data lake to store structured and unstructured data with zone-based organization and governance.
- Organize zones (raw, cleaned, curated) with partitioning strategies for TB-scale storage.
- Add metadata catalog for discoverability and data lineage.
- Enable querying via Presto/Athena and implement access controls.
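Hive-style partition prefixes are one common way to implement the zone partitioning described above, since engines like Presto/Athena can prune partitions instead of scanning the whole lake. A small illustrative helper (zone, dataset, and partition keys are hypothetical):

```python
from datetime import date

def partition_path(zone: str, dataset: str, dt: date, region: str) -> str:
    """Build a Hive-style partition prefix (year=/month=/day=/region=)
    under a lake zone, so query engines can prune by partition column."""
    return f"{zone}/{dataset}/year={dt:%Y}/month={dt:%m}/day={dt:%d}/region={region}/"
```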
Data Analyst
CTA Transit Analytics & Bus Management System
Nov '24 – Dec '24
Developed a Streamlit + MySQL application to manage and visualize transit data, supporting route tracking and passenger flows.
- Designed a relational schema and implemented analytics queries for trips and routes.
- Optimized SQL queries and built scalable UI components for faster exploration.
Sales Insights Dashboard: Tableau & MySQL Project
Oct '24 – Nov '24
Built an interactive Tableau dashboard analyzing 150K+ sales transactions, visualizing revenue trends, top products, and customers.
- Integrated with MySQL for real-time updates and enabled one-click regional filtering, reducing manual reporting effort and accelerating business decisions.