Hi, I'm Vinayak Shrishail Anadinni

Data Science Professional | Master's in Data Science at Illinois Institute of Technology

Building scalable ML pipelines and data-driven solutions. Passionate about transforming data into actionable insights.

About Me

Currently pursuing my Master's in Data Science at Illinois Institute of Technology, I focus on bridging the gap between raw data and measurable business value. I don't just analyze data; I design scalable ML pipelines that transform complex information into actionable strategies.

With a strong foundation in real-time data processing and production-ready systems, I’m specializing in MLOps to ensure models are not just built, but continuously monitored, scalable, and reliable in real-world environments.

Education

Master of Data Science

Illinois Institute of Technology, Chicago

Jan '24 – Present

GPA: 3.72/4.0

Bachelor of Engineering

Siddaganga Institute of Technology, Tumkur (Visvesvaraya Technological University), Karnataka, India

Aug '17 – Aug '21

GPA: 3.4/4.0

Technical Skills

Core Data Analytics & Visualization

Python, R, SQL, JavaScript, Pandas, PostgreSQL, MySQL, Tableau, Power BI, Google Looker Studio, Grafana (learning)

Data Engineering & Data Platforms

Apache Airflow, Apache Kafka, Apache Spark, Databricks, Snowflake, AWS S3, AWS Redshift, Presto, Athena, Azure Data Lake

Machine Learning & Generative AI

Scikit-learn, XGBoost, PyTorch, TensorFlow, H2O.ai, Hugging Face Transformers, LangChain (learning), LlamaIndex (learning), LangGraph (learning), RAG (learning), Vector Databases

MLOps, Cloud & Secure Systems

Docker, Kubernetes, Git, GitHub Actions, MLflow, DVC, DagsHub, FastAPI, Flask, Streamlit, AWS EC2, AWS Lambda, AWS SageMaker, AWS Elastic Beanstalk, Azure, OpenFHE

Professional Experience

Graduate Teaching Assistant

Illinois Institute of Technology

Chicago, Illinois, USA
Jan '26 – Present
  • Orchestrated academic support for 100+ students across Coursera-delivered database courses, triaging and resolving 150+ technical tickets per semester via Salesforce and Asana workflows.
  • Reduced average student resolution time by 45% through systematic ticket categorization, priority-based routing, and clear escalation paths.
  • Applied and taught data modeling best practices including normalized schema design (1NF–3NF), indexing strategies, stored procedure architecture, and query optimization across 30+ database projects.
  • Documented specifications and design decisions that improved project delivery rates by 35% and standardized grading and feedback workflows.
  • Mentored 25–30 students on end-to-end project tech-stack selection, guiding ORM choices, partitioning strategies, and multi-table join optimization while eliminating unresolved blockers through structured review cycles.

Data Scientist

Labelmaster

Chicago, Illinois, USA
Jan '25 – May '25
  • Increased CRM record completeness from 52% to 89% across 440K records, enabling a 30% improvement in account executive targeting accuracy for 50+ Chicago-based sales users.
  • Executed XGBoost imputation and Sentence-BERT feature engineering to replace manual audit processes with an automated data quality scoring pipeline.
  • Architected a holdout-based A/B testing framework with 95% statistical confidence gates to prevent underperforming model versions from reaching 50+ active sales users.
  • Improved lead-scoring model precision by 15% across 270K CRM records while halving inference time and cutting memory usage by 40% via GPU-batched Sentence-BERT pipelines stored in a vector index.
  • Eliminated 560+ hours per year of manual data review by tuning XGBoost classifiers with Optuna and SMOTE, achieving a 0.89 F1 score with drift detection and model monitoring for production retraining.
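The confidence-gated A/B framework described above can be illustrated with a two-proportion z-test that only promotes a challenger model when it beats the holdout with 95% confidence. This is a stdlib-only sketch, not the production implementation; the function name, signature, and numbers are hypothetical:

```python
from math import sqrt, erf

def z_gate(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Promote variant B over holdout A only if B's conversion
    rate is higher with (1 - alpha) one-sided confidence."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)            # pooled rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))   # pooled standard error
    z = (p_b - p_a) / se
    # one-sided p-value from the standard normal CDF
    p_value = 1 - 0.5 * (1 + erf(z / sqrt(2)))
    return p_value < alpha

# B converts 12% vs. A's 8% on 2,000 users each → gate opens
print(z_gate(160, 2000, 240, 2000))  # → True
```

The gate stays closed for small, noisy lifts (e.g. 5.25% vs. 5.00% on the same sample sizes), which is what keeps underperforming model versions away from live users.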

Data Consultant

Tata Consultancy Services (TCS)

India
Aug '21 – Dec '23
  • Delivered a credit risk segmentation report for a Canadian regional Equifax portfolio by training an XGBoost classifier on masked historical credit records with engineered features for payment velocity, utilization ratio, delinquency recency, and balance trajectory.
  • Surfaced three-tier risk classifications and default probability scores in an executive Looker Studio dashboard to inform portfolio strategy.
  • Accelerated cross-functional discrepancy detection by 30% and reduced reporting cycle time by 20% across Finance, Marketing, BI, and Sales teams by productionizing MigrationWatch dashboards in Power BI and Looker Studio with unified data definitions and lineage documentation.
  • Designed and owned the MF-GCP Validation Engine, verifying 1,200+ field mappings across Equifax Mainframe-to-GCP migration batches and improving correction match rates from 60% to 99.6% while enforcing PII compliance with Virtru encryption.
  • Cut validation runtime by 67%, cloud compute costs by 35%, and manual intervention by 80% across 1TB+ daily files by re-engineering 50+ Python validation scripts into partitioned and clustered BigQuery-native SQL with modular stored procedures and automated anomaly alerting.
  • Compressed quarterly compliance audits from 5 days to 2 days and improved data quality degradation detection by 30% by implementing a production drift monitoring pipeline using Apache Airflow and Google Composer to surface Population Stability Index alerts and metadata lineage signals.
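The Population Stability Index alerts mentioned above compare a baseline score distribution to the current one, bin by bin. A stdlib-only sketch of the metric, assuming equal-width bins (the function name and the 0.2 threshold are illustrative conventions, not the production code):

```python
from math import log

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample
    ('expected') and a current sample ('actual')."""
    lo, hi = min(expected + actual), max(expected + actual)
    width = (hi - lo) / bins or 1.0
    def frac(sample, i):
        # share of the sample falling in bin i (hi closes the last bin)
        n = sum(lo + i * width <= x < lo + (i + 1) * width or
                (i == bins - 1 and x == hi) for x in sample)
        return max(n / len(sample), 1e-6)  # floor avoids log(0)
    return sum((frac(actual, i) - frac(expected, i)) *
               log(frac(actual, i) / frac(expected, i))
               for i in range(bins))

# Identical distributions score ~0; PSI > 0.2 commonly triggers an alert
print(psi([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]) < 0.01)  # → True
```

In a monitoring DAG, a scheduled task would compute this against each batch and page the on-call owner when the index crosses the alert threshold.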

Projects


Data Scientist

6 Completed · 1 In Progress

Hallucination Hunter — RAG Monitoring & LLMOps

Since Dec '25

In Progress

Building a Retrieval-Augmented Generation (RAG) chatbot and a monitoring suite to track hallucinations and response quality.

  • Orchestrate RAG using LangChain or LlamaIndex over PDFs/technical docs.
  • Track response quality and potential hallucinations using Arize Phoenix or MLflow.
  • CI/CD: vector DB updates trigger regression-style evals to ensure the bot doesn't get worse.
LangChain, Vector DB, Arize Phoenix, RAG Evaluation, CI/CD

Predictive Modeling for Customer Churn in Telecom

Mar '25 – Apr '25

Completed

Built a churn prediction model using H2O.ai and TensorFlow, achieving an AUC of 0.85 and 82% accuracy, enabling proactive customer retention strategies.

  • Deployed on AWS SageMaker with automated MLOps workflows (monitoring, retraining, performance tracking) to ensure scalable, production-ready deployment across large datasets.
H2O.ai, TensorFlow, AWS SageMaker, MLOps

MLOps Engineer

4 Completed · 2 In Progress

Reproducible Data Pipeline with DVC & MLflow

Dec '25

Completed

Engineered a fault-tolerant machine learning pipeline ensuring 100% data lineage and model consistency across environments.

  • Orchestrated data and model versioning using DVC to eliminate "works on my machine" issues.
  • Implemented MLflow tracking to log hyperparameters and metrics for optimized Random Forest performance.
  • Decoupled pipeline stages (preprocess → train → evaluate) for modular scalability.
DVC, MLflow, Python, Scikit-learn, Git

Wine Quality Prediction — End-to-End MLOps on AWS

Dec '25

Completed

Architected a collaborative MLflow tracking server on AWS to centralize model management and experiment logging.

  • Deployed a remote tracking server on AWS EC2 backed by RDS (PostgreSQL) for robust metadata management.
  • Integrated AWS S3 as an artifact store to securely version control model binaries and plots.
  • Centralized experiment logging to enable team-based model iteration and comparison.
MLflow, AWS EC2, AWS RDS, AWS S3, Python

Data Engineer

3 Planned

Real-Time Data Streaming Platform

Planned

Create a real-time streaming analytics platform that ingests events, performs windowed processing, and powers live dashboards.

  • Kafka for high-throughput message queuing and ingestion.
  • Spark Streaming for windowing, aggregation, and near real-time processing.
  • Elasticsearch + Kibana for live analytics and visualization.
Kafka, Spark Streaming, Elasticsearch, Kibana, Python

Scalable Data Lake Architecture

Planned

Design a scalable data lake to store structured and unstructured data with zone-based organization and governance.

  • Organize zones (raw, cleaned, curated) with partitioning strategies for TB-scale storage.
  • Add metadata catalog for discoverability and data lineage.
  • Enable querying via Presto/Athena and implement access controls.
AWS S3, Azure Data Lake, Hadoop, Presto, Athena, Python

Data Analyst

3 Completed · 1 Planned

CTA Transit Analytics & Bus Management System

Nov '24 – Dec '24

Completed

Developed a Streamlit + MySQL application to manage and visualize transit data, supporting route tracking and passenger flows.

  • Designed a relational schema and implemented analytics queries for trips and routes.
  • Optimized SQL queries and built scalable UI components for faster exploration.
Streamlit, MySQL, Python, SQL

Sales Insights Dashboard: Tableau & MySQL Project

Oct '24 – Nov '24

Completed

Built an interactive Tableau dashboard analyzing 150K+ sales transactions, visualizing revenue trends, top products, and customers.

  • Integrated with MySQL for real-time updates and enabled one-click regional filtering, reducing manual reporting effort and accelerating business decisions.
Tableau, MySQL, Data Visualization

Get in Touch

Location

U.S.A. (Authorized to Work)