CrackJobs
Mad skills. Dream job!
🤖 DATA SCIENCE INTERVIEWS

Mock interviews with accomplished ML engineers

Master machine learning algorithms, model debugging, feature engineering, and production ML systems with data scientists from top companies. Practice real interview questions with expert feedback.
₹20-55L
Salary Range
Python
Primary Language
4-6
Interview Rounds
10-12 weeks
Prep Timeline
Find ML Mentors →
3 CORE SKILL AREAS
What Companies Test in Data Science Interviews
Based on 600+ ML interviews at FAANG. These 3 areas cover every DS interview question.
ML Theory & Algorithms
Master model selection, algorithm fundamentals, and ML system design
Interview Weight: 35% of interviews
Algorithm selection (regression, classification, clustering)
Bias-variance tradeoff and regularization
Feature engineering and selection techniques
Model evaluation metrics (precision, recall, AUC-ROC)
Cross-validation and hyperparameter tuning
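The last two items above pair naturally in code. A minimal sketch of cross-validated hyperparameter tuning with scikit-learn (the dataset and parameter grid are illustrative, not prescriptive):

# Minimal sketch: 5-fold cross-validation wrapped around a hyperparameter grid.
# Dataset and grid values are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,                # 5-fold cross-validation
    scoring="roc_auc",   # one of the evaluation metrics listed above
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))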
Practical ML & Debugging
Debug models, fix overfitting, and handle production ML issues
Interview Weight: 35% of interviews
Overfitting vs underfitting diagnosis and fixes
Data leakage detection and prevention (see the sketch after this list)
Model drift detection and retraining strategies
Debugging poor model performance systematically
A/B testing ML models in production
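For the data leakage item above, one concrete habit interviewers look for is fitting all preprocessing inside a pipeline so it only ever sees training data. A minimal sketch, with a stand-in dataset:

# Minimal sketch: split first, then let a Pipeline fit preprocessing on training data
# only, so no test-set statistics leak into the model. Dataset is a stand-in.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

pipe = Pipeline([
    ("scale", StandardScaler()),          # fitted on X_train only
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
print("holdout accuracy:", round(pipe.score(X_test, y_test), 3))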
Coding: Python, Pandas, Algorithms
Write production-quality ML code and implement algorithms from scratch
Interview Weight: 30% of interviews
Pandas data manipulation (groupby, merge, pivot)
NumPy vectorization and performance optimization (see the sketch after this list)
Implementing ML algorithms (gradient descent, k-means)
Data structures and algorithms (sorting, searching, trees)
Writing clean, testable, maintainable ML code
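A small sketch of the NumPy vectorization idea from this list. The loop and the vectorized expression compute the same result; the data is synthetic:

# Minimal sketch: replacing a Python loop with a vectorized NumPy expression.
# Synthetic data for illustration only.
import numpy as np

rng = np.random.default_rng(0)
prices = rng.uniform(10, 100, size=1_000_000)
qty = rng.integers(1, 5, size=1_000_000)

# Loop version: one multiplication per row in Python
revenue_loop = np.empty_like(prices)
for i in range(len(prices)):
    revenue_loop[i] = prices[i] * qty[i]

# Vectorized version: same result, one array expression, much faster
revenue_vec = prices * qty

assert np.allclose(revenue_loop, revenue_vec)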
ALGORITHM SELECTION GUIDE
Choose the Right ML Algorithm
Interviewers test if you can select appropriate algorithms. Here's your decision framework.
Problem Type: Predicting continuous values (prices, sales, temperature)
Linear Regression (Complexity: Low). When to use: Linear relationships, interpretability needed.
Random Forest Regressor (Complexity: Medium). When to use: Non-linear relationships, feature importance.
Gradient Boosting (XGBoost) (Complexity: High). When to use: Best performance, willing to tune.
Neural Networks (Complexity: High). When to use: Complex patterns, large datasets.
Problem Type: Binary classification (fraud detection, churn prediction)
Logistic Regression (Complexity: Low). When to use: Simple baseline, interpretability required.
Random Forest Classifier (Complexity: Medium). When to use: Handles non-linearity, feature importance.
XGBoost / LightGBM (Complexity: High). When to use: Structured data, best performance.
Neural Networks (Complexity: High). When to use: Unstructured data (images, text).
Problem Type: Multi-class classification (image recognition, document categorization)
Naive Bayes (Complexity: Low). When to use: Text classification, simple baseline.
Random Forest (Complexity: Medium). When to use: Structured data, interpretable.
Neural Networks (CNN) (Complexity: High). When to use: Images, complex patterns.
Transformers (Complexity: Very High). When to use: Text, state-of-the-art performance.
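In code, this framework usually shows up as a baseline-first comparison: fit a simple model, then a stronger one, and justify the added complexity with numbers. A minimal sketch (dataset and model choices are illustrative):

# Minimal sketch: simple baseline vs stronger model, compared with cross-validated AUC-ROC.
# Dataset and model choices are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1],
                           random_state=42)

models = {
    "logistic_regression (baseline)": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=42),
}

for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC-ROC = {auc:.3f}")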
MODEL DEBUGGING SCENARIOS
Real Production ML Problems
70% of ML interviews test debugging. Master these patterns to ace the practical round.
Model shows 98% accuracy in training, 65% in production
🔍 Diagnosis:
Classic overfitting: the model memorized the training data. Data leakage or a train/serve distribution mismatch can produce the same symptom.
✅ Solutions:
Add regularization (L1/L2, dropout; see the sketch after this list)
Collect more training data
Use simpler model architecture
Check for data leakage (future info in features)
Validate on holdout set from production distribution
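A minimal sketch of the first fix, adding L2 regularization and watching the train/validation gap shrink (synthetic data; the C values are illustrative):

# Minimal sketch: diagnosing overfitting via the train/validation gap and shrinking it with L2.
# Synthetic, noisy data; C values are illustrative (smaller C = stronger penalty).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=100, n_informative=10,
                           flip_y=0.05, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

for C in (100.0, 1.0, 0.01):
    clf = LogisticRegression(C=C, max_iter=2000).fit(X_tr, y_tr)
    print(f"C={C}: train={clf.score(X_tr, y_tr):.2f}  val={clf.score(X_val, y_val):.2f}")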
Model performs well offline but fails A/B test in production
🔍 Diagnosis:
Offline evaluation metrics don't reflect online business metrics, or the model has drifted since training
✅ Solutions:
Align offline metrics with business KPIs
Train on recent production data
Optimize model latency (quantization, pruning)
Add robust error handling for edge cases
Monitor model predictions in real-time
Feature importance shows unexpected results
🔍 Diagnosis:
Data leakage or multicollinearity
✅ Solutions:
Check timestamp logic carefully (no feature should use information from after the prediction time)
Remove highly correlated features (VIF > 10; see the sketch after this list)
Split data first, then preprocess
Use SHAP values for better feature interpretation
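A minimal sketch of the VIF check above, using statsmodels. The helper and the `X_numeric` frame it expects are illustrative, not part of a fixed recipe:

# Minimal sketch: flagging multicollinear features with the variance inflation factor.
# `X_numeric` stands in for your DataFrame of numeric features.
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X: pd.DataFrame) -> pd.DataFrame:
    vifs = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
    return (pd.DataFrame({"feature": X.columns, "vif": vifs})
            .sort_values("vif", ascending=False))

# Features with VIF > 10 are candidates for removal before re-checking importance:
# print(vif_table(X_numeric))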
PYTHON ML CODE LIBRARY
Common ML Coding Patterns
These patterns appear in 80% of ML coding rounds. Master them for technical interviews.
Feature Engineering: Create time-based features
import pandas as pd

# Extract temporal features from datetime
df['hour'] = df['timestamp'].dt.hour
df['day_of_week'] = df['timestamp'].dt.dayofweek
df['is_weekend'] = df['day_of_week'].isin([5, 6]).astype(int)

# Create rolling features
df['rolling_7d_mean'] = df.groupby('user_id')['purchases']\
    .transform(lambda x: x.rolling(7, min_periods=1).mean())

# Days since last event
df['days_since_last_purchase'] = (
    df.groupby('user_id')['timestamp'].diff().dt.days.fillna(999)
)
💡 Why this matters:
Time-based features often have high predictive power. Extract hour, day, rolling aggregates, and recency.
Cohort Analysis: Calculate 7-day retention by signup cohort
import pandas as pd

# Group users by signup week
signups = df.groupby('user_id')['signup_date'].min().reset_index()
signups['cohort'] = signups['signup_date'].dt.to_period('W')

# Calculate retention (drop the per-event signup_date to avoid a merge column clash)
events = df.drop(columns='signup_date')\
    .merge(signups[['user_id', 'cohort', 'signup_date']], on='user_id')
events['days_since_signup'] = (events['event_date'] - events['signup_date']).dt.days
retention = events[events['days_since_signup'] == 7]\
    .groupby('cohort')['user_id'].nunique()
cohort_size = signups.groupby('cohort')['user_id'].nunique()
retention_rate = (retention / cohort_size * 100).round(2)
💡 Why this matters:
Cohort analysis reveals user behavior patterns. Key for churn prediction and product analytics.
Handle Imbalanced Classification (fraud detection)
import pandas as pd
from sklearn.utils import resample
from imblearn.over_sampling import SMOTE

# Option 1: Undersample majority class
majority = df[df['fraud'] == 0]
minority = df[df['fraud'] == 1]
majority_downsampled = resample(
    majority,
    n_samples=len(minority),
    random_state=42
)
balanced = pd.concat([majority_downsampled, minority])

# Option 2: SMOTE (Synthetic Minority Oversampling)
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)

# Option 3: Class weights in model
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(class_weight='balanced')
💡 Why this matters:
Imbalanced classes (1% fraud) require special handling. Use resampling, SMOTE, or class weights.
MODEL EVALUATION DEEP DIVE
Beyond Accuracy: Choose the Right Metric
Accuracy is often meaningless. Understand which metric matches your business problem.
Precision & Recall
Precision = TP/(TP+FP), Recall = TP/(TP+FN)
Use case: Imbalanced classification (fraud, disease detection)
Tradeoff: Precision ↑ = fewer false alarms. Recall ↑ = catch more positives. Raising one typically lowers the other.
F1-Score / F-Beta
F1 = 2 * (Precision * Recall)/(Precision + Recall)
Use case: When you need balance between precision and recall
Tradeoff: F2-score weights recall 2x more than precision (use when false negatives are costly)
AUC-ROC
Area under ROC curve (TPR vs FPR)
Use case: When you need threshold-independent evaluation
Tradeoff: Good for comparing models, but doesn't tell you optimal threshold
Mean Absolute Error (MAE)
MAE = (1/n) * Σ|y_true - y_pred|
Use case: Regression when you care about average error magnitude
Tradeoff: Treats all errors equally, doesn't penalize large errors more
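All of these metrics are one import away in scikit-learn. A minimal sketch with toy labels and scores, just to show the calls:

# Minimal sketch: the metrics above via sklearn.metrics, on toy labels and scores.
import numpy as np
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             fbeta_score, roc_auc_score, mean_absolute_error)

y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.2, 0.8, 0.6, 0.3, 0.4, 0.9])
y_pred = (y_prob >= 0.5).astype(int)

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1:       ", f1_score(y_true, y_pred))
print("f2:       ", fbeta_score(y_true, y_pred, beta=2))   # weights recall over precision
print("auc-roc:  ", roc_auc_score(y_true, y_prob))          # uses scores, not hard labels

# Regression example for MAE
print("mae:      ", mean_absolute_error([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))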
12-WEEK PREP ROADMAP
Complete Data Science Interview Prep Plan
Structured timeline to master ML theory, debugging, coding, and system design.
Weeks 1-3
ML Fundamentals
Master core algorithms (regression, trees, ensemble methods)
Understand bias-variance tradeoff deeply
Learn feature engineering techniques
Practice explaining algorithms to a non-technical audience
Study evaluation metrics for different problem types
Weeks 4-6
Practical ML & Debugging
Practice debugging overfitting and underfitting
Learn to detect and fix data leakage
Master cross-validation and hyperparameter tuning
Study production ML challenges (drift, latency)
Practice A/B testing ML models
Weeks 7-9
Coding & Implementation
Implement algorithms from scratch (gradient descent, k-means)
Master Pandas data manipulation
Practice LeetCode medium problems (150+ problems)
Write clean, vectorized NumPy code
Build portfolio of ML projects
Weeks 10-12
Case Studies & Systems
Practice end-to-end ML case studies (10+ cases)
Learn ML system design (recommendation, search, ranking)
Master behavioral STAR storytelling
Practice whiteboard model debugging
Mock interviews with feedback
ML TOOLS ECOSYSTEM
Technologies Data Scientists Must Know
Core stack for production ML. Focus depth on 2-3 tools per category.
Core Libraries
scikit-learn
XGBoost
LightGBM
CatBoost
TensorFlow
PyTorch
Data Processing
Pandas
NumPy
Polars
Dask
PySpark
Visualization
Matplotlib
Seaborn
Plotly
SHAP (explainability)
MLOps & Production
MLflow
Weights & Biases
Docker
Kubernetes
Airflow
Cloud Platforms
AWS SageMaker
GCP Vertex AI
Azure ML
Databricks
SUCCESS STORIES
From Practice to FAANG ML Offers
These data scientists mastered ML interviews with CrackJobs and landed dream roles.
Vikram P.
ML Engineer
"Google asked me to debug a model that was overfitting. I walked through my systematic approach: check training curves, look for data leakage, add regularization, validate on holdout set. Mentioned cross-validation and early stopping. They said my debugging process was 'exactly what we do here.'"
Ananya K.
Applied Scientist
"The case study was brutal: 'Build a churn prediction model.' I structured it like CrackJobs taught me—EDA, feature engineering, algorithm selection, evaluation metrics, production considerations. Explained the precision-recall tradeoff for imbalanced data."
Rahul M.
Data Scientist
"Meta's coding round was intense—implement gradient descent from scratch in 30 minutes. Thanks to CrackJobs, I'd practiced this 20+ times. Wrote clean, vectorized code with NumPy, handled edge cases, explained the math. Cleared in 22 minutes."
AVOID THESE MISTAKES
5 ML Interview Mistakes That Fail Candidates
Based on 700+ ML interview evaluations. Fix these to dramatically improve your performance.
Mistake #1
Not starting with exploratory data analysis (EDA) before modeling
Why it fails:
Leads to wrong feature choices and missed data quality issues
✅ How to fix it:
Always start interviews with: 'Let me first explore the data—check distributions, missing values, correlations, outliers.' Shows you understand data before jumping to models. Mention specific checks: df.describe(), df.isnull().sum(), correlation matrix.
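A minimal sketch of that opening EDA pass. The tiny DataFrame below is a stand-in for whatever dataset the interviewer hands you:

# Minimal sketch: first-pass EDA checks before any modeling.
# The toy DataFrame stands in for the real dataset.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47, np.nan, 51],
    "income": [40_000, 52_000, 88_000, 61_000, 1_000_000],  # one extreme value
    "churned": [0, 0, 1, 0, 1],
})

print(df.describe())                  # distributions: mean, std, quartiles, min/max
print(df.isnull().sum())              # missing values per column
print(df.corr(numeric_only=True))     # correlation matrix to spot redundant features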
Mistake #2
Choosing algorithms without justifying the choice
Why it fails:
Shows lack of understanding of algorithm trade-offs
✅ How to fix it:
Always explain: 'I'd start with XGBoost because it handles non-linearity well, gives feature importance, and is robust to outliers. For comparison, I'd try Logistic Regression as a simple baseline to see if we need complexity.' Justify every choice.
Mistake #3
Not discussing model evaluation beyond accuracy
Why it fails:
Accuracy is often misleading, especially with imbalanced data
✅ How to fix it:
For fraud detection (1% fraud), 99% accuracy is useless if the model always predicts 'no fraud'. Discuss: precision, recall, F1-score, AUC-ROC. Say: 'For this problem, I'd optimize for recall because missing fraud is costly. I'd use F2-score to weight recall 2x more than precision.'
Mistake #4
Implementing ML algorithms without explaining the math
Why it fails:
Interviewers want to see you understand what's under the hood
✅ How to fix it:
When coding gradient descent, say: 'We're minimizing loss by iterating: theta = theta - learning_rate * gradient. Learning rate controls step size—too high causes oscillation, too low is slow. I'll add momentum for faster convergence.' Math + code = strong signal.
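A minimal sketch of pairing that explanation with code: batch gradient descent for linear regression on synthetic data (learning rate and iteration count are illustrative):

# Minimal sketch: batch gradient descent for linear regression, from scratch.
# Synthetic data; learning rate and iteration count are illustrative.
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=200)

w = np.zeros(3)
lr = 0.1                                     # too high oscillates, too low converges slowly
for _ in range(500):
    grad = 2 / len(y) * X.T @ (X @ w - y)    # gradient of mean squared error
    w -= lr * grad                           # theta = theta - learning_rate * gradient

print(np.round(w, 2))                        # should be close to [2.0, -1.0, 0.5]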
Mistake #5
Not mentioning production considerations and monitoring
Why it fails:
Shows you've never deployed ML models to production
✅ How to fix it:
Always end with production concerns: 'In production, I'd monitor prediction distribution, feature drift, latency, and business metrics. Set up alerts for model performance degradation. Plan for retraining cadence—weekly for fast-changing data, monthly for stable data.'
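A minimal sketch of one such monitoring check: comparing the production score distribution against the training-time distribution with a two-sample KS test (both arrays below are synthetic stand-ins):

# Minimal sketch: flagging prediction drift with a two-sample KS test.
# Both arrays are synthetic stand-ins for stored training-time scores
# and scores logged from production.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_scores = rng.beta(2, 5, size=10_000)   # score distribution at training time
live_scores = rng.beta(2, 3, size=10_000)    # shifted distribution in production

stat, p_value = ks_2samp(train_scores, live_scores)
if p_value < 0.01:
    print(f"drift alert: KS statistic={stat:.3f}, p={p_value:.1e}")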
DEEP DIVE GUIDES
Master Specific ML Topics
Common ML Interview Mistakes to Avoid
Learn the most common pitfalls in ML interviews—from algorithm selection to model debugging and production ML.
Read Complete Guide →
HOW IT WORKS
Practice ML Interviews in 3 Steps
1
Choose ML Focus
Select ML theory, model debugging, or Python coding. Browse data scientists from top companies.
2
Practice 55-Min Session
Work through real ML problems—algorithm selection, debugging overfitting, coding challenges. Get live feedback.
3
Get Expert Evaluation
Detailed feedback on ML theory, problem-solving approach, code quality, and communication.
Ready to Master ML Interviews?
Join 350+ data scientists who mastered ML algorithms, model debugging, and Python coding. Start practicing today.
Start ML Practice Today →
🤖 Real ML problems
🐛 Debug production models
💳 Pay per session