Data Scientist Technical Interview Questions & Answers (2026)
Data scientist technical interviews test statistical knowledge, machine learning fundamentals, coding proficiency (Python/SQL), and the ability to design ML systems. This guide covers the most common technical question patterns and frameworks for answering them.
Overview
Data science technical interviews span a wide range: statistics and probability, machine learning theory, coding challenges (often involving pandas/SQL), and ML system design. Companies typically run 2-3 rounds covering these areas. The key differentiator is connecting theoretical knowledge to practical application — interviewers want to see you think about feature engineering, model evaluation, and deployment, not just algorithm selection.
Technical Interview Questions for Data Scientist Roles
Q1: You have a dataset with 10 million rows and 500 features. How would you approach building a classification model?
What they're really asking: This tests your end-to-end ML workflow thinking, not just model selection. The interviewer evaluates your approach to data exploration, feature engineering, model selection, and evaluation.
How to answer: Walk through a structured ML pipeline: EDA, feature engineering, baseline model, iteration, evaluation, and deployment considerations.
Example answer:
I'd start with EDA: check class distribution (imbalanced?), missing values, feature types (categorical vs numerical), and correlations. With 500 features, I'd first look for obvious drops: remove constant or near-constant features, highly correlated pairs (keep one), and features with >50% missing values.

Next, I'd establish a baseline with a simple model — logistic regression or random forest on a sample of 500K rows. This gives a benchmark and helps identify which features matter (feature importance). With 10M rows, I'd check if more data helps or if we've hit diminishing returns by plotting learning curves on subsamples (100K, 500K, 1M, 5M).

For feature engineering, I'd create interaction features between the top 20 most important features, handle categoricals with target encoding (careful about leakage — use cross-validated encoding), and consider dimensionality reduction (PCA) on numerical feature groups.

For the final model, I'd likely use gradient boosting (XGBoost/LightGBM) — it handles the feature space well, is fast to train on 10M rows, and provides good baseline performance. I'd evaluate with stratified cross-validation using business-relevant metrics (precision-recall, not just AUC, depending on the cost of false positives vs false negatives). Finally, I'd consider inference time constraints for deployment.
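The first-pass pruning described above can be sketched in a few lines of pandas. This is a minimal illustration on made-up data; the thresholds (50% missing, 0.95 correlation) and the toy column names are assumptions, not prescriptions.

```python
import numpy as np
import pandas as pd

def prune_features(df, missing_thresh=0.5, corr_thresh=0.95):
    """First-pass pruning: drop columns with too many missing values,
    near-constant columns, and one of each highly correlated pair."""
    keep = df.copy()
    # Drop columns exceeding the missing-value threshold.
    keep = keep.loc[:, keep.isna().mean() <= missing_thresh]
    # Drop constant columns (a single unique value carries no signal).
    keep = keep.loc[:, keep.nunique(dropna=False) > 1]
    # Among highly correlated numeric pairs, keep the first column.
    num = keep.select_dtypes(include=np.number)
    corr = num.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > corr_thresh).any()]
    return keep.drop(columns=to_drop)

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "signal": rng.normal(size=200),
    "constant": 1.0,       # near-constant -> dropped
    "mostly_nan": np.nan,  # all missing -> dropped
    "noise": rng.normal(size=200),
})
df["duplicate"] = df["signal"] * 2.0  # perfectly correlated -> dropped
pruned = prune_features(df)
print(sorted(pruned.columns))  # -> ['noise', 'signal']
```

On 500 real features you would inspect what gets dropped rather than trust the thresholds blindly, but this kind of pass typically shrinks the space before any modeling starts.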
Q2: Explain the bias-variance trade-off and how it affects model selection.
What they're really asking: This fundamental ML concept tests your theoretical depth. The interviewer wants a clear explanation with practical implications, not a textbook definition.
How to answer: Explain the concepts intuitively, show how they relate to model complexity, and give practical examples of diagnosing and addressing each.
Example answer:
Bias measures how far off predictions are from the true values on average — it's systematic error from wrong assumptions. High bias means the model is too simple to capture the real patterns (underfitting). A linear regression trying to fit a quadratic relationship has high bias.

Variance measures how much predictions change across different training samples. High variance means the model is too sensitive to the specific training data (overfitting). A deep decision tree that memorizes training data has high variance.

The trade-off: as model complexity increases, bias decreases (the model can capture more patterns) but variance increases (it starts fitting noise). The sweet spot minimizes total error = bias² + variance + irreducible noise.

Practically, I diagnose this by comparing training error and validation error. Large gap = high variance (try regularization, more data, simpler model). Both errors high = high bias (try more features, complex model, feature engineering). In practice, I usually start simple (high bias, low variance) and gradually increase complexity, monitoring validation error. Ensemble methods like random forests reduce variance by averaging, while boosting reduces bias by iteratively correcting errors. The choice depends on the data size: with little data, I prefer low-variance models; with abundant data, I can afford more complex models.
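The train-vs-validation diagnosis can be demonstrated on a toy problem: fit polynomials of increasing degree to noisy data and watch the training error fall while validation error tells a different story. The data, degrees, and noise level here are illustrative choices, not part of the answer above.

```python
import numpy as np

# Noisy samples of a smooth function; a line underfits it,
# a very high-degree polynomial starts fitting the noise.
rng = np.random.default_rng(42)
x = rng.uniform(0, 3, size=60)
y = np.sin(x) + rng.normal(scale=0.1, size=60)
x_tr, y_tr, x_va, y_va = x[:40], y[:40], x[40:], y[40:]

def errors(degree):
    """Train and validation MSE of a least-squares polynomial fit."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    mse = lambda xs, ys: np.mean((np.polyval(coeffs, xs) - ys) ** 2)
    return mse(x_tr, y_tr), mse(x_va, y_va)

for d in (1, 3, 9):
    tr, va = errors(d)
    print(f"degree {d}: train MSE {tr:.4f}, val MSE {va:.4f}")
```

Degree 1 shows high bias (both errors high); raising the degree keeps driving training error down, but validation error stops improving once the extra capacity goes into fitting noise, which is exactly the gap you look for.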
Q3: Write SQL to find the top 3 products by revenue for each category, including ties.
What they're really asking: This tests SQL proficiency with window functions, ranking, and analytical query patterns — essential daily skills for data scientists.
How to answer: Use window functions (DENSE_RANK or RANK) with PARTITION BY for the category grouping.
Example answer:
I'd use DENSE_RANK to handle ties — if two products tie for 2nd place, both get rank 2 and the next product gets rank 3 (unlike RANK which would skip to 4, or ROW_NUMBER which would arbitrarily break the tie). Query:

WITH ranked AS (
  SELECT category, product_id, SUM(revenue) AS total_revenue,
  DENSE_RANK() OVER (PARTITION BY category ORDER BY SUM(revenue) DESC) AS rnk
  FROM orders
  GROUP BY category, product_id
)
SELECT category, product_id, total_revenue, rnk
FROM ranked
WHERE rnk <= 3
ORDER BY category, rnk;

The CTE first calculates total revenue per product per category, then ranks within each category. The outer query filters to top 3 ranks. I chose DENSE_RANK because the question says 'including ties' — if two products are tied for 3rd, both should appear. If the interviewer wanted exactly 3 rows per category regardless of ties, I'd use ROW_NUMBER with a tiebreaker (like product_id). I'd also discuss performance: with a large orders table, an index on (category, product_id, revenue) would help the GROUP BY and window function.
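You can verify the tie behavior with SQLite, which has supported window functions since version 3.25. This sketch uses a two-CTE form equivalent to the answer's query (aggregate first, then rank); the table contents are made up to force a tie at rank 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (category TEXT, product_id TEXT, revenue REAL);
INSERT INTO orders VALUES
  ('toys', 'p1', 100), ('toys', 'p2', 50), ('toys', 'p2', 0),
  ('toys', 'p3', 50),  ('toys', 'p4', 10), ('toys', 'p5', 5);
""")
rows = conn.execute("""
WITH totals AS (
  SELECT category, product_id, SUM(revenue) AS total_revenue
  FROM orders
  GROUP BY category, product_id
),
ranked AS (
  SELECT *, DENSE_RANK() OVER (
    PARTITION BY category ORDER BY total_revenue DESC
  ) AS rnk
  FROM totals
)
SELECT category, product_id, total_revenue, rnk
FROM ranked WHERE rnk <= 3
ORDER BY category, rnk, product_id
""").fetchall()
for r in rows:
    print(r)
# p2 and p3 tie for rank 2, and p4 still appears at rank 3:
# ('toys', 'p1', 100.0, 1), ('toys', 'p2', 50.0, 2),
# ('toys', 'p3', 50.0, 2), ('toys', 'p4', 10.0, 3)
```

Note that p5 (rank 4) is excluded: DENSE_RANK keeps both tied products without pushing the third-best product out of the result.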
Q4: How would you evaluate a recommendation system? What metrics would you use?
What they're really asking: This tests your understanding of evaluation beyond accuracy, including offline metrics, online metrics, and the relationship between them. Recommendation systems require nuanced evaluation.
How to answer: Discuss offline metrics (precision@k, recall@k, NDCG, MAP), online metrics (CTR, engagement, revenue), and the gap between them.
Example answer:
I'd evaluate at three levels: offline, online, and business impact.

Offline metrics: Precision@K measures what fraction of the top K recommendations are relevant. Recall@K measures what fraction of all relevant items appear in top K. NDCG (Normalized Discounted Cumulative Gain) accounts for ranking position — relevant items ranked higher are scored more favorably. MAP (Mean Average Precision) summarizes precision across all recall levels. I'd also measure catalog coverage (what percentage of items ever get recommended) and diversity (how different recommendations are across users).

However, offline metrics often don't correlate perfectly with business outcomes. For online evaluation, I'd use A/B testing measuring CTR, conversion rate, time spent, and crucially, long-term engagement (not just immediate clicks). A model that maximizes CTR might recommend clickbait, not genuinely useful items. I'd also track the distribution of recommendations: are we creating filter bubbles?

Business metrics: revenue per user, retention rate, and customer satisfaction scores. The key insight is that the best recommendation system isn't the one with the highest NDCG — it's the one that balances relevance, diversity, freshness, and business goals. I'd also measure novelty (are we recommending things the user wouldn't have found on their own?) as a health metric.
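The offline metrics mentioned above are short enough to implement from scratch. Here is a minimal sketch of Precision@K and binary-relevance NDCG@K on hypothetical item IDs; graded-relevance NDCG would replace the 0/1 hits with relevance scores.

```python
import math

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(item in relevant for item in recommended[:k]) / k

def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: each hit at position i (0-based)
    contributes 1/log2(i+2), normalized by the ideal ordering."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(recommended[:k])
              if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

recommended = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
print(precision_at_k(recommended, relevant, 5))  # -> 0.4
print(ndcg_at_k(recommended, relevant, 5))       # < 1.0: hits not at the top
```

The position discount is the point of NDCG: the same two hits score higher if they appear at ranks 1 and 2 than at ranks 1 and 3, which precision@k cannot distinguish.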
Q5: How do you handle class imbalance in a classification problem? Walk through your approach.
What they're really asking: This tests practical ML knowledge beyond textbook algorithms. Class imbalance is one of the most common real-world challenges, and the interviewer wants to see systematic thinking.
How to answer: Discuss multiple approaches at different levels: data-level, algorithm-level, and evaluation-level strategies.
Example answer:
I approach class imbalance at three levels. First, evaluation: never use accuracy for imbalanced datasets. A model that always predicts 'not fraud' can reach 99.9% accuracy and still be useless. I'd use precision-recall AUC, F1 score, or a custom metric based on the business cost of false positives vs false negatives.

Second, data-level approaches: oversampling the minority class (SMOTE creates synthetic examples by interpolating between existing minority samples), undersampling the majority class (effective when you have abundant data), or a combination. I'd be careful with SMOTE on high-dimensional data, where synthetic examples may not represent real patterns.

Third, algorithm-level: use class_weight='balanced' in scikit-learn (adjusts the loss function to penalize minority misclassification more), or use algorithms naturally suited for imbalance like gradient boosting with scale_pos_weight.

For extreme imbalance (1:10000, as in fraud), I've found that combining undersampling with ensemble methods works well — train multiple models on different undersampled majority subsets and ensemble their predictions. The right approach depends on the severity of the imbalance and the business cost structure. If missing a fraud case costs $10K but a false alarm costs $10 in manual review, I'd optimize for high recall even at the expense of precision.
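The class_weight='balanced' heuristic is simple enough to compute by hand, which is worth knowing in an interview. This sketch uses the same formula scikit-learn documents, n_samples / (n_classes * class_count), on a hypothetical 9:1 label distribution.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Per-class weights via the 'balanced' heuristic:
    n_samples / (n_classes * class_count). Rare classes get
    proportionally larger weights in the loss function."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# Hypothetical 9:1 imbalanced labels.
labels = [0] * 90 + [1] * 10
print(balanced_class_weights(labels))  # minority class weighted 9x heavier
```

With 90 negatives and 10 positives the weights come out to roughly 0.56 and 5.0, so each minority example contributes nine times as much to the loss, which is the algorithm-level counterpart of oversampling.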
Preparation Tips
- Review statistics fundamentals: hypothesis testing, confidence intervals, A/B testing, Bayesian reasoning
- Practice SQL with window functions, CTEs, and analytical patterns — SQL is tested in nearly every DS technical round
- Be ready to write Python/pandas code on a whiteboard: data cleaning, feature engineering, and model training workflows
- Know ML algorithms at a conceptual level: how they work, when to use them, hyperparameter intuition, and failure modes
- Practice explaining ML concepts simply — the best signal of understanding is clear, jargon-free explanation
- Review probability and statistics brain teasers (Bayes' theorem, expected value, conditional probability)
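The last tip's classic example, the diagnostic-test teaser, is worth rehearsing until it's automatic. A quick sketch with illustrative numbers (1% base rate, 99% sensitivity, 95% specificity; none of these come from the tips above):

```python
def posterior(prior, sensitivity, specificity):
    """Bayes' theorem for P(condition | positive test):
    true positives divided by all positives."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)

p = posterior(prior=0.01, sensitivity=0.99, specificity=0.95)
print(round(p, 3))  # -> 0.167: most positive results are false positives
```

The counterintuitive punchline interviewers look for: with a rare condition, even an accurate test yields mostly false positives, because the false-positive rate applies to the much larger healthy population.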
Common Mistakes to Avoid
- Jumping to model selection without discussing data exploration, feature engineering, or evaluation strategy
- Not asking clarifying questions about the business context — the best model depends on the use case
- Using accuracy as the primary metric without considering class imbalance or business cost structure
- Being unable to explain model choices beyond 'it's the best algorithm' — you need to justify why for this specific problem
- Writing SQL without handling edge cases (NULLs, duplicates, ties) or discussing query performance
- Focusing on model architecture without discussing feature engineering — features matter more than algorithms in most real-world problems
Research Checklist
Before your technical interview, make sure to:
- Research the company's data science team: applied ML, research, analytics, or a mix?
- Understand the company's product and what ML/data problems they solve (recommendations, pricing, fraud, NLP)
- Check the company's engineering/data science blog for technical posts about their ML stack
- Review the job description for specific tool requirements (Python, R, SQL, Spark, TensorFlow, PyTorch)
- Understand the company's data scale to calibrate your answers (startup with 100K rows vs Big Tech with billions)
- Practice with the same coding environment the company uses (CoderPad, Jupyter, Google Colab)
Questions to Ask Your Interviewer
- What does the ML development lifecycle look like from problem definition to production deployment?
- What's the data infrastructure like? What tools does the team use for feature engineering and model training?
- How does the team decide which problems to work on? What's the process for prioritizing ML projects?
- What does model monitoring look like? How do you detect and handle model drift?
- What's the biggest data science challenge the team is working on right now?
- How does the data science team collaborate with engineering for model deployment?
How Your Resume Connects to the Interview
Data science technical interviews expect you to demonstrate the skills listed on your resume. Ajusta ensures your resume includes the specific tools (Python, SQL, TensorFlow, pandas), techniques (feature engineering, A/B testing, ensemble methods), and business impact metrics that ATS systems and technical interviewers look for.