Data Analyst Interview Questions 2026
Fresher · Intern · Experienced · Excel · SQL · Python · Statistics
Top 100+ Data Analyst Interview Questions with Detailed Answers — Basic to Advanced, Behavioural to Technical, How to Prepare, Salary Guide — Complete 2026 Resource.
A Data Analyst Interview typically has 3-4 rounds: HR Screening → Technical Round (SQL/Python/Excel) → Case Study/Assignment → Final HR. Companies test your ability to clean, analyse, visualise data, and communicate insights clearly.
These “why” questions test your motivation, self-awareness and fit. Always give specific, genuine answers.
“I enjoy finding patterns in complex data and translating them into actionable business decisions. My background in [statistics/mathematics/business] naturally led me to data analysis. I find it deeply satisfying when a well-built dashboard helps a team make a better decision faster. I specifically want this role at [Company] because of your data-driven culture and the scale of problems I’d get to solve.”
✅ Mention specific skills you enjoy using
✅ Connect to the company’s industry or product
✅ Avoid generic answers like “I love numbers”
💡 Pro Tip: Research the company’s data stack beforehand
4 Main Types:
• Descriptive Analysis — What happened? (Summary statistics, dashboards)
• Diagnostic Analysis — Why did it happen? (Root cause analysis, drill-downs)
• Predictive Analysis — What will happen? (Forecasting, ML models)
• Prescriptive Analysis — What should we do? (Optimisation, recommendations)
Structured Data: Organised into a predefined schema of rows and columns (SQL tables, spreadsheets). Easy to query. Example: A Customers table in MySQL.
Unstructured Data: No predefined format (emails, images, videos, social media posts, PDFs). Requires NLP, computer vision, or specialised tools. Example: Customer reviews on Amazon.
Steps involved: Handling missing values · Removing duplicates · Fixing data types · Standardising formats · Handling outliers
Why important: “Garbage in, garbage out” — Analysis on dirty data leads to wrong conclusions. 60-80% of a data analyst’s time is spent on data cleaning. 💡 Mention tools: Python (pandas), Excel, SQL, OpenRefine
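The cleaning steps above can be sketched in pandas. This is a minimal illustration on a hypothetical DataFrame (the column names and values are invented for the example):

```python
import pandas as pd

# Hypothetical raw data showing the common issues: duplicates,
# wrong types, missing values, inconsistent formatting
df = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "amount": ["100", "250", "250", None, "90"],
    "city": [" Delhi", "Mumbai ", "Mumbai ", "Pune", "delhi"],
})

df = df.drop_duplicates(subset="order_id")                  # remove duplicates
df["amount"] = pd.to_numeric(df["amount"])                  # fix data types
df["amount"] = df["amount"].fillna(df["amount"].median())   # handle missing values
df["city"] = df["city"].str.strip().str.title()             # standardise formats
print(df)
```

Each line maps to one of the steps listed above; in a real project you would also profile the data first (`df.info()`, `df.describe()`) before deciding on a strategy.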
| Feature | Database | Data Warehouse | Data Lake |
|---|---|---|---|
| Purpose | Transactional (OLTP) | Analytics (OLAP) | Raw storage |
| Data type | Structured | Structured | All types |
| Schema | Schema-on-write | Schema-on-write | Schema-on-read |
| Examples | MySQL, PostgreSQL | Snowflake, Redshift | AWS S3, Azure |
Business KPIs: Monthly Active Users (MAU) · Customer Churn Rate · Conversion Rate · Revenue per User
Data Analyst KPIs: Report delivery time · Data accuracy rate · Dashboard adoption rate · Query optimisation improvement
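As a quick sanity check, churn rate is simple to compute by hand. A sketch with made-up numbers:

```python
# Churn rate = customers lost during the period / customers at the start
# (the figures below are hypothetical)
customers_start = 1200
customers_lost = 90

churn_rate = customers_lost / customers_start
print(f"{churn_rate:.1%}")  # 7.5%
```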
Mean: Arithmetic average of all values. Sensitive to outliers. Use for roughly symmetric data.
Median: Middle value when sorted. Resistant to outliers. Use for skewed data (e.g., income).
Mode: Most frequently occurring value. Use for categorical data.
Practical use: If a dataset has extreme outliers (e.g., one employee earns ₹1 Crore while others earn ₹30K), median salary is a better representation than mean.
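The salary example above is easy to verify with Python’s standard library (the figures mirror the hypothetical scenario in the text, in thousands of rupees):

```python
from statistics import mean, median

# Nine employees earn ₹30K; one outlier earns ₹1 Crore (₹10,000K)
salaries_k = [30] * 9 + [10_000]

print(mean(salaries_k))    # 1027.0 — dragged up by the single outlier
print(median(salaries_k))  # 30 — still represents the typical employee
```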
• SQL — SELECT, JOINs, GROUP BY, subqueries, window functions (intermediate)
• Python — pandas, numpy, matplotlib, seaborn (intermediate)
• Excel — VLOOKUP, Pivot Tables, conditional formatting, basic macros
• Tableau/Power BI — basic dashboards (beginner/self-taught)
• Statistics — hypothesis testing, regression, descriptive stats
Example: “For my final year project, I analysed 3 years of sales data for a retail company (CSV files, ~50K rows). I used Python (pandas) to clean the data — handled 8% missing values and removed duplicates. I performed cohort analysis to identify that customers acquired during festive sales had 2x higher lifetime value. I visualised findings in Tableau and presented to my professor. The insight suggested that festive acquisition campaigns should get higher budget allocation.”
DELETE: Removes rows one at a time; can use a WHERE filter. Fully logged — can be rolled back.
TRUNCATE: Removes all rows from a table. Faster, minimal logging. Cannot filter rows.
DROP: Completely deletes the table structure and data. Permanent — cannot be rolled back.
Memory tip: Delete = selective removal | Truncate = clean the table | Drop = destroy the table
Primary Key: A column (or set of columns) that uniquely identifies each row. Cannot be NULL. Example: customer_id in the Customers table.
Foreign Key: A column that references the primary key of another table. Creates a relationship between tables. Example: customer_id in the Orders table references the Customers table.
Step 1 — Understand why data is missing:
• MCAR (Missing Completely At Random) — safe to drop
• MAR (Missing At Random) — can impute
• MNAR (Missing Not At Random) — requires domain knowledge
Step 2 — Choose a strategy:
• For numerical: Mean/Median imputation, forward-fill
• For categorical: Mode imputation, create “Unknown” category
• Drop rows if <5% missing and MCAR
• Use model-based imputation (KNN, MICE) for complex cases
— Example: Get department for Employee ID 1001
= VLOOKUP(1001, A2:C100, 2, FALSE)
Use cases:
• Monthly sales by region and product
• Count of customers by age group
• Average order value by channel
Pro tip: Use Slicers for interactive filtering and Power Pivot for large datasets.
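The same summaries an Excel Pivot Table produces can be built with `pandas.pivot_table`. A minimal sketch on hypothetical sales data:

```python
import pandas as pd

# Hypothetical sales records (invented for the example)
sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South"],
    "product": ["A", "B", "A", "B"],
    "revenue": [100, 200, 150, 50],
})

# Rows = region, columns = product, values = summed revenue —
# the direct analogue of an Excel pivot layout
pivot = sales.pivot_table(index="region", columns="product",
                          values="revenue", aggfunc="sum")
print(pivot)
```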
Bar Chart: Compares discrete categories. X-axis = categories. Gaps between bars.
Histogram: Shows frequency distribution of a continuous variable. X-axis = numeric ranges (bins). No gaps between bars.
Quick rule: Bar chart = categorical data (e.g., sales by country). Histogram = continuous data (e.g., age distribution of users).
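The underlying data shapes differ too, which a short pandas sketch (with made-up values) makes concrete:

```python
import pandas as pd

# Categorical → bar-chart data: one bar per distinct category
countries = pd.Series(["IN", "US", "IN", "UK", "IN"])
bar_data = countries.value_counts()   # counts per category

# Continuous → histogram data: values grouped into numeric bins
ages = pd.Series([21, 25, 29, 34, 38, 45])
hist_data = pd.cut(ages, bins=[20, 30, 40, 50]).value_counts().sort_index()
print(bar_data)
print(hist_data)
```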
Conditional: IF, IFERROR, IFNA, COUNTIF, SUMIF, AVERAGEIF
Array/Dynamic: UNIQUE, FILTER, SORT, SEQUENCE (Excel 365)
Statistical: STDEV, CORREL, FORECAST, PERCENTILE
Text: LEFT, RIGHT, MID, CONCATENATE/CONCAT, TRIM, SUBSTITUTE
Date/Time: DATEDIF, NETWORKDAYS, EOMONTH
Database: DSUM, DCOUNT, DGET
Method 2 — COUNTIF approach: =COUNTIF(A:A, A2)>1 flags duplicate values
— VLOOKUP: Classic lookup — can only return columns to the right of the lookup column
= VLOOKUP(A2, D:F, 2, FALSE)
— INDEX MATCH: Flexible — works in any direction
= INDEX(D:D, MATCH(A2, E:E, 0))
✅ Returns value from any column (not just right of lookup)
✅ Faster on large datasets
✅ Not affected by inserting/deleting columns
✅ More flexible with array formulas
Method 2 — FORECAST function:
= FORECAST(x, known_ys, known_xs)
— Or in Excel 365: =FORECAST.LINEAR(x, known_ys, known_xs)
Step 2: Build Pivot Tables on the cleaned data
Step 3: Create dynamic charts linked to Pivot Tables
Step 4: Add Slicers and Timelines for interactivity
Step 5: Use VBA macro for one-click refresh + email distribution (optional)
Step 6: Protect sheets from accidental editing
WHERE: Filters individual rows BEFORE GROUP BY. Cannot use aggregate functions.
HAVING: Filters groups AFTER GROUP BY. Works on aggregated values.
SELECT department, COUNT(*) FROM employees
WHERE status = 'Active'
GROUP BY department
HAVING COUNT(*) > 10 -- HAVING filters after grouping
-- INNER JOIN: Only rows that match in both tables
SELECT o.order_id, c.name FROM orders o INNER JOIN customers c ON o.customer_id = c.id
-- LEFT JOIN: All rows from left + matching from right (NULLs if no match)
SELECT c.name, o.order_id FROM customers c LEFT JOIN orders o ON c.id = o.customer_id
-- RIGHT JOIN: All rows from right + matching from left
-- FULL OUTER JOIN: All rows from both, NULLs where no match
-- CROSS JOIN: Cartesian product (every row × every row)
-- SELF JOIN: Table joined with itself (for hierarchy/comparison)
SELECT name, salary,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS row_num
FROM employees
-- RANK vs DENSE_RANK: RANK skips numbers on ties, DENSE_RANK doesn't
-- (avoid aliasing a column as "rank" — it is a reserved word in MySQL 8+)
-- Running total
SELECT date, revenue, SUM(revenue) OVER (ORDER BY date) AS running_total FROM sales
-- LAG/LEAD: Access previous/next row value
SELECT date, revenue, LAG(revenue, 1) OVER (ORDER BY date) AS prev_revenue FROM sales
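The same window-function logic carries over to pandas, which interviewers often ask about. A sketch on a hypothetical sales table:

```python
import pandas as pd

# Hypothetical daily revenue (invented for the example)
sales = pd.DataFrame({"date": pd.date_range("2026-01-01", periods=4),
                      "revenue": [100, 120, 90, 150]})

# Running total — SUM(revenue) OVER (ORDER BY date)
sales["running_total"] = sales["revenue"].cumsum()

# LAG(revenue, 1) — previous row's value
sales["prev_revenue"] = sales["revenue"].shift(1)

# DENSE_RANK() OVER (ORDER BY revenue DESC)
sales["rnk"] = sales["revenue"].rank(method="dense", ascending=False).astype(int)
print(sales)
```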
-- Method 1: Subquery
SELECT MAX(salary) FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees)
-- Method 2: DENSE_RANK (better: handles ties)
SELECT salary FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk FROM employees
) t WHERE rnk = 2
-- Method 3: LIMIT/OFFSET (MySQL)
SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1
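For completeness, the same “Nth-highest” question sometimes comes up in pandas form. One sketch (mirroring the DENSE_RANK approach on hypothetical salaries):

```python
import pandas as pd

# Hypothetical salaries with a tie at the top
salaries = pd.Series([90_000, 70_000, 90_000, 60_000, 80_000])

# Second-highest DISTINCT salary: dedupe, take top 2, keep the last
second_highest = salaries.drop_duplicates().nlargest(2).iloc[-1]
print(second_highest)  # 80000
```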
A CTE (Common Table Expression) is a named temporary result set defined with the WITH clause. It improves readability and allows recursive queries.
WITH high_value_customers AS (
SELECT customer_id, SUM(order_total) AS lifetime_value
FROM orders
GROUP BY customer_id
HAVING SUM(order_total) > 10000
)
SELECT c.name, h.lifetime_value
FROM customers c
JOIN high_value_customers h ON c.id = h.customer_id
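The same two-step logic (aggregate, then join back) translates directly to pandas. A sketch with hypothetical tables matching the query above:

```python
import pandas as pd

# Hypothetical customers and orders (invented for the example)
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ravi", "Meena"]})
orders = pd.DataFrame({"customer_id": [1, 1, 2, 3],
                       "order_total": [8_000, 4_000, 5_000, 12_000]})

# "CTE" step: lifetime value per customer, filtered to > 10,000
ltv = (orders.groupby("customer_id", as_index=False)["order_total"]
             .sum().rename(columns={"order_total": "lifetime_value"}))
high_value = ltv[ltv["lifetime_value"] > 10_000]

# Final step: join back to the customers table
result = customers.merge(high_value, left_on="id", right_on="customer_id")
print(result[["name", "lifetime_value"]])
```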
List: Ordered, mutable, allows duplicates. [1, 2, 3] — use for sequences you’ll modify.
Tuple: Ordered, immutable, allows duplicates. (1, 2, 3) — use for fixed data (coordinates, RGB).
Dictionary: Key-value pairs, mutable, keys unique. {"name": "Alice", "age": 25} — use for lookups.
# Check missing values
df.isnull().sum()
# Drop rows with any missing value
df.dropna()
# Fill with mean (numerical)
df['age'] = df['age'].fillna(df['age'].mean())
# Fill with mode (categorical)
df['city'] = df['city'].fillna(df['city'].mode()[0])
# Forward fill (time series) — fillna(method='ffill') is deprecated
df = df.ffill()
result = pd.merge(df1, df2, on='customer_id', how='inner')
# LEFT JOIN
result = pd.merge(df1, df2, on='customer_id', how='left')
# Merge on different column names
result = pd.merge(df1, df2, left_on='id', right_on='cust_id')
# concat: Stack DataFrames vertically
combined = pd.concat([df1, df2], ignore_index=True)
df.groupby('region')['sales'].sum()
# Multiple aggregations
df.groupby('category').agg({
    'sales': ['sum', 'mean', 'count'],
    'profit': 'sum'
})
# GroupBy + transform (keeps original shape)
df['pct_of_total'] = df['sales'] / df.groupby('region')['sales'].transform('sum')
• p < 0.05: Statistically significant — reject the null hypothesis
• p ≥ 0.05: Not statistically significant — fail to reject null hypothesis
Common mistake: p-value does NOT tell you the probability that the null hypothesis is true. It also doesn’t measure effect size — a small p-value can come from a trivial effect in a large sample. 💡 Always report effect size (Cohen’s d) alongside p-value
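Cohen’s d itself is straightforward to compute. A sketch with the standard library on hypothetical A/B measurements (all numbers invented for the example):

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical checkout times (minutes) for two groups
control   = [5.1, 4.8, 5.5, 5.0, 4.9, 5.2]
treatment = [4.2, 4.5, 4.0, 4.4, 4.1, 4.3]

# Cohen's d = difference in means / pooled standard deviation
n1, n2 = len(control), len(treatment)
pooled_sd = sqrt(((n1 - 1) * stdev(control) ** 2 +
                  (n2 - 1) * stdev(treatment) ** 2) / (n1 + n2 - 2))
d = (mean(control) - mean(treatment)) / pooled_sd
print(round(d, 2))
```

Reporting d alongside the p-value tells the stakeholder not just *whether* there is an effect, but *how big* it is.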
Type I Error (False Positive — α): Rejecting a true null hypothesis. “Crying wolf.” Example: Concluding a drug works when it doesn’t.
Type II Error (False Negative — β): Failing to reject a false null hypothesis. “Missing the wolf.” Example: Concluding a drug doesn’t work when it does.
Trade-off: Lowering α (stricter significance) increases risk of Type II error.
Correlation ≠ Causation: Just because two variables move together doesn’t mean one causes the other.
Famous example: Ice cream sales and drowning deaths are positively correlated — both increase in summer. Hot weather (confounding variable) causes both. Banning ice cream won’t prevent drownings.
Steps:
1. Define hypothesis (e.g., “New checkout button increases conversion rate”)
2. Define success metric (conversion rate)
3. Calculate sample size needed (based on α, power, expected effect size)
4. Randomly assign users to control and treatment
5. Run test for sufficient time (don’t peek too early!)
6. Analyse results with statistical significance test (z-test/chi-square)
7. Make decision based on data
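Step 6 (the significance test) can be sketched as a two-proportion z-test using only the standard library. The conversion counts below are hypothetical:

```python
from math import sqrt, erf

# Hypothetical A/B result: control 200/4000 conversions, treatment 260/4000
x1, n1 = 200, 4000
x2, n2 = 260, 4000

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p2 - p1) / se

# Two-sided p-value from the standard normal CDF
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
print(round(z, 2), round(p_value, 4))
```

Here z exceeds 1.96, so at α = 0.05 the difference in conversion rates would be declared statistically significant.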
Right (Positive) Skew: Tail on right. Mean > Median > Mode. Example: Income distribution (most earn little, few earn a lot).
Left (Negative) Skew: Tail on left. Mode > Median > Mean. Example: Exam scores when most students did well.
“At [Company], our sales team received a weekly Excel report that took 4 hours to compile manually. I automated the data extraction using Python (connecting to our MySQL database), transformed the data with pandas, and built a Power BI dashboard that refreshed automatically every morning. This saved 16 hours/month, reduced human error, and gave leadership real-time visibility — leading to 2 decisions being made 48 hours faster than previously.”
✅ Lead with the business answer, not the methodology
✅ Use plain language — avoid “p-value”, say “with 95% confidence”
✅ Use visuals over tables — one clear chart beats a complex table
✅ Anchor to business impact (revenue, cost, time saved)
✅ Prepare for “so what?” — always have a recommendation ready
✅ Use analogies for complex concepts
Structure: What happened → How you caught/admitted it → What you did to fix it → What safeguards you put in place
Example: “I once calculated month-over-month growth incorrectly because I didn’t account for timezone differences in our event data. A stakeholder noticed inconsistency. I caught the error within 2 hours, corrected the report, and communicated the issue proactively. Since then, I always add a ‘data validation’ section in every analysis and build automated alerts for unexpected data drops.”
Key components: Data ownership · Data quality standards · Access control · Lineage tracking · Compliance (GDPR, HIPAA)
Why it matters for analysts: Poor governance leads to conflicting reports (“which number is right?”), security breaches, and regulatory fines. Analysts who understand governance build more trustworthy analyses.
📅 4 Weeks Before Interview
📅 1 Week Before Interview
📅 Day Before Interview
🎯 Best Questions to Ask Interviewer
| Experience Level | Average CTC (India) | Top Cities | Top Companies |
|---|---|---|---|
| Intern / Fresher (0-1 yr) | ₹3 – ₹6 LPA | Bangalore, Hyderabad, Pune | TCS, Wipro, Infosys, startups |
| Junior DA (1-3 yr) | ₹6 – ₹12 LPA | Bangalore, Mumbai, Delhi NCR | Amazon, Flipkart, Zomato, MNCs |
| Mid-level DA (3-6 yr) | ₹12 – ₹22 LPA | Bangalore, Hyderabad, Gurgaon | Google, Microsoft, PhonePe, Paytm |
| Senior DA (6-10 yr) | ₹22 – ₹40 LPA | Bangalore, Remote | FAANG, Unicorns, Consultancies |
| Lead / Principal DA (10+ yr) | ₹40 – ₹80+ LPA | Bangalore, Mumbai + Remote | Google, Meta, Goldman Sachs |
🛢️ Key SQL Concepts
• INNER / LEFT / RIGHT / FULL JOIN • GROUP BY + HAVING • Window Functions (ROW_NUMBER, RANK, LAG, SUM OVER) • CTEs vs Subqueries • Indexes — why they speed up queries • UNION vs UNION ALL • NULL handling (COALESCE, IS NULL)
🐍 Key Python/pandas
• df.head(), df.info(), df.describe() • df.isnull().sum(), fillna(), dropna() • df.groupby().agg() • pd.merge() — left/inner/outer • df.pivot_table() • String methods: str.strip(), str.lower() • lambda + apply()
📈 Key Stats Terms
• Mean vs Median vs Mode • Standard Deviation vs Variance • Correlation ≠ Causation • p-value, Type I & II errors • Normal, skewed distributions • Central Limit Theorem • A/B Testing — how to design
Data Analyst Interview Mastery Guide
Data now plays a role in every field, which is why data analyst interview questions for experienced professionals can be quite challenging. If you are starting a new career, understanding fresher-level data analyst interview questions is essential so you are clear on the fundamentals. Many companies ask intern-level data analyst interview questions at the entry stage, often focused on data cleaning and visualisation. In addition, Excel-based data analyst interview questions form an important part of most technical rounds, because Excel is a primary tool for data manipulation.
While preparing, keep in mind that experienced-level data analyst interview questions are based on case studies, whereas fresher-level questions place more emphasis on statistics and logic. For internships, intern-level questions often test your ability to learn and your understanding of the tools. Finally, whatever your level, mastering Excel interview questions significantly improves your chances of success.
Essential Skills and Preparation Strategy
When you apply for a data analyst position, the first question is: how do I prepare for a data analyst interview? Start by working through basic data analyst interview questions, such as the definitions of data mining and data cleaning. Interviewers often ask why you want to be a data analyst, where you should demonstrate your passion for data and your problem-solving skills. An up-to-date list of data analyst interview questions and answers for 2026 should be built around current market trends.
On the technical side, the SQL round can be the hardest part of a data analyst interview, with a focus on joins and subqueries. Knowledge of libraries such as pandas and NumPy is essential for the Python round. To understand the mathematics behind the data, the statistics section requires a solid grasp of hypothesis testing and probability. With the right preparation and practice of 2026 data analyst interview questions and answers, you can crack a data analyst position at any level.
❓ Top Data Analyst Interview Questions — FAQ
How many rounds does a Data Analyst interview typically have?
Is SQL mandatory for data analyst interviews?
What should a fresher include in their data analyst portfolio?
What’s the biggest mistake candidates make in data analyst interviews?
🚀 Land Your Data Analyst Role in 2026!
Practice daily, build your portfolio, and stay updated on job openings — UPSarkariJob.com has it all.