SQL Window Functions for Data Analyst Interviews: A Practical Guide
Why window functions are interview-critical
Window functions separate junior data analyst candidates from mid-to-senior ones. Almost every company that interviews data analysts at the mid-to-senior level will test window functions — not as a trick question, but because window functions reflect how real analytics work is done. Understanding when and how to use them signals that you've written production analytics code, not just LeetCode solutions.
The good news: there are four patterns that cover roughly 90% of window function interview questions. Master these four, and you'll handle almost anything an interviewer throws at you.
The syntax to know before anything else
SELECT
    column,
    FUNCTION() OVER (
        PARTITION BY partition_column
        ORDER BY order_column
        ROWS/RANGE BETWEEN frame_start AND frame_end
    ) AS alias
FROM table;
The OVER clause is what makes a function a window function. PARTITION BY divides the data into groups (like GROUP BY, but without collapsing rows). ORDER BY determines the sequence within each partition. The frame clause (ROWS or RANGE BETWEEN ...) defines which rows around the current row make up the window; if you omit it but supply an ORDER BY, most databases default to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
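To see the GROUP BY vs. PARTITION BY distinction concretely, here is a small sketch using Python's built-in sqlite3 module (SQLite 3.25+ supports window functions). The sales table and its rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 10), ("east", 20), ("west", 5)])

# GROUP BY collapses each region into a single row...
grouped = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# ...while PARTITION BY keeps every row and attaches the group total to each.
windowed = conn.execute(
    "SELECT region, amount, SUM(amount) OVER (PARTITION BY region) "
    "FROM sales ORDER BY region, amount"
).fetchall()

print(grouped)   # [('east', 30), ('west', 5)]
print(windowed)  # [('east', 10, 30), ('east', 20, 30), ('west', 5, 5)]
```

Three input rows stay three output rows under the window version; that is the whole point of PARTITION BY.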
Pattern 1: Ranking
Use case: "Find the top 3 products by revenue in each category."
SELECT *
FROM (
    SELECT
        category,
        product_name,
        revenue,
        ROW_NUMBER() OVER (
            PARTITION BY category
            ORDER BY revenue DESC
        ) AS revenue_rank  -- avoid the alias "rank": it's a reserved word in several dialects
    FROM products
) ranked
WHERE revenue_rank <= 3;
When interviewers ask about this, they're also watching:
- Do you know the difference between ROW_NUMBER(), RANK(), and DENSE_RANK()?
- ROW_NUMBER() — unique sequential number, no ties
- RANK() — ties get the same rank, next rank skips (1, 1, 3)
- DENSE_RANK() — ties get the same rank, no gap (1, 1, 2)
The correct choice depends on how you want to handle ties. Always ask the interviewer: "Should tied values share a rank?" If yes, use RANK() or DENSE_RANK() (RANK() leaves gaps after ties, DENSE_RANK() doesn't). If you need exactly N rows per group, use ROW_NUMBER(), and add a tie-breaker column to the ORDER BY so the result is deterministic.
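A quick way to internalize the three functions is to run them side by side on tied data. A sketch via Python's sqlite3, with a hypothetical products table where "a" and "b" tie on revenue:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("a", 100), ("b", 100), ("c", 80)])

rows = conn.execute("""
    SELECT product_name,
           ROW_NUMBER() OVER (ORDER BY revenue DESC) AS row_num,
           RANK()       OVER (ORDER BY revenue DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY revenue DESC) AS dense_rnk
    FROM products
    ORDER BY revenue DESC, product_name
""").fetchall()

for name, row_num, rnk, dense_rnk in rows:
    print(name, row_num, rnk, dense_rnk)
# ROW_NUMBER breaks the a/b tie arbitrarily (1 and 2 in some order);
# RANK gives both 1 and then skips to 3; DENSE_RANK gives 1, 1, 2.
```

Note that ROW_NUMBER's ordering of the tied rows is not guaranteed; that is exactly why a deterministic tie-breaker belongs in the ORDER BY when you filter on it.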
Pattern 2: Running totals and moving averages
Use case: "Calculate a 7-day rolling average of daily active users."
SELECT
    date,
    dau,
    AVG(dau) OVER (
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS dau_7day_avg
FROM daily_active_users;
The frame clause ROWS BETWEEN 6 PRECEDING AND CURRENT ROW is what creates the 7-day window (the current row plus the 6 preceding rows). Note that ROWS counts physical rows, not calendar days: if dates are missing from the table, the window silently spans more than 7 days. For a simple running total:
SUM(revenue) OVER (
    ORDER BY date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_revenue
Interviewers often follow up with: "What's the difference between ROWS and RANGE?" ROWS refers to physical rows. RANGE refers to logical values — if two rows have the same ORDER BY value, RANGE treats them as part of the same window boundary, ROWS does not. For most analytics use cases, ROWS is what you want.
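The ROWS vs. RANGE distinction is easiest to see with duplicate ORDER BY values. A small sqlite3 sketch (table t and its rows are invented for the demo; the two d=2 rows are peers):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (d INTEGER, v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, 10), (2, 20), (2, 30), (3, 40)])

rows = conn.execute("""
    SELECT d, v,
           SUM(v) OVER (ORDER BY d
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum,
           SUM(v) OVER (ORDER BY d
                        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum
    FROM t
    ORDER BY d, v
""").fetchall()

for d, v, rows_sum, range_sum in rows:
    print(d, v, rows_sum, range_sum)
# RANGE gives both d=2 peers the full running total of 60, because peers
# share a frame boundary. ROWS advances one physical row at a time, so the
# two peers get different partial sums (split at an arbitrary point).
```

This is why ROWS is usually the right choice for rolling metrics: you want exactly N rows in the window, not "all rows tied with the Nth".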
Pattern 3: LAG and LEAD for period-over-period comparisons
Use case: "Calculate week-over-week revenue change for each product."
SELECT
    week,
    product_id,
    revenue,
    LAG(revenue, 1) OVER (
        PARTITION BY product_id
        ORDER BY week
    ) AS prev_week_revenue,
    revenue - LAG(revenue, 1) OVER (
        PARTITION BY product_id
        ORDER BY week
    ) AS wow_change
FROM weekly_revenue;
LAG looks backwards (previous row), LEAD looks forward (next row). Both take an optional offset (default 1) and a default value for when there's no previous/next row.
Common interview extension: "How would you calculate the percentage change?" Add a CASE statement to handle division by zero:
CASE
    WHEN LAG(revenue) OVER (...) = 0 THEN NULL
    ELSE (revenue - LAG(revenue) OVER (...))
         / LAG(revenue) OVER (...) * 100
END AS wow_pct_change
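Putting LAG and the zero-guard together, here is a runnable sketch via sqlite3. The weekly_revenue rows are synthetic and deliberately include a zero-revenue week to exercise the CASE branch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weekly_revenue (week INTEGER, product_id TEXT, revenue REAL)")
conn.executemany("INSERT INTO weekly_revenue VALUES (?, ?, ?)",
                 [(1, "p1", 0.0), (2, "p1", 50.0), (3, "p1", 75.0)])

rows = conn.execute("""
    SELECT week, revenue,
           LAG(revenue) OVER (PARTITION BY product_id ORDER BY week) AS prev,
           CASE
               WHEN LAG(revenue) OVER (PARTITION BY product_id ORDER BY week) = 0
                   THEN NULL
               ELSE (revenue - LAG(revenue) OVER (PARTITION BY product_id ORDER BY week))
                    / LAG(revenue) OVER (PARTITION BY product_id ORDER BY week) * 100
           END AS wow_pct
    FROM weekly_revenue
    ORDER BY week
""").fetchall()

print(rows)
# Week 1 has no previous row, so prev and wow_pct are NULL.
# Week 2's previous revenue is 0, so wow_pct is NULL via the guard.
# Week 3: (75 - 50) / 50 * 100 = 50.0 percent.
```

NULLIF(LAG(revenue) OVER (...), 0) is a common, terser alternative to the CASE guard; both are fine answers in an interview.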
Pattern 4: Cohort retention analysis
Retention analysis is the most complex pattern here, layering CTEs and conditional aggregation on top of the date logic, and the most impressive to demonstrate in an interview.
Use case: "Calculate 30-day retention for each weekly signup cohort."
WITH first_activity AS (
    SELECT
        user_id,
        MIN(DATE_TRUNC('week', activity_date)) AS cohort_week
    FROM user_activity
    GROUP BY user_id
),
cohort_activity AS (
    SELECT
        f.user_id,
        f.cohort_week,
        -- Postgres: subtracting dates yields integer days; other dialects use DATEDIFF
        a.activity_date::date - f.cohort_week::date AS days_since_signup
    FROM first_activity f
    JOIN user_activity a ON f.user_id = a.user_id
)
SELECT
    cohort_week,
    -- every user appears at least once, so this is the full cohort size
    COUNT(DISTINCT user_id) AS cohort_size,
    -- days 28-34: the full week containing day 30, for weekly cohorts
    COUNT(DISTINCT CASE WHEN days_since_signup BETWEEN 28 AND 34 THEN user_id END) AS retained_30d,
    COUNT(DISTINCT CASE WHEN days_since_signup BETWEEN 28 AND 34 THEN user_id END) * 1.0 /
        COUNT(DISTINCT user_id) AS retention_rate_30d
FROM cohort_activity
GROUP BY cohort_week
ORDER BY cohort_week;
This pattern tests CTEs, date manipulation, conditional aggregation, and the ability to think through a business problem (what does "30-day retention" actually mean?) rather than just write syntax.
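To check the conditional-aggregation step in isolation, this sketch runs the final SELECT against a hand-built cohort_activity table in sqlite3. The data is synthetic; it counts cohort size as all distinct users and treats activity on days 28-34 as the 30-day retention window:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE cohort_activity
                (user_id TEXT, cohort_week TEXT, days_since_signup INTEGER)""")
conn.executemany("INSERT INTO cohort_activity VALUES (?, ?, ?)", [
    ("u1", "2024-01-01", 0), ("u1", "2024-01-01", 30),  # u1 retained
    ("u2", "2024-01-01", 2),                            # u2 churned
    ("u3", "2024-01-01", 1), ("u3", "2024-01-01", 29),  # u3 retained
])

rows = conn.execute("""
    SELECT cohort_week,
           COUNT(DISTINCT user_id) AS cohort_size,
           COUNT(DISTINCT CASE WHEN days_since_signup BETWEEN 28 AND 34
                               THEN user_id END) AS retained_30d,
           COUNT(DISTINCT CASE WHEN days_since_signup BETWEEN 28 AND 34
                               THEN user_id END) * 1.0
               / COUNT(DISTINCT user_id) AS retention_rate_30d
    FROM cohort_activity
    GROUP BY cohort_week
""").fetchall()

print(rows)  # one cohort of 3 users, 2 retained, rate 2/3
```

Walking through a tiny dataset like this out loud is exactly the "validate your logic" step interviewers look for.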
How to approach window function questions in interviews
- Clarify the data model first. Ask: "Is there one row per user per day, or could there be multiple events per user per day?" The answer changes whether you need to deduplicate before windowing.
- Write the PARTITION BY before the function. Identifying how to partition the data is the hardest conceptual step. Get it right first.
- Validate your logic out loud. Walk the interviewer through what a small example would produce. "If user A has activity on days 1, 3, and 7, my LAG would give NULL on day 1, day 1 on day 3, and day 3 on day 7..."
- Check for edge cases. NULL values in the ORDER BY column, users with only one event (LAG returns NULL), date gaps in rolling average windows.
Practising window functions for real interviews
The difference between knowing window function syntax and being fluent with it under interview pressure is significant. In a data analyst mock interview, you'll be asked window function questions with follow-ups — "what if there are ties?", "what if a user has no activity in a given week?", "how would this query scale on a 10TB table?" — that require you to reason about the SQL, not just recall it.
The best preparation is writing these queries from scratch, out loud, multiple times, with someone watching for the moments where your logic breaks down.