Preparing for a SQL interview can be a daunting task, especially for roles that demand a deep understanding of data manipulation, database design, and query optimization. Whether you’re aiming to become a data engineer, SQL developer, or data analyst, having a strong grasp of SQL fundamentals is crucial for success. In this blog post, we’ll walk you through the top 10 SQL interview questions that are frequently asked in interviews for these roles. These questions cover a range of topics from basic SQL operations to more advanced concepts like window functions and query optimization. By the end of this guide, you’ll have a solid foundation to confidently tackle your next SQL interview.
Question 1: What types of joins do you know?
Question 2: What is an index?
Question 3: What types of indexes do you know?
Question 4: What is normalization in SQL, and why use it?
Question 5: What is denormalization in SQL, and why use it?
Question 6: What are the differences between SUM(amount)
and SUM(amount) OVER(PARTITION BY PRODUCT ORDER BY DATE ASC)
?
Question 7: How would you write a query to calculate cumulative sales over the sales date?
Setup:
CREATE TABLE Sales (
sale_id INT,
amount DECIMAL(10, 2),
sale_date DATE
);
INSERT INTO Sales (sale_id, amount, sale_date) VALUES
(1, 100.00, '2023-01-01'),
(2, 200.00, '2023-01-02'),
(3, 150.00, '2023-01-03');
Question 8: Given a table with a sequence or auto_increment primary key and a value field, delete the newest records from the table for records that have duplicate values.
Setup:
CREATE TABLE #my_table (
id INT PRIMARY KEY,
value VARCHAR(255)
);
INSERT INTO #my_table (id, value) VALUES
(1, 'A'),
(2, 'B'),
(3, 'A'),
(4, 'C'),
(5, 'A');
Question 9: Context: You have a table named `tree` with columns representing nodes and their corresponding parent nodes. Write an SQL query to label each node as either “Leaf,” “Inner,” or “Root.”
Setup:
CREATE TABLE #node_hierarchy (
node INT,
parent INT
);
INSERT INTO #node_hierarchy (node, parent) VALUES
(1, 2),
(2, 5),
(3, 5),
(4, 3),
(5, NULL);
Question 10: Finding sequences: Given a table with people counts per visit date, find sequences of at least 3 consecutive rows where the people count is greater than or equal to 100.
Setup:
CREATE TABLE #Stadium (
id INT,
visit_date DATE,
people INT
);
INSERT INTO #Stadium (id, visit_date, people) VALUES
(1, '2023-01-01', 120),
(2, '2023-01-02', 150),
(3, '2023-01-03', 90),
(4, '2023-01-04', 130),
(5, '2023-01-05', 110);
Solutions:
Solution 1: Types of Joins
INNER JOIN: Returns only those records that satisfy a defined join condition in both (or all) tables. It’s the default SQL join.
LEFT (OUTER) JOIN: Returns all records from the left table and those records from the right table that satisfy a defined join condition.
RIGHT (OUTER) JOIN: Returns all records from the right table and those records from the left table that satisfy a defined join condition.
FULL (OUTER) JOIN: Returns all records from both (or all) tables. It can be considered a combination of left and right joins.
Solution 2: What is an Index?
An index is a special data structure related to a database table and used for storing its important parts and enabling faster data search and retrieval. Indexes are especially efficient for large databases, where they significantly enhance query performance.
Solution 3: Types of Indexes
Unique Index: Doesn’t allow duplicates in a table column and hence helps maintain data integrity.
Clustered Index: Defines the physical order of records of a database table and performs data searching based on the key values. A table can have only one clustered index.
Non-clustered Index: Keeps the order of the table records that doesn’t match the physical order of the actual data on the disk. It means that the data is stored in one place and a non-clustered index – in another one. A table can have multiple non-clustered indexes.
Solution 4: Normalization in SQL
Normalization is a process of database design that includes organizing and restructuring data in a way to reduce data redundancy, dependency, duplication, and inconsistency. This leads to enhanced data integrity, more tables within the database, more efficient data access and security control, and greater query flexibility.
Solution 5: Denormalization in SQL
Denormalization is the process opposite of normalization: it introduces data redundancy and combines data from multiple tables. Denormalization optimizes the performance of the database infrastructure in situations when read operations are more important than write operations since it helps avoid complex joins and reduces the time of query running.
Solution 6: SUM vs SUM OVER
SUM(amount): Reduces the number of rows to one per group (or a single row for the entire set if no GROUP BY is used).
SUM(amount) OVER: Does not reduce the number of rows; instead, it returns a value for each row based on the defined window.
Solution 7: Cumulative Sales Query
SELECT
sale_id,
amount,
SUM(amount) OVER (ORDER BY sale_date) AS running_total
FROM
Sales;
Solution 8: Delete Newest Records with Duplicate Values
DELETE a
FROM #my_table a
JOIN #my_table b
ON a.value = b.value
AND a.id > b.id
AND b.id = (
SELECT MIN(id)
FROM #my_table
WHERE value = a.value
);
Solution 9: Labeling Nodes in a Hierarchy
--recursive cte solution
;with cte as (
select node,parent,0 as [level] from #node_hierarchy t
where parent is null
union all
select t.node,t.parent,c.[level] + 1 from #node_hierarchy t
inner join cte c on c.node = t.parent
where level < 100
)
select node,case when level = 2 then 'Leaf'
when level = 1 then 'Inner'
else 'Root'
end as [label] from cte
order by 1
-- Complex case statement solution
SELECT
node,
CASE
WHEN parent IS NULL THEN 'Root'
WHEN node NOT IN (SELECT parent FROM #node_hierarchy WHERE parent IS NOT NULL) THEN 'Leaf'
ELSE 'Inner'
END AS label
FROM
#node_hierarchy;
Solution 10: Finding Sequences with Consecutive Counts
WITH ConsecutiveGroups AS (
SELECT
id,
visit_date,
people,
id - ROW_NUMBER() OVER (ORDER BY id) AS grp
FROM #Stadium
WHERE people >= 100
),
GroupedRows AS (
SELECT
grp,
COUNT(*) AS cnt
FROM ConsecutiveGroups
GROUP BY grp
HAVING COUNT(*) >= 3
)
SELECT
c.id,
c.visit_date,
c.people
FROM ConsecutiveGroups c
JOIN GroupedRows g ON c.grp = g.grp
ORDER BY c.visit_date;
Leave a Reply