10 SQL Interview questions you must know

Preparing for a SQL interview can be a daunting task, especially for roles that demand a deep understanding of data manipulation, database design, and query optimization. Whether you’re aiming to become a data engineer, SQL developer, or data analyst, having a strong grasp of SQL fundamentals is crucial for success. In this blog post, we’ll walk you through the top 10 SQL interview questions that are frequently asked in interviews for these roles. These questions cover a range of topics from basic SQL operations to more advanced concepts like window functions and query optimization. By the end of this guide, you’ll have a solid foundation to confidently tackle your next SQL interview.

Question 1: What types of joins do you know?

Question 2: What is an index?

Question 3: What types of indexes do you know?

Question 4: What is normalization in SQL, and why use it?

Question 5: What is denormalization in SQL, and why use it?

Question 6: What are the differences between SUM(amount) and SUM(amount) OVER(PARTITION BY PRODUCT ORDER BY DATE ASC)?

Question 7: How would you write a query to calculate cumulative sales over the sales date?

Setup:


CREATE TABLE Sales (
    sale_id INT,
    amount DECIMAL(10, 2),
    sale_date DATE
);

INSERT INTO Sales (sale_id, amount, sale_date) VALUES
(1, 100.00, '2023-01-01'),
(2, 200.00, '2023-01-02'),
(3, 150.00, '2023-01-03');

Question 8: Given a table with a sequence or auto_increment primary key and a value field, delete the newest records from the table for records that have duplicate values.

Setup:


CREATE TABLE #my_table (
    id INT PRIMARY KEY,
    value VARCHAR(255)
);

INSERT INTO #my_table (id, value) VALUES
(1, 'A'),
(2, 'B'),
(3, 'A'),
(4, 'C'),
(5, 'A');

Question 9: Context: You have a table named `tree` with columns representing nodes and their corresponding parent nodes. Write an SQL query to label each node as either “Leaf,” “Inner,” or “Root.”

Setup:


CREATE TABLE #node_hierarchy (
    node INT,
    parent INT
);

INSERT INTO #node_hierarchy (node, parent) VALUES
(1, 2),
(2, 5),
(3, 5),
(4, 3),
(5, NULL);

Question 10: Finding sequences: Given a table with people counts per visit date, find sequences of at least 3 consecutive rows where the people count is greater than or equal to 100.

Setup:


CREATE TABLE #Stadium (
    id INT,
    visit_date DATE,
    people INT
);

INSERT INTO #Stadium (id, visit_date, people) VALUES
(1, '2023-01-01', 120),
(2, '2023-01-02', 150),
(3, '2023-01-03', 90),
(4, '2023-01-04', 130),
(5, '2023-01-05', 110);

Solutions:

Solution 1: Types of Joins

INNER JOIN: Returns only those records that satisfy a defined join condition in both (or all) tables. It’s the default SQL join.
LEFT (OUTER) JOIN: Returns all records from the left table and those records from the right table that satisfy a defined join condition.
RIGHT (OUTER) JOIN: Returns all records from the right table and those records from the left table that satisfy a defined join condition.
FULL (OUTER) JOIN: Returns all records from both (or all) tables. It can be considered a combination of left and right joins.

Solution 2: What is an Index?

An index is a special data structure related to a database table and used for storing its important parts and enabling faster data search and retrieval. Indexes are especially efficient for large databases, where they significantly enhance query performance.

Solution 3: Types of Indexes

Unique Index: Doesn’t allow duplicates in a table column and hence helps maintain data integrity.
Clustered Index: Defines the physical order of records of a database table and performs data searching based on the key values. A table can have only one clustered index.
Non-clustered Index: Keeps the order of the table records that doesn’t match the physical order of the actual data on the disk. It means that the data is stored in one place and a non-clustered index – in another one. A table can have multiple non-clustered indexes.

Solution 4: Normalization in SQL

Normalization is a process of database design that includes organizing and restructuring data in a way to reduce data redundancy, dependency, duplication, and inconsistency. This leads to enhanced data integrity, more tables within the database, more efficient data access and security control, and greater query flexibility.

Solution 5: Denormalization in SQL

Denormalization is the process opposite of normalization: it introduces data redundancy and combines data from multiple tables. Denormalization optimizes the performance of the database infrastructure in situations when read operations are more important than write operations since it helps avoid complex joins and reduces the time of query running.

Solution 6: SUM vs SUM OVER

SUM(amount): Reduces the number of rows to one per group (or a single row for the entire set if no GROUP BY is used).
SUM(amount) OVER: Does not reduce the number of rows; instead, it returns a value for each row based on the defined window.

Solution 7: Cumulative Sales Query


SELECT 
    sale_id, 
    amount, 
    SUM(amount) OVER (ORDER BY sale_date) AS running_total
FROM 
    Sales;

Solution 8: Delete Newest Records with Duplicate Values


DELETE a
FROM #my_table a
JOIN #my_table b
ON a.value = b.value 
AND a.id > b.id
AND b.id = (
    SELECT MIN(id) 
    FROM #my_table 
    WHERE value = a.value
);

Solution 9: Labeling Nodes in a Hierarchy




--recursive cte solution
;with cte as (

select node,parent,0 as [level] from #node_hierarchy t
where parent is null

union all 

select t.node,t.parent,c.[level] + 1  from #node_hierarchy t
inner join cte c on c.node = t.parent  
where level < 100

)

select node,case when level = 2 then 'Leaf' 
				 when level = 1 then 'Inner'
				else 'Root' 
				end as [label] from cte
order by 1

-- Complex case statement solution 
SELECT
    node,
    CASE
        WHEN parent IS NULL THEN 'Root'
        WHEN node NOT IN (SELECT parent FROM #node_hierarchy WHERE parent IS NOT NULL) THEN 'Leaf'
        ELSE 'Inner'
    END AS label
FROM
    #node_hierarchy;

Solution 10: Finding Sequences with Consecutive Counts





WITH ConsecutiveGroups AS (
    SELECT 
        id, 
        visit_date, 
        people,
        id - ROW_NUMBER() OVER (ORDER BY id) AS grp
    FROM #Stadium
    WHERE people >= 100
),
GroupedRows AS (
    SELECT 
        grp,
        COUNT(*) AS cnt
    FROM ConsecutiveGroups
    GROUP BY grp
    HAVING COUNT(*) >= 3
)
SELECT 
    c.id, 
    c.visit_date, 
    c.people
FROM ConsecutiveGroups c
JOIN GroupedRows g ON c.grp = g.grp
ORDER BY c.visit_date;

Conclusion

Mastering these top 10 SQL interview questions will equip you with the knowledge and confidence needed to excel in your next interview. Whether you're a data engineer, SQL developer, or data analyst, understanding these core concepts is key to demonstrating your expertise and securing your desired role. Good luck!

10 SQL Interview questions you must know

Solutions:

Solution 1: Types of Joins

Solution 2: What is an Index?

Solution 3: Types of Indexes

Solution 4: Normalization in SQL

Solution 5: Denormalization in SQL

Solution 6: SUM vs SUM OVER

Solution 7: Cumulative Sales Query

Solution 8: Delete Newest Records with Duplicate Values

Solution 9: Labeling Nodes in a Hierarchy

Solution 10: Finding Sequences with Consecutive Counts

Conclusion

Leave a Reply Cancel reply