SQL Window Functions
Window functions can be simply explained as calculation functions similar to aggregating, but where normal aggregating via the GROUP BY
clause combines then hides the individual rows being aggregated, window functions have access to individual rows and can add some of the attributes from those rows into the result set.
Have you ever wanted to loop through the results of your query to get a ranking, a top x list, or similar? Have you had any analytics projects where you wanted to prepare your data just right for a visualization tool, but found it nearly impossible or so complex that it wasn’t worth it?
Window functions can make things easier. After you get the result of your query — i.e., after the WHERE
clause and any standard aggregation, window functions will act on the remaining rows (the window of data) and get you what you want.
Some of the window functions we’re going to look at include:
OVER
COUNT()
SUM()
ROW_NUMBER()
RANK()
DENSE_RANK()
LEAD()
LAG()
Let’s explain with queries for that we have Employee table having following columns and Data.
Here I summarise a list of queries related to window functions
/*Print total salary*/
Here we can see the output in one row for the aggregate function sum.
select sum(salary) as TotalSalary from emp
/*Print total salary per job category*/
select job, sum(salary) as TotalSalary from emp group by job
The OVER
clause is what specifies a window function and must always be included in the statement. The default in an OVER
clause is the entire rowset.
We can see from below query for every row we are displaying TotalSalary
/*Print total salary with every row*/
select *, sum(salary) over ()as TotalSalary from emp
Partitions allow you to filter the window into sections by a certain value or values. Each section is often called the window frame.
To take it further, let’s say we are not only wanted to find out how many total salaries of the employee, but we want to show Total Salary for each Job title.
/*Print total salary per job category*/
select job,sum(salary) over (partition by job )as TotalSalary from emp
Now we want further order by Employee based on Salary. For that, we can use the familiar ORDER BY
clause. However, within a window function, ORDER BY
acts a bit differently than it does at the end of a query.
/*Order rows within partitions*/
select *, sum(salary) over (partition by job order by salary desc )as Ordered_job_Salary from emp
ROW_NUMBER
Numbers the output of a result set. More specifically, returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
ROW_NUMBER
and RANK
are similar. ROW_NUMBER
numbers all rows sequentially (for example 1, 2, 3, 4, 5). RANK
provides the same numeric value for ties (for example 1, 2, 2, 4, 5).
Ex-Sort the employee in sequence based on Salary with no ties.
/*Number rows within each partition*/
select *, ROW_NUMBER() over(partition by job order by salary desc) as “row_number” from emp;
Ex — Sort the employee in sequence based on Salary. Employee having same salary will be ties.
/*Rank rows within each partition using rank function*/
select *, ROW_NUMBER() over(partition by job order by salary) as “row_number”,RANK() over(partition by job order by salary) as “rank_row” from emp;
RANK VS DENSE_RANK
RANK gives you the ranking within your ordered partition. Ties are assigned the same rank, with the next ranking(s) skipped.DENSE_RANK again gives you the ranking within your ordered partition, but the ranks are consecutive. No ranks are skipped if there are ranks with multiple items.
/*Rank rows within each partition using dense_rank function*/
select *,
ROW_NUMBER() over(partition by job order by salary) as “row_number”,
RANK() over(partition by job order by salary) as “rank_row”,
DENSE_RANK() over(partition by job order by salary) as “dense_rank”
from emp;
NTILE
Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.
/*Statistics using ntile function*/
select *,
NTILE(4) over(order by salary) as “quartile”
from emp;
/*Lead values*/
The LEAD function is an analytic function that lets you query more than one row in a table at a time without having to join the table to itself. It returns values from the next row in the table. To return a value from a previous row, try using the LAG function.
select *,
LEAD(salary, 1) Over(partition by job order by salary) as sal_next
from emp;
LAG
Accesses data from a previous row in the same result set without the use of a self-join starting with SQL Server 2012 (11.x). LAG provides access to a row at a given physical offset that comes before the current row. Use this analytic function in a SELECT statement to compare values in the current row with values in a previous row.
/*Lag values*/
select *,
LAG(salary, 1) Over(partition by job order by salary) as sal_previous,
salary — LAG(salary, 1) Over(partition by job order by salary) as sal_diff
from emp;