In this article we will walk you through the GROUP BY clause that operates on groups of rows. You will also learn to compute aggregates that apply to these groups.
Problem :
You need to perform an analysis that requires data aggregation for understanding of outcomes.
Solution
Pre-requisites
Before going through this article we strongly recommend that you go through the following articles
1. Getting started with SQL for Impact Measurement
Purpose
In order to group similar data into sets and return a single value for each group, the GROUP BY clause is used. This clause is used in conjunction with the SELECT statement to group together dataset rows that contain identical data.
The columns to be retrieved are specified in the SELECT statement and separated by commas. Any of the aggregate functions can be used on one or more than one of the columns being retrieved.
To group the result set by one or more columns, the GROUP BY clause is frequently used in conjunction with the aggregate functions COUNT(), MAX(), MIN(), SUM(), and AVG().
Syntax of GROUP BY CLAUSE IS AS FOLLOWS:
SELECT column1, column2
FROM table_name
WHERE [ conditions ]
GROUP BY column1, column2
HAVING [conditions]
ORDER BY column1, column2
NOTE :
Important Points to remember
- GROUP BY clause is used with the SELECT statement.
- In the query, GROUP BY clause is placed after the WHERE clause.
- In the query, GROUP BY clause is placed before ORDER BY clause if used any.
- In the query , Group BY clause is placed before Having clause .
- Place condition in the HAVING clause
Aggregate Functions
The five aggregate functions that we can use with the SELECT statement are:
- AVG(): Calculates the average of the set of values.
- COUNT(): Returns the count of rows.
- SUM(): Calculates the arithmetic sum of the set of numeric values.
- MAX(): From a group of values, returns the maximum value.
- MIN(): From a group of values, returns the minimum value.
Let's understand how to use GROUP BY Clause
Let us consider the Survey table that is shown below to understand GROUP BY clause.
id | name | Did you get a job after completing the course? | position | What is your annual compensation? | How satisfied are you with the following - Fell supported at work | How satisfied are you with the following- Facilities at work | How satisfied are you with the following-Feel safe at work | Are you actively interviewing right now? |
1 | Madhukar Prabhakara | Yes | Data Analyst | 70000 | Somewhat satisfied | Not Satisfied | Very Satisfied | No |
2 | Lore Berron Cadenas | Yes | Software Engineer | 60000 | Not Satisfied | Not Satisfied | Very Satisfied | No |
3 | Latika Bahadur (Sopact) | Yes | Software Engineer | 90000 | Very Satisfied | Not Satisfied | Very Satisfied | No |
4 | Vaishnavi | Yes | Software Engineer | 83000 | Very Satisfied | Not Satisfied | Neutral | No |
5 | HETAL SHETH | Yes | Software Engineer | 72000 | Neutral | Not Satisfied | Neutral | No |
6 | Ritvik | No | 0 | No | ||||
7 | Unmesh Sheth | No | 0 | No | ||||
8 | Shiji | No | 0 | Yes | ||||
9 | Ricardo Pinho | No | 0 | Yes | ||||
10 | Lorena | No | 0 | No | ||||
11 | Farhan | No | 0 | Yes | ||||
12 | Richa | No | 0 | No | ||||
13 | Phoram | No | 0 | Yes | ||||
14 | Arpitha | No | 0 | Yes | ||||
15 | Adriano | No | 0 | No | ||||
16 | Caroline | No | 0 | No | ||||
17 | Samantha | No | 0 | Yes | ||||
18 | Lily | No | 0 | No |
Examples
1. To know total number of people in each position, then GROUP BY query would be as follows −
SELECT position, count("position") FROM survey GROUP BY position
OR
SELECT position, count(*) FROM survey GROUP BY position
TIP : To adapt to your dataset, replace "position" column with a column from your dataset. Replace "survey" dataset with your dataset as well. Don't forget to add the column name to the GROUP BY clause.Include the aggregate function on the correct column.
* (asterisk) in count indicates the all or everything in the grouped column.
The output should be as follows :
position | count |
Software Engineer | 4 |
Data Analyst | 1 |
NULL | 13 |
NOTE : This statement will group together the identical data in the position column(grouped as 1, 2, & 3 based on Null, Data Analyst and Software Engineer) in the dataset as shown below. So, this statement will first group them together then count the number of identical data in each group which gives the above result.
2. To know total annual compensation in each position, then the GROUP BY query would be as follows -
SELECT position,SUM("What is your annual compensation?") FROM survey
group by position
TIP : Replace "position" column with a column from your dataset. Replace "survey" dataset with your dataset as well. Don't forget to add the column name to the GROUP BY clause.Include the aggregate function on the correct column.
The output would be as follows :
position | sum |
NULL | 0 |
Software Engineer | 305000 |
Data Analyst | 70000 |
NOTE : It will group together the identital data in the position column(grouped as 1, 2, & 3 based on Null, Data Analyst and Software Engineer)) in the dataset as shown above in the fig. So, this statement will first group them together then perform sum on corresponding "What is your annual compensation? column - adding all the value of foresaid column in each group which gives the above result.
3. To know Maximum annual compensation in each position, then the GROUP BY query would be as follows -
SELECT position,MAX("What is your annual compensation?") FROM survey
group by position
The output would be as follows :
position | max |
Software Engineer | 90000 |
Data Analyst | 70000 |
NULL | 0 |
NOTE : The identical data in the position column of the dataset will be grouped together as group (1,2 & 3 based on Null, Data Analyst and Software Engineer)). So, this statement will first group them together then compute the Maximum on corresponding "What is your annual compensation? column - display maximum value of the foresaid column in each group which gives the above result.
4. To know Minimum and the Maximum annual compensation in each position, then the GROUP BY query would be as follows -
SELECT position,MAX("What is your annual compensation?"),
MIN("What is your annual compensation?") FROM survey
group by position
TIP : You can add multiple aggregate function on GROUP BY.
The output would be as follows :
position | max | min |
NULL | 0 | 0 |
Software Engineer | 90000 | 60000 |
Data Analyst | 70000 | 70000 |
NOTE : The identical data in the position column of the dataset will be grouped together into 3 groups ( 1, 2 and 3 based on Null, Data Analyst and Software Enginerr) So, this statement will first group position column together then compute the Maximum and Minimum on corresponding "What is your annual compensation? column - display maximum and minimum value of the foresaid column in each group which gives the above result.
5. To know total respondent to the survey , as well as learn highest and lowest compensation, so the SELECT statement should be like this :
SELECT count(*), max("What is your annual compensation?"),
max("What is your annual compensation?") from survey
TIP : When a aggregate function is used without a GROUP BY clause, then aggregation function computes on the entire dataset irrespective of any column. The SELECT statement can list ONLY columns that are used in GROUP BY along with these funcitons.
Count(*) - * in asterisk indicates all or everything in the group that is considered.
The output would be as follows :
count | max | min |
18 | 90000 | 0 |
NOTE : This SELECT statement first considers the entire dataset as one group as shown below and computes the COUNT, MAX and MIN on the entire dataset which displays the above result.
6. To learn how many of the respondents to the survey have found employment, and also know position wise average compensation, so the GROUP BY statement would like this
SELECT
"Did you get a job after completing the course?",
position,
AVG ("What is your annual compensation?")
FROM survey
group by "Did you get a job after completing the course?",position
TIP : You can GROUP BY multiple columns
The output would be as follows :
Did you get a job after completing the course? | position | avg |
No | NULL | 0 |
Yes | Software Engineer | 76250 |
Yes | Data Analyst | 70000 |
A small tip : If there are three groups, then 3 records. Two groups then 2 records .....
In this case, we have 3 groups - A.1, B.1, B.2 so three records.
NOTE : It will group together the identital data in the "Did you get a job after completing the course?" column( grouped as A and B based on Yes or No) and then group the position column (A.1, B.1 & B.2 based on Null, Data Analyst and Software Engineer) in the dataset as shown below. Based on identical values in each group (A.1, B.1 & B.2), the average on compensation column is computed which gives the above result. It looks as shown below:
NOTE : To adapt the GROUP BY clause to your use case. Replace column name and dataset name according to your dataset. Kindly adhere to the syntax to avoid errors.Don't forget to add the column name to the GROUP BY clause if needed. Include the aggregate function on the correct column for computations.
Let's practice an example of GROUP BY on Impact Cloud Dashboard
Login to Impact Cloud instance using the credentials. For this example, we will be using Impact Cloud Demo instance.
Go to SQL Lab, Select SQL Editor as shown below:
- Let's learn how many of the respondents to the survey have found employment, and also know position wise average compensation on Impact Cloud Dashboard.
To do this, write / copy paste the below query on the SQL Editor as shown below. Click RUN to execute the query to display the results.
SELECT
"Did you get a job after completing the course?",
position,AVG ("What is your annual compensation?")
FROM gc_demo.survey
group by "Did you get a job after completing the course?",position
NOTE: Impact Cloud's dataset is identified by its schema name. dataset, therefore gc_demo.survey is used instead of survey. gc_demo is the schema where the courses dataset resides.
TO DO :
1. Execute all the above examples of GROUP BY clause with SELECT Statement to get familiar with Impact Cloud.
2. Adapt the SELECT Statement with GROUP BY clause to suit your requirement.
Congratulations !! You have learnt to use GROUP BY clause.