Complex SQL Queries: PIVOT and CONNECT BY

For those already familiar with SQL and basic query writing, the next step is mastering advanced techniques to handle intricate data challenges and unlock the full potential of Oracle database. While simple queries are good enough for basic tasks, real-world scenarios often demand more sophisticated techniques to tackle complex data relationships, extract deeper insights, and prepare for professional growth. In this article, we'll explore complex SQL queries through practical examples, focusing on techniques that enhance your ability to solve challenging problems. Let's get started!

Pivoting and Unpivoting Rows

In the context of data manipulation and representation, pivoting refers to the process of transforming a dataset by reorganizing its structure. This transformation involves changing the arrangement of rows and columns to create a new representation of the data. The goal is to reshape the data to make it more suitable for presentation.

In a typical pivoting operation, you may have a dataset where information is stored in rows, and you want to reorganize it so that the data is presented in columns. While pivoting is more commonly associated with tools, such as Microsoft Excel, Pandas, or dplyr, Oracle also supports the process with the PIVOT operation

In Oracle SQL, the PIVOT operation is used to transform rows into columns, effectively pivoting the data based on some criteria. Here is a basic syntax for using the PIVOT clause in Oracle SQL:

SELECT *
FROM (
-- Your initial query here
SELECT column1, column2, column3
FROM your_table
)
PIVOT (
-- Aggregate function and pivot column
AGGREGATE_FUNCTION(column_value)
FOR pivot_column IN ('value1' AS alias1, 'value2' AS alias2, ...)
);

Let's break down the components of this syntax:

  1. Initial Query
    • The initial query retrieves the data that you want to pivot. It selects the columns you need from the source table (your_table in the basic syntax description above.)
  2. PIVOT Clause
    • The PIVOT clause is used to perform the pivoting operation. It consists of an aggregate function, a pivot column, and a list of values for that pivot column.
    • Aggregate Function:
      • This is a function like SUM, COUNT, AVG, etc., that will be applied to the values in the pivot column during the pivoting operation.
    • Pivot Column:
      • This is the column whose values will become the new columns in the pivoted result set.
    • IN Clause
      • The IN clause specifies the distinct values in the pivot column that you want to transform into columns.
    • FOR Clause
      • The FOR clause is used to provide aliases for the values in the pivot column. These aliases will be used as the new column headers in the result set.

Let's consider a table shown below. The table below has 67 rows and 3 columns: product_category, salesperson, and quantity_sold. Suppose that you want to represent this table, so that each value of the product_category goes to the column, while salesperson and the total quantity goes to rows. This process is called pivoting table from long to wide format.


The SQL query below is summarizing sales data by product_category and salesperson. The subquery selects three columns from the sales_data table. Essentially, this part extracts relevant sales information from the original table. 

Then the PIVOT clause pivots the data from the subquery based on the product_category column. The SUM(quantity_sold) calculates the sum of quantity_sold for each combination of product_category and salespersonFOR product_category IN specifies the column to be used for pivoting (product_category). Lastly, 'Electronics' AS "Total Sales: Electronics", ... defines the categories to be used as new columns and their corresponding names. In this case, it creates three new columns named "Total Sales: Electronics", "Total Sales: Clothing", "Total Sales: Books".

So, the PIVOT clause takes the aggregated sales data by salesperson and product category and transforms it into a table where each row represents a salesperson, and each column represents the total sales for a specific product category. In simpler terms, the query below rearranges the sales data from showing individual sale to showing total sales for each product category by salesperson.


Conversely, unpivoting is a process of converting the column-wise representation of data into a row-wise representation. In other words, it is undoing what we have done in the pivoting process. This is commonly done to make the data more suitable for further analysis later.

Here is a basic syntax for the UNPIVOT operation in Oracle SQL:

SELECT
    [Identifier_column],
    [Unpivoted_column],
    [Value_column]
FROM (
    -- Your subquery here
    SELECT
        [Identifier_column],
        [column1],
        [column2],
        ...
FROM [your_table]
)
UNPIVOT (
    ([Value_column] FOR [Unpivot_column] IN ([column1], [column2], ...)
);

Here's a step-by-step explanation:

  1. SELECT from Subquery:
    • The UNPIVOT operation is applied to subquery that selects the columns you want to unpivot.
  2. Identifier Column:
    • The [Identifier_column] is a column that uniquely identifies each row. This column remains unchanged during the unpivoting process.
  3. Column to Unpivot:
    • The columns, [column1], [column2], ... are the columns you want to unpivot.
  4. UNPIVOT Clause:
    • The UNPIVOT clause is used to unpivot the selected columns.
    • [Value_column] is the name of the new column that will store the values.
    • FOR [Unpivoted_column] IN ([column1], [column2], ...) specifies the columns to unpivot and their corresponding names in the [Unpivoted_column] and [Value_column] result columns.

Let's illustrate the concept with an example. First, I created a view with the pivoting query mentioned earlier. Then, you can unpivot the pivoted data as follows:


In this example, the unpivoted table includes 9 rows with 3 columns: salesperson, sales_category, sales_quantity. The newly created sales_category column has values 'Total Sales: Electronics', 'Total Sales: Clothing', and 'Total Sales: Books' corresponding to the original columns being unpivoted. The sales_quantity column contains the unpivoted values.



Hierarchical Queries


Post a Comment

0 Comments