Hello, SQL!

SQL, or Structured Query Language, is the industry-standard language for working with an relational database management system (RDBMS), including Oracle. It is not the database itself but a language used to design relational data models, manage database objects, and perform CRUD operations (Create, Read, Update, Delete) on data tables. SQL commands can be run independently or invoked by applications during runtime. Many popular programming languages, such as Java or Python, can embed SQL statements and issue them to interact with database.

Let's get started with some "Hello, World!" example. In the figure shown below, an SQL query is submitted, and a 1x1 table is returned. Because the statement starts with the keyword SELECT, it retrieves data from a specified table[1] and is called the SELECT statement. Likewise, every SQL statement begins with a specific keyword that declares what to do. 


Each SQL statement is composed of clauses, each beginning with its own keyword. In this example, the SELECT statement includes two clauses: SELECT and FROM. The SELECT clause specifies the columns in the resulting dataset; in this case, it contains a single dummy column with a single row containing the string 'Hello, SQL!'. 

The FROM clause defines the tables from which the columns listed in the SELECT clause are retrieved. This can involve a single table or view, or multiple tables joined together. The SELECT and FROM clauses are two mandatory components of a SELECT statement. Additional clauses, such as WHERE, GROUP BY, HAVING, and ORDER BY, can further extend the statement to refine the result set.

SQL syntax is case-insensitive, meaning the query, select 'Hello, SQL!' from dual;, works just as well as its uppercase equivalent. However, it is conventional to write keywords in uppercase and other elements, such as column or table names, in lowercase. This practice improves readability by clearly distinguishing keywords from identifiers.

Lastly, every SQL statement must end with a semicolon (;). Omitting it is a common error, even among experienced database programmers. Even though Oracle does not raise an error for omitting a closing semicolon on the very last SQL statement in the submitted batch script, it is considered a best practice to always include semicolons for better readability and to avoid potential errors.

Key Characteristics of SQL

SQL possesses several defining characteristics that make it powerful for managing and analyzing data.

Set-Based Language

SQL operates on entire sets of data simultaneously, rather than processing data row by row. For example, consider an SQL query like:

SELECT salary * (1 + commission_pct) AS total_salary
FROM employees
WHERE EXTRACT(YEAR FROM hire_date) = 2018;

This query returns a table containing a column with total salary of each employees whose hire dates is the year of 2018. When Oracle handles this query, it follows these steps:

  1. Identify the Data Table: Oracle identifies the employees table and temporarily holds all its rows in the result set.
  2. Apply the WHERE Clause: Using the EXTRACT(YEAR FROM hire_date) = 2018 condition, Oracle filters out rows in the result set where hire_date column value does not fall in the year 2018. Only rows matching the condition remain in the result set..
  3. Perform Column Operation: Apply the formula salary * (1 + commission_pct) to all rows in the result set.
  4. Return the Result: Finally, Oracle returns the processed result set as a table with a single column, total_salary, containing the calculated values for all filtered rows.

In SQL's set-based paradigm, the database engine processes the dataset as a whole and applies filtering and calculations to the entire set simultaneously. This behavior is very different from procedural approach, where an explicit iteration over rows is required to achieve the same result. For example, using Python, similar work could be done as follows:

# Example dataset: List of employee dictionaries
employees = [
{'hire_date': '2018-01-15', 'salary': 5000, 'commission_pct': 0.1},
{'hire_date': '2017-06-20', 'salary': 6000, 'commission_pct': 0.15},
{'hire_date': '2018-08-03', 'salary': 7000, 'commission_pct': 0.2},
{'hire_date': '2016-12-11', 'salary': 8000, 'commission_pct': 0.25}
]

# Step 1: Filter employees hired in 2018
filtered_employees = []
for employee in employees:
if employee['hire_date'].startswith('2018'):
filtered_employees.append(employee)

# Step 2: Calculate the average salary
total_salaries = []
for employee in filtered_employees:
total_salary = employee['salary'] * (1 + employee['commission_pct'])
total_salaries.append(total_salary)

# Output the results
for total_salary in total_salaries:
print(f"Total Salary: {total_salary:.2f}")

Declarative Language

SQL is also a declarative language, meaning it specifies what we want to achieve without detailing how the database should go about performing the task. For example, let's consider another example query as follows:

SELECT
dp.department_id AS dept_id,
dp.department_name AS dept_name,
ROUND(AVG(ee.salary),2) AS avg_salary
FROM
employees ee
JOIN departments dp ON ee.department_id = dp.department_id
GROUP BY
dp.department_id,
dp.department_name;

In this query:

  • The SELECT clause specifies desired result: a table with three columns--dept_id, containing unique identifier for each department, dept_name containing the name of the department, and avg_salary, containing the average salary of the department
  • The FROM and JOIN clause declare the tables involved (employees and departments) and condition linking them (ee.department_id = dp.department_id).
  • GROUP BY clause groups the data by department ID and name, enabling the calculation of average salaries per department.

The query does not specify how the database should retrieve or combine the data, such as which indexes to use, the order of operations, or the join algorithm. These details are abstracted, and the database engine determines the optimal execution plan to retrieve and process the data efficiently.

On the other hand, in a procedural approach, the task would involve explicitly iterating through the data to calculate the average salary for each department. Here's an example of how this could be done in Python:

# Example dataset: List of employee and department dictionaries
employees = [
{'employee_id': 1, 'department_id': 101, 'salary': 5000},
{'employee_id': 2, 'department_id': 102, 'salary': 6000},
{'employee_id': 3, 'department_id': 101, 'salary': 5500},
{'employee_id': 4, 'department_id': 103, 'salary': 7000}
]

departments = [
{'department_id': 101, 'department_name': 'HR'},
{'department_id': 102, 'department_name': 'Finance'},
{'department_id': 103, 'department_name': 'Engineering'}
]

# Step 1: Join employees with departments based on department_id
joined_data = []
for employee in employees:
for department in departments:
if employee['department_id'] == department['department_id']:
joined_data.append({
'dept_id': department['department_id'],
'dept_name': department['department_name'],
'salary': employee['salary']
})

# Step 2: Group data by department_id and calculate average salary
department_totals = {}
for record in joined_data:
dept_id = record['dept_id']
if dept_id not in department_totals:
department_totals[dept_id] = {'dept_name': record['dept_name'], 'total_salary': 0, 'count': 0}
department_totals[dept_id]['total_salary'] += record['salary']
department_totals[dept_id]['count'] += 1

# Step 3: Calculate the average salary for each department
result = []
for dept_id, data in department_totals.items():
avg_salary = round(data['total_salary'] / data['count'], 2)
result.append({'dept_id': dept_id, 'dept_name': data['dept_name'], 'avg_salary': avg_salary})

# Output the result
for row in result:
print(row)

In this example, notice the explicit iteration through the dataset, with each operation--joining, grouping, and calculating--carried out step by step. The developer is responsible for implementing all the data processing logic, resulting in longer and more complex code.

In contrast, SQL abstracts these operations into a concise declarative statement, delegating the implementation of the business logic to the database engine. Developers only need to specify what they want, rather than how to achieve it. This abstraction makes SQL particularly well-suited for efficiently handling large datasets, enabling simpler, more efficient, and more maintainable solutions compared to the verbose manual nature of procedural approaches.

Types of SQL Statements

SQL statements can be broadly categorized into Data Definition Language (DDL), Data Manipulation Language (DML), and Transaction Control Language (TCL). 

DDL Statements[2]: Used to create and manage database objects, such as tables, views, and users.

  • CREATE: Creates a user, table, view, index, synonym or other objects in the database.
  • ALTER: Used to modify the structure, name, or other attributes of an existing database object[3].
  • DROP: Removes a database object from the database that has already been created with the CREATE statement.
  • RENAME: Changes the name of an existing database object.
  • TRUNCATE: Removes all records from an existing table. 
  • GRANT: Provides privileges to user objects and enables them to perform specific tasks in the database.
  • REVOKE: Removes privileges that have been issued with the GRANT statement.
  • FLASHBACK: Restores an earlier version of a table or database.
  • PURGE: Irrevocably removes database objects from the recycle bin.
  • COMMENT: Add comments to the data dictionary for an existing database object.

DML Statements: Retrieve and manipulate data records through tables and views. 

  • SELECT: Retrieves data from a table or view
  • INSERT: Adds data records to a database table, either directly or through a view.
  • UPDATE: Modifies existing data records in a table, either directly or through a view.
  • DELETE: Removes existing data records in a table, either directly or through a view. 
  • MERGE: Performs a combinations of INSERT, UPDATE, and DELETE statements in a single statement, depending on some condition.

TCL Statements: Manage transactions, ensuring data integrity and avoiding data anomalies.

  • COMMIT: Saves a set of DML modifications performed in the current database session.
  • ROLLBACK: Undoes a set of DML modifications performed during the current database session.
  • SAVEPOINT: Marks a point in a session to prepare for a future ROLLBACK to enable ROLLBACK to restore data at the selected point in a session.

Lexical Elements of SQL

SQL statements are composed from the four lexical elements: literals, operators, reserved words, and identifiers.

Literals

Literals are raw values that are directly provided by developers in the SQL scripts. They can be numbers, strings, or NULL:

  • Number Literals
    • Number literals represent any integer or rational numbers, such as 5, -20, or 3.14. These are often used in arithmetic operations or comparisons.
  • String Literals
    • In Oracle SQL, string literals refer to any text values enclosed in single quotes (e.g., 'Hello', 'John Doe'). These are case-sensitive. So, for example, 'hello' and 'Hello' are treated differently.
    • Datetime Strings: These are special kinds of strings that represent time-related values. Unlike other character strings, they can be added or subtracted to each other. To be recognized as a datetime value, a string must be in a specific formats defined by nls_date_format (e.g., 'YYYY-MM-DD', 'YYYY-MM-DD HH:MM:SS')
  • NULL
    • NULL is a special literal in Oracle SQL, representing unknown values. It differs from numeric zero or an empty string and requires the keywords IS and IS NOT for comparison.
    • Example: SELECT * FROM hr.employees WHERE commission_pct IS NULL;
Literals are used to represent constant values in SQL queries, and they are usually found in conditions, expressions, or as part of the data being inserted or updated in a database.

Identifiers

An identifier is the name assigned to a database object and must adhere to the following naming rules:

  • Length: Names must be between 1 and 30 characters long.
  • First Character: The first character must be a letter.
  • Subsequent Characters: Allowed characters include letters, numbers, $, _, and #. No other special characters may be used.
  • Case Sensitivity: Names are not case-sensitive.
  • Reserved Words: Reserved words cannot be used as identifiers.

To override the rules described above, you can use double quotations. Quoted names differ from non-quoted names in several ways: they can include any characters, have to be double quoted for retrieval, and are case-sensitive. However, Oracle strongly discourages overriding the naming rules, as it may lead to compatibility issues; not all database tools and applications support quoted identifiers, not to mention increased risk of typos.

Operators

Operators are special symbols that perform specific actions on the elements on their left and right sides (called operands), and produce a result. They're essential for expressions, which filter data, perform calculations, and combine conditions. SQL operators are grouped into several types based on their functionalities:

  • Arithmetic Operators
    • Used to perform mathematical operations on numeric values.
    • Examples: + (addition), - (subtraction), * (multiplication), / (division), % (modulo).
  • Comparison Operators
    • Used to compare two values and return true, false, or null.
    • Examples: = (equal to), != (not equal to), > (greater than), < (less than), >= (greater than or equal to), <= (less than or equal to).
  • Logical Operators
    • Combine multiple conditions, often placed in WHERE clauses, to form complex queries.
    • Examples: AND (both conditions are true), OR (at least one condition is true), NOT (reverse the result of a condition).
  • Set Operators
    • Used to combine the result sets of two or more SELECT statements.
    • Examples: UNION (combines results from two queries and removes duplicates), UNION ALL (combines results from two queries and retains duplicates), INTERSECT (returns only rows found in both queries), MINUS (returns only rows from the first query not in the second).
  • Other Common Operators
    • IN: Checks if a values exists in a list of values.
    • BETWEEN a AND b: Checks if a value falls within a range given by a and b (inclusive).
    • LIKE: Checks if a string matches a specified pattern using wild cards (% for multiple characters, _ for a single character).
    • IS NULL: Checks if a value is NULL.

Reserved Words

Reserved words are predefined keywords that have specific meanings and functionalities within the SQL language. These words are part of the syntax and are used to perform various operations, such as defining data structures, controlling transactions, and managing database access. Because they serve these specific roles, reserved words cannot be used as identifiers (e.g., table names, column names, or users) unless they are enclosed in double quotes.

In Oracle, the V$RESERVED_WORDS view contains the full list of the reserved words for the current version. To check the reserved words, you can query it as follows:

Adding Comments

Comments are lines of code that are ignored by the computer but are intended for human readers. They serves as explanatory notes that provide clear explanation about program's logic without affecting the execution. Adding comments plays a key role in database programming, particularly when you are working in a team environment. 

In Oracle SQL, there are two ways to add comments. One method is to use two consecutive dashes (--) at the beginning of the text; everything placed after the dashes will be treated as a comment and ignored by the SQL engine. 

The other method is to place your comments between /* and */, which are opening and closing delimiters for a comment block. Comment blocks can span multiple lines and everything between the two delimiters is considered a comment. Here's an example demonstrating both:



[1] In this example, observe that a dummy table DUAL is used as a placeholder. This table is specific to Oracle's SQL dialect and fulfills the syntax requirement of including a table name in the FROM clause. This table always returns a 1x1 table with the calculation results and is commonly used for testing SQL syntax and functions.  
[2] Note that these keywords are not really statements or commands by themselves, but become one when combined with other reserved keywords. For example, CREATE TABLE or CREATE SEQUENCE is an SQL statement. In practice, however, many professionals call them CREATE statement. It is also worth noting that the term command and statement are used interchangeably by Oracle's official documentation. Either is fine!  
[3] With exceptions for SESSION and SYSTEM. ALTER SESSION changes settings for the current session, lasting only until the connection ends. ALTER SYSTEM modifies system-wide parameters, effective until the database restarts. Technically, neither ALTER SESSION nor ALTER SYSTEM is a DDL statement; they fall under a different category.  

Post a Comment

0 Comments