<sub>2025-05-15</sub> <sub>#data-visualization #data-management #sql #hmp669</sub> <sup>[[maps-of-content|🌐 Maps of Content — All Notes]] </sup> <sup>Series: [[hmp669|Data Management and Visualization]]</sup> ## SQL: The Language of Database Communication > [!abstract]- Quick Review > > Structured Query Language (SQL), the standard language for interacting with relational databases. SQL is a non-procedural language that allows users to query and manipulate relational databases by specifying what data they want rather than how to retrieve it. SELECT queries with various clauses function as building blocks that can be combined to create powerful data retrieval operations. > > **Key Concepts**: > > - SELECT queries retrieve data using various clauses (FROM, WHERE, GROUP BY, HAVING, ORDER BY) > - Aggregate functions (COUNT, MIN, MAX, SUM, AVG) summarize collections of rows > - JOINs combine data from multiple related tables (INNER, OUTER, CROSS) > > **Must Remember**: > > - WHERE filters rows before grouping; HAVING filters groups after aggregation > - The execution order of clauses matters (FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY) > - SQL is non-procedural: you specify what you want, not how to get it > > **Critical Relationships**: > > - SELECT queries are built by combining clauses that act as building blocks > - Tables are connected through JOINs using matching key values > - Aggregation reduces multiple rows to summary values or grouped results > [!code]- Code Reference > > |Command/Syntax|Purpose|Example| > |---|---|---| > |**Basic SELECT**||| > |`SELECT column FROM table;`|Retrieve data from table|`SELECT name FROM employees;`| > |`SELECT * FROM table;`|Retrieve all columns|`SELECT * FROM products;`| > |`SELECT column AS alias`|Create column alias|`SELECT first_name AS FirstName FROM employees;`| > |`SELECT expr AS computed`|Create computed column|`SELECT price * quantity AS total FROM orders;`| > |**Filtering and Sorting**||| > |`SELECT DISTINCT column`|Remove duplicate values|`SELECT DISTINCT city FROM customers;`| > |`WHERE condition`|Filter rows|`WHERE price > 50`| > |`column = value`|Equality comparison|`WHERE status = 'Active'`| > |`column <> value`|Inequality comparison|`WHERE status <> 'Inactive'`| > |`column != value`|Inequality comparison (alternative)|`WHERE status != 'Inactive'`| > |`column > value`|Greater than|`WHERE price > 100`| > |`column >= value`|Greater than or equal|`WHERE age >= 18`| > |`column < value`|Less than|`WHERE price < 50`| > |`column <= value`|Less than or equal|`WHERE quantity <= 0`| > |`condition1 AND condition2`|Logical AND|`WHERE price > 50 AND category_id = 3`| > |`condition1 OR condition2`|Logical OR|`WHERE status = 'New' OR status = 'Pending'`| > |`NOT condition`|Logical NOT|`WHERE NOT discontinued = 1`| > |`ORDER BY column [ASC\|DESC]`|Sort results|`ORDER BY last_name, first_name DESC`| > |`IS NULL`|Check for NULL values|`WHERE order_id IS NULL`| > |`LIMIT n`|Limit result set to n rows|`LIMIT 5`| > |**Aggregation**||| > |`COUNT(column)`|Count non-null values|`SELECT COUNT(customer_id) FROM customers`| > |`COUNT(*)`|Count all rows|`SELECT COUNT(*) FROM orders`| > |`SUM(column)`|Sum values|`SELECT SUM(amount) FROM orders`| > |`AVG(column)`|Average value|`SELECT AVG(price) FROM products`| > |`MIN(column)`|Minimum value|`SELECT MIN(date) FROM orders`| > |`MAX(column)`|Maximum value|`SELECT MAX(price) FROM products`| > |`GROUP BY column1, column2`|Group rows for aggregation|`GROUP BY category_id, supplier_id`| > |`HAVING condition`|Filter groups|`HAVING COUNT(*) > 5`| > |**Joins**||| > |`INNER JOIN table2 ON condition`|Match rows in both tables|`INNER JOIN orders ON customers.id = orders.customer_id`| > |`LEFT JOIN table2 ON condition`|All rows from left table, matching from right|`LEFT JOIN orders ON customers.id = orders.customer_id`| > |`RIGHT JOIN table2 ON condition`|All rows from right table, matching from left|`RIGHT JOIN customers ON orders.customer_id = customers.id`| > |`CROSS JOIN table2`|Cartesian product of tables|`CROSS JOIN products`| > |`JOIN table2 ON table1.col = table2.col`|Self-join (same table)|`JOIN employees mgr ON emp.manager_id = mgr.employee_id`| > |**Data Definition Language (DDL)**||| > |`CREATE TABLE`|Create a new table|`CREATE TABLE employees (id INT, name VARCHAR(100))`| > |`ALTER TABLE`|Modify table structure|`ALTER TABLE employees ADD COLUMN hire_date DATE`| > |`DROP TABLE`|Remove a table|`DROP TABLE employees`| > |**Data Control Language (DCL)**||| > |`GRANT`|Give privileges to users|`GRANT SELECT ON employees TO user1`| > |`REVOKE`|Remove privileges|`REVOKE INSERT ON employees FROM user1`| > |**Transaction Control Language (TCL)**||| > |`COMMIT`|Save transaction changes|`COMMIT;`| > |`ROLLBACK`|Undo transaction changes|`ROLLBACK;`| > |**Arithmetic Operators**||| > |`+`|Addition|`SELECT price + tax AS total`| > |`-`|Subtraction|`SELECT price - discount AS net_price`| > |`*`|Multiplication|`SELECT quantity * price AS line_total`| > |`/`|Division|`SELECT total / quantity AS unit_price`| > |**Semicolon (;)**|Terminates SQL statement|`SELECT * FROM employees;`| > >[!success]- SELECT Query Concept Sketch >![[select-query.excalidraw.svg]] ## SQL Fundamentals ### What Makes SQL Special SQL stands out among programming languages for two key reasons: 1. **Ease of Learning**: "SQL is relatively easy to learn. Its basic command set has a vocabulary of fewer than 100 words." 2. **Non-Procedural Nature**: "SQL is a non-procedural language. In layperson's terms, you merely command what is to be done. You do not have to worry about how." > [!note] Language Classification SQL functions as multiple types of language: > > - **Data Definition Language** (DDL): Creating database objects > - **Data Manipulation Language** (DML): Inserting, updating, deleting, and retrieving data > - **Transaction Control Language** (TCL): Managing database transactions > - **Data Control Language** (DCL): Controlling access to data ![[sql-basics-1747376704559.webp]] ### The Building Block Approach to SQL SQL queries are constructed like building blocks, with each clause serving a specific purpose that contributes to the whole: ```mermaid flowchart TD A[SELECT] --> B[FROM / JOIN] B --> C[WHERE] C --> D[GROUP BY] D --> E[HAVING] E --> F[ORDER BY] style A fill:#f9d5e5,stroke:#333 style B fill:#eeac99,stroke:#333 style C fill:#e06377,stroke:#333 style D fill:#c83349,stroke:#333 style E fill:#5b9aa0,stroke:#333 style F fill:#d6e1c7,stroke:#333 ``` > [!tip] Unlike procedural programming where you specify steps, in SQL you declare what result you want. The database engine determines the most efficient way to execute your request. ### SQL Commands > [!NOTE]- SQL Commands > > | COMMAND, OPTION, OR OPERATOR | DESCRIPTION | > | ---------------------------- | ----------------------------------------------------------------- | > | SELECT | Selects attributes from rows in one or more tables or views | > | FROM | Specifies the tables from which data should be retrieved | > | WHERE | Restricts the selection of rows based on a conditional expression | > | GROUP BY | Groups the selected rows based on one or more attributes | > | HAVING | Restricts the selection of grouped rows based on a condition | > | ORDER BY | Orders the selected rows based on one or more attributes | > | INSERT | Inserts row(s) into a table | > | UPDATE | Modifies an attribute’s values in one or more table’s rows | > | DELETE | Deletes one or more rows from a table | > ### Comparison Operators > > | Operator | Description | > |----------------------------|------------------------------------| > | =, <, >, <=, >=, <>, != | Used in conditional expressions | > > ### Logical Operators > > | Operator | Description | > |--------------|------------------------------------| > | AND/OR/NOT | Used in conditional expressions | > > ### Special Operators > > | Operator | Description | > |-------------|-----------------------------------------------------------------------------| > | BETWEEN | Checks whether an attribute value is within a range | > | IN | Checks whether an attribute value matches any value within a value list | > | LIKE | Checks whether an attribute value matches a given string pattern | > | IS NULL | Checks whether an attribute value is null | > | EXISTS | Checks whether a subquery returns any rows | > | DISTINCT | Limits values to unique values | > > ### Aggregate Functions > _Used with `SELECT` to return mathematical summaries on columns_ > > > > | Function | Description | > | -------- | ------------------------------------------------------------------ | > | COUNT | Returns the number of rows with non-null values for a given column | > | MIN | Returns the minimum attribute value found in a given column | > | MAX | Returns the maximum attribute value found in a given column | > | SUM | Returns the sum of all values for a given column | > | AVG | Returns the average of all values for a given column | > > ### SQL Data Definition Commands > > > > | COMMAND OR OPTION | DESCRIPTION | > | --------------------------- | ------------------------------------------------------------------------------------ | > | CREATE SCHEMA AUTHORIZATION | Creates a database schema | > | CREATE TABLE | Creates a new table in the user’s database schema | > | NOT NULL | Ensures that a column will not have null values | > | UNIQUE | Ensures that a column will not have duplicate values | > | PRIMARY KEY | Defines a primary key for a table | > | FOREIGN KEY | Defines a foreign key for a table | > | DEFAULT | Defines a default value for a column (when no value is given) | > | CHECK | Validates data in an attribute | > | CREATE INDEX | Creates an index for a table | > | CREATE VIEW | Creates a dynamic subset of rows and columns from one or more tables | > | ALTER TABLE | Modifies a table’s definition (adds, modifies, or deletes attributes or constraints) | > | CREATE TABLE AS | Creates a new table based on a query in the user’s database schema | > | DROP TABLE | Permanently deletes a table (and its data) | > | DROP INDEX | Permanently deletes an index | > | DROP VIEW | Permanently deletes a view | > > ### Transaction Control Language > > > > | COMMAND | DESCRIPTION | > |-----------|------------------------------------------| > | COMMIT | Permanently saves data changes | > | ROLLBACK | Restores data to its original values | > > ### Data Control Language > > > > | COMMAND | DESCRIPTION | > | ------- | ----------------------------------------------------------------------- | > | GRANT | Gives a user permission to take a system action or access a data object | > | REVOKE | Removes a previously granted permission from a user | > ## The SELECT Query: Data Retrieval Foundation ### Basic SELECT Structure The most fundamental SQL operation is retrieving data with SELECT: ```sql SELECT columnlist FROM tablelist; ``` - **columnlist**: One or more column names, separated by commas - **tablelist**: One or more table names from which to retrieve data - **semicolon (;)**: Terminates the statement (optional in some systems but recommended) > [!note] The order of columns in your SELECT statement determines the order they appear in results. To retrieve all columns from a table, use the asterisk `*` wildcard: ```sql SELECT * FROM Customers; ``` > [!warning] While convenient for exploration, using SELECT * in production code is generally discouraged as it retrieves unnecessary data and creates maintenance issues if table structures change. #### Practical Examples **Example 1: Basic column selection** ```sql SELECT PAT_NAME, PAT_AGE FROM PATIENTS; ``` **Example 2: Using column aliases for better readability** ```sql SELECT PAT_NAME AS [Patient Name], PAT_AGE AS [Patient Age] FROM PATIENTS; ``` ### Customizing Column Output #### Column Aliases with AS The AS clause creates alternative names for columns in the result set: ```sql SELECT first_name AS FirstName, last_name AS LastName FROM employees; ``` #### Computed Columns Create calculated values derived from existing columns: ```sql SELECT product_name, unit_price, units_in_stock, unit_price * units_in_stock AS inventory_value FROM products; ``` > [!important] Derived Attributes > "A computed column also called a calculated column represents a derived attribute... the attribute must be calculated when it's needed." #### Arithmetic Operators SQL supports standard mathematical operations: - Addition (`+`) - Subtraction (`-`) - Multiplication (`*`) - Division (`/`) - Exponentiation (varies by DBMS) Order of operations follows mathematical precedence: 1. Parentheses first 2. Exponentiation (^) 3. Multiplication (\*) and division (/) 4. Addition (+) and subtraction (-) ## Filtering and Sorting Results ### Filtering Rows with WHERE The WHERE clause filters rows based on specified conditions: ```sql SELECT column1, column2 FROM table_name WHERE condition; ``` #### Comparison Operators | Operator | Description | Example | | -------- | ------------------------ | -------------------------- | | = | Equal to | WHERE Age = 25 | | < | Less than | WHERE Price < 100 | | <= | Less than or equal to | WHERE Quantity <= 0 | | > | Greater than | WHERE Salary > 50000 | | >= | Greater than or equal to | WHERE Height >= 72 | | != or <> | Not equal to | WHERE Status != 'Inactive' | #### Logical Operators - AND: Both conditions must be true - OR: At least one condition must be true - NOT: Negates a condition ```sql SELECT product_name, unit_price FROM products WHERE unit_price > 50 AND category_id = 1; ``` **Practical Example: Filtering patients by age** ```sql SELECT PAT_NAME, PAT_AGE FROM PATIENTS WHERE PAT_AGE < 55; ``` > [!warning] Empty Results > If no rows match your WHERE conditions, you'll receive an empty result set - not an error. ### Removing Duplicates with DISTINCT The DISTINCT keyword eliminates duplicate values from your results, showing only unique values. ```sql SELECT DISTINCT city FROM customers; ``` > [!note] DISTINCT appears immediately after the SELECT keyword and applies to the entire row of selected columns, not just a single column. ### Sorting Results with ORDER BY The ORDER BY clause sorts query results: ```sql SELECT column1, column2 FROM table_name ORDER BY column1 [ASC|DESC], column2 [ASC|DESC]; ``` - Default sort order is ascending (ASC) - Use DESC for descending order - Multiple columns create a cascading order sequence ![[sql-basics-1747377452987.webp]] **Practical Example: Complex sorting** ```sql SELECT PAT_ZIPCODE, PAT_COUNTY, AVG(PAT_AGE) AS Average_Age FROM PATIENTS GROUP BY PAT_ZIPCODE, PAT_COUNTY HAVING PAT_COUNTY = 'Washtenaw' ORDER BY PAT_ZIPCODE ASC, AVG(PAT_AGE) DESC; ``` This sorts first by zipcode in ascending order, then by average age in descending order within each zipcode. ## Aggregating and Grouping Data ### Aggregate Functions Aggregation functions perform calculations across a set of rows and return a single value. |Function|Description|Example| |---|---|---| |COUNT()|Counts rows|`COUNT(customer_id)`| |SUM()|Adds values|`SUM(order_amount)`| |AVG()|Calculates average|`AVG(unit_price)`| |MIN()|Finds minimum value|`MIN(date_hired)`| |MAX()|Finds maximum value|`MAX(salary)`| ```sql SELECT COUNT(order_id), SUM(order_total), AVG(order_total) FROM orders WHERE order_date > '2023-01-01'; ``` **Practical Example: Average patient age** ```sql SELECT AVG(PAT_AGE) AS [Average_Age] FROM PATIENTS; ``` ### Grouping Data with GROUP BY The GROUP BY clause organizes rows into groups for aggregation: ```sql SELECT column1, COUNT(*) FROM table_name GROUP BY column1; ``` This creates groups of rows with the same values in the specified column(s), then applies aggregate functions to each group. #### Handling NULL Values When grouping, rows with NULL values in the GROUP BY column(s) are collected into their own separate group. This can be important to remember when analyzing results. #### Grouping by Multiple Columns You can create more specific groups by including multiple columns: ```sql SELECT department, category, COUNT(product_id) FROM products GROUP BY department, category; ``` This creates separate groups for each unique combination of department and category values. **Practical Examples: Grouping patient data** **Example 1: Average age by gender** ```sql SELECT PAT_GENDER AS [Patient Gender], AVG(PAT_AGE) AS [Average_Age] FROM PATIENTS GROUP BY PAT_GENDER; ``` **Example 2: Average age by zipcode and county** ```sql SELECT PAT_ZIPCODE, PAT_COUNTY, AVG(PAT_AGE) AS Average_Age FROM PATIENTS GROUP BY PAT_ZIPCODE, PAT_COUNTY; ``` ### Filtering Groups with HAVING The HAVING clause filters groups based on aggregate results: ```sql SELECT column1, COUNT(*) as count FROM table_name GROUP BY column1 HAVING COUNT(*) > 5; ``` > [!warning] Common Misconception > One of the most common SQL mistakes is trying to use WHERE to filter based on aggregate values, which doesn't work because WHERE processes individual rows before any aggregation occurs. > [!warning] WHERE vs. HAVING > > - WHERE filters individual rows **before** they are grouped > - HAVING filters groups **after** the GROUP BY clause > - HAVING can use aggregate functions; WHERE cannot SQL provides the HAVING clause specifically for filtering groups based on aggregate results: |Clause|Filters|Timing|Can use aggregates?| |---|---|---|---| |WHERE|Individual rows|Before grouping|No| |HAVING|Groups of rows|After grouping|Yes| ```mermaid flowchart TD A[All Rows] --> B[WHERE] B[WHERE] --> C[Filtered Rows] C --> D[GROUP BY] D --> E[Grouped Results] E --> F[HAVING] F --> G[Filtered Groups] style B fill:#f9d5e5,stroke:#333 style D fill:#eeac99,stroke:#333 style F fill:#e06377,stroke:#333 ``` ```sql SELECT category, COUNT(product_id) as product_count FROM products GROUP BY category HAVING COUNT(product_id) > 10; ``` This returns only categories that have more than 10 products. **Practical Example: Filtering groups by county** **Method 1: Using WHERE (filters before grouping)** ```sql SELECT PAT_ZIPCODE, PAT_COUNTY, AVG(PAT_AGE) AS Average_Age FROM PATIENTS WHERE PAT_COUNTY = 'Washtenaw' GROUP BY PAT_ZIPCODE, PAT_COUNTY; ``` **Method 2: Using HAVING (filters after grouping)** ```sql SELECT PAT_ZIP AS [Zipcode], PAT_COUNTY AS [County], AVG(PAT_AGE) AS [Average Age] FROM PATIENTS GROUP BY PAT_ZIP, PAT_COUNTY HAVING PAT_COUNTY = "Washtenaw"; ``` > [!note] Both approaches filter to Washtenaw county, but WHERE is more efficient as it filters before ## Joining Multiple Tables ### Understanding Table Relationships Relational databases store data across multiple related tables. Joins combine this data based on relationships between tables. - Usually connects a **foreign key** in one table to a **primary key** in another - The join condition defines which rows should be matched (typically equality) - Different join types determine how unmatched rows are handled ```mermaid erDiagram CUSTOMERS ||--o{ ORDERS : places CUSTOMERS { int customer_id string name string address } ORDERS { int order_id int customer_id date order_date } ``` ### Types of JOINs #### INNER JOIN Returns only rows where matching values exist in both tables: ```sql SELECT customers.name, orders.order_date FROM customers INNER JOIN orders ON customers.customer_id = orders.customer_id; ``` ```mermaid graph LR subgraph "INNER JOIN" A((Table A)) --- C(("A ⋂ B")) B((Table B)) --- C end ``` **Practical Example: Joining visits with patient data** ```sql SELECT VISITS.V_ID AS [Visit ID], VISITS.V_DATE AS [Visit Date], PATIENTS.PAT_NAME AS [Patient Name], PATIENTS.PAT_AGE AS [Patient Age] FROM VISITS INNER JOIN PATIENTS ON VISITS.V_PAT = PATIENTS.PAT_ID; ``` **Alternative with table aliases** ```sql SELECT V.V_ID AS [Visit ID], V.V_DATE AS [Visit Date], PA.PAT_NAME AS [Patient Name], PA.PAT_AGE AS [Patient Age] FROM VISITS AS V INNER JOIN PATIENTS AS PA ON V.V_PAT = PA.PAT_ID; ``` #### OUTER JOIN Returns matching rows plus unmatched rows from one or both tables: - **LEFT JOIN**: All rows from the left table, plus matching rows from the right. Missing matches filled with NULL. - **RIGHT JOIN**: All rows from the right table, plus matching rows from the left. Missing matches filled with NULL. ```sql SELECT customers.name, orders.order_date FROM customers LEFT JOIN orders ON customers.customer_id = orders.customer_id; ``` ```mermaid graph LR subgraph "LEFT JOIN" A((Table A)) --- C(("A ⋂ B + remaining A")) B((Table B)) --- C end ``` ```mermaid graph LR subgraph "RIGHT JOIN" A((Table A)) --- C(("A ⋂ B + remaining B")) B((Table B)) --- C end ``` **Practical Example: All patients with their visit dates (including patients with no visits)** ```sql SELECT PA.PAT_NAME AS [Patient Name], PA.PAT_AGE AS [Patient Age], V.V_ID AS [Visit ID], V.V_DATE AS [Visit Date] FROM PATIENTS AS PA LEFT JOIN VISITS AS V ON V.V_PAT = PA.PAT_ID; ``` **Alternative using RIGHT JOIN** ```sql SELECT PA.PAT_NAME AS [Patient Name], PA.PAT_AGE AS [Patient Age], V.V_ID AS [Visit ID], V.V_DATE AS [Visit Date] FROM VISITS AS V RIGHT JOIN PATIENTS AS PA ON V.V_PAT = PA.PAT_ID; ``` #### CROSS JOIN Creates a Cartesian product - all possible combinations of rows: ```sql SELECT employees.name, shifts.time FROM employees CROSS JOIN shifts; ``` ```mermaid graph LR subgraph "CROSS JOIN" A((Table A)) --- C(("A × B")) B((Table B)) --- C end ``` > [!warning] Cross joins can produce very large result sets and are rarely needed. They're often the result of forgetting to specify a join condition. ### Recursive Joins (Self Joins) Joins a table to itself ```sql SELECT e.name AS employee, m.name AS manager FROM employees AS e INNER JOIN employees AS m ON e.manager_id = m.employee_id; ``` This is especially useful for organizational charts, product categories, or any hierarchical relationships. > [!tip] Self-joins always require table aliases to distinguish between the different "instances" of the same table. ### Complex JOIN Example: Multiple Table Join with Filtering **Example: Patient demographics with county information** ```sql SELECT PA.PAT_ZIP AS [ZipCode], C.C_NAME AS [County_Name], AVG(PA.PAT_AGE) AS [Average_Age] FROM PATIENTS AS PA INNER JOIN COUNTY AS C ON PA.PAT_COUNTY = C.C_ID WHERE C.C_NAME = 'Washtenaw' GROUP BY PA.PAT_ZIP, C.C_NAME; ``` **Alternative with multiple WHERE conditions** ```sql SELECT PA.PAT_ZIP AS [ZipCode], C.C_NAME AS [County_Name], AVG(PA.PAT_AGE) AS [Average_Age] FROM PATIENTS AS PA INNER JOIN COUNTY AS C ON PA.PAT_COUNTY = C.C_ID WHERE C.C_NAME IN ['Washtenaw', 'Ypsilanti'] -- OR WHERE C.C_NAME = 'Washtenaw' OR C.C_NAME = 'Ypsilanti' GROUP BY PA.PAT_ZIP, C.C_NAME; ``` ## Case Application: Online Retail Analysis > [!example]- E-commerce Data Analysis > > **Business Scenario**: An online retailer wants to analyze customer orders to identify top-spending customers and best-selling products by category. > > **Database Structure**: > - Customers (customer_id, name, email, city, state) > - Products (product_id, name, category_id, unit_price) > - Categories (category_id, category_name) > - Orders (order_id, customer_id, order_date) > - OrderDetails (order_id, product_id, quantity, price) > > **Analysis 1: Top 5 Customers by Total Spending** > ```sql > SELECT c.name, c.city, c.state, > SUM(od.price * od.quantity) AS total_spent > FROM Customers c > INNER JOIN Orders o ON c.customer_id = o.customer_id > INNER JOIN OrderDetails od ON o.order_id = od.order_id > GROUP BY c.customer_id, c.name, c.city, c.state > ORDER BY total_spent DESC > LIMIT 5; > ``` > > **Analysis 2: Best-Selling Products by Category** > ```sql > SELECT cat.category_name, > p.name AS product_name, > SUM(od.quantity) AS units_sold, > SUM(od.price * od.quantity) AS revenue > FROM Categories cat > INNER JOIN Products p ON cat.category_id = p.category_id > INNER JOIN OrderDetails od ON p.product_id = od.product_id > GROUP BY cat.category_id, cat.category_name, p.product_id, p.name > ORDER BY cat.category_name, units_sold DESC; > ``` > > **Analysis 3: Customers with No Orders (Using LEFT JOIN)** > ```sql > SELECT c.name, c.email > FROM Customers c > LEFT JOIN Orders o ON c.customer_id = o.customer_id > WHERE o.order_id IS NULL > ORDER BY c.name; > ``` > > These queries demonstrate how to: > 1. Join multiple tables to connect related information > 2. Use aggregate functions to summarize data > 3. Group results to organize information > 4. Apply sorting to present data meaningfully > 5. Use different join types to answer different business questions ## SQL Query Execution Order Understanding the logical order in which SQL processes clauses helps in writing effective queries: 1. **FROM**: Tables are identified and joined 2. **WHERE**: Rows are filtered 3. **GROUP BY**: Rows are grouped 4. **HAVING**: Groups are filtered 5. **SELECT**: Columns are selected and expressions evaluated 6. **ORDER BY**: Results are sorted > [!important] Mental Model > This execution order explains why column aliases defined in the SELECT clause can be used in ORDER BY but not in WHERE or GROUP BY clauses. ```mermaid flowchart TD A[FROM] --> B[WHERE] B --> C[GROUP BY] C --> D[HAVING] D --> E[SELECT] E --> F[ORDER BY] style A fill:#f9d5e5,stroke:#333 style B fill:#eeac99,stroke:#333 style C fill:#e06377,stroke:#333 style D fill:#c83349,stroke:#333 style E fill:#5b9aa0,stroke:#333 style F fill:#d6e1c7,stroke:#333 ```` ## Summary: SQL Queries SQL provides a powerful, declarative approach to data retrieval and manipulation. Starting with basic SELECT statements, you can progressively add clauses to filter, sort, group, and join data to answer complex business questions. The key concepts to understand are: - SELECT queries and their customization with clauses - Filtering with WHERE versus HAVING - Aggregating data with functions and GROUP BY - Combining related data using different JOIN types ### The Single Most Important Takeaway SQL's power comes from its building-block approach - relatively simple clauses that can be combined in countless ways to answer virtually any data question. -- Reference: