<sub>2025-04-09 Wednesday</sub> <sub>#sas-programming </sub>
<sup>[[maps-of-content]] </sup>
> [!success]- Concept Sketch: [[]]
> ![[]]
> [!abstract]- Quick Review
>
> **Core Essence**: SAS accesses data through libraries that connect to various data sources, allowing users to work with both structured data (like SAS tables) and imported unstructured data after conversion.
>
> **Key Concepts**:
>
> - Structured data has defined columns with attributes; unstructured data requires importing
> - SAS tables consist of a descriptor portion (metadata) and data portion (values)
> - Libraries provide an efficient way to access data by location and type
>
> **Must Remember**:
>
> - SAS column names must start with a letter/underscore, can be up to 32 characters
> - Every column requires three attributes: name, type (character/numeric), and length
> - The WORK library is temporary; data is deleted when session ends
>
> **Critical Relationships**:
>
> - Libraries connect SAS to data sources using LIBNAME statements
> - PROC IMPORT creates static copies while library engines provide direct access
> - SAS tables (structured) allow for analysis while unstructured data must be converted first
> [!code]- Code Reference
>
>
> |Command/Syntax|Purpose|Example|
> |---|---|---|
> |**Data Exploration**|||
> |`PROC CONTENTS`|View table metadata|`PROC CONTENTS DATA=sashelp.class; RUN;`|
> |**Library Management**|||
> |`LIBNAME`|Create library reference|`LIBNAME mylib "C:/data";`|
> |`LIBNAME libref CLEAR`|Remove library reference|`LIBNAME myexcel CLEAR;`|
> |`LIBNAME libref XLSX`|Connect to Excel workbook|`LIBNAME myexcel XLSX "C:/data/file.xlsx";`|
> |`OPTIONS VALIDVARNAME=V7`|Enforce SAS naming rules|`OPTIONS VALIDVARNAME=V7;`|
> |**Data Import**|||
> |`PROC IMPORT DATAFILE=`|Import external file|`PROC IMPORT DATAFILE="file.csv" DBMS=CSV OUT=output;`|
> |`DBMS=`|Specify file type|`DBMS=CSV` or `DBMS=XLSX`|
> |`OUT=`|Specify output table|`OUT=work.mydata`|
> |`REPLACE`|Overwrite existing table|`REPLACE;`|
> |`GUESSINGROWS=`|Rows to examine for attributes|`GUESSINGROWS=MAX;`|
> |`SHEET=`|Excel worksheet to import|`SHEET="Sales";`|
# Accessing Data with SAS: Visual Note Guide
## Introduction to SAS Data Access
SAS provides robust methods for working with data from various sources. This guide explores how SAS handles different data types, structures data in tables, provides access through libraries, and imports external data. Understanding these fundamental concepts is essential for effective data analysis in SAS.
> [!note] Learning Objectives After studying this guide, you should understand:
>
> - The difference between structured and unstructured data
> - The composition and attributes of SAS tables
> - How to use SAS libraries to access data
> - Methods for importing external data into SAS
## 1. Understanding Data Types in SAS
### Structured vs. Unstructured Data
**Structured data** has defined columns with attributes that help SAS understand how to read and display values. Examples include:
- SAS tables
- Database management system (DBMS) tables
- Hadoop
**Unstructured data** lacks defined columns and appears to a computer as a continuous string of text. Examples include:
- Text files
- CSV (comma-separated values) files
- JSON files
- Weblogs
> [!tip] Data Access Path
> Structured data can be accessed directly through SAS libraries, while unstructured data must first be imported and converted into a structured SAS table before analysis.
## 2. The Structure of SAS Tables
A SAS table (also called a data set) is a structured file with the extension `.sas7bdat` that contains:
### Components of a SAS Table
1. the descriptor portion (metadata)
- Table name
- Number of rows
- Creation date/time
- Column names and attributes
2. **Data portion** - Contains the actual data values organized in:
- Columns (variables)
- Rows (observations)
### SAS Terminology
|Traditional Term|SAS Term|
|---|---|
|Table|Data set|
|Column|Variable|
|Row|Observation|
### Required Column Attributes
Every column in a SAS table must have these three attributes:
1. **Name**
- 1-32 characters long
- Must start with a letter or underscore
- Can contain letters, numbers, or underscores
- Case is preserved upon creation but is case-insensitive for referencing
2. **Type**
- **Numeric**: Stores only numeric values (0-9, -, ., E)
- SAS dates are stored as numeric values (days from January 1, 1960)
- Default length is 8 bytes
- **Character**: Stores letters, numbers, special characters, and blanks
- Length can range from 1 to 32,767 bytes (1 byte per character)
3. **Length**
- Number of bytes allocated to store column values
- Numeric columns default to 8 bytes
- Character columns can be 1 to 32,767 bytes
### Viewing Table and Column Attributes
The PROC CONTENTS procedure displays information about a SAS table:
```sas
PROC CONTENTS DATA=table_name;
RUN;
```
This report shows both general table information and an alphabetic list of variables with their attributes.
## 3. Accessing Data Through SAS Libraries
### What is a SAS Library?
A **SAS library** is a collection of SAS files (including tables) in the same location. Libraries provide an efficient way to specify the location and file type of data you want to access.
### Creating Libraries with LIBNAME
The LIBNAME statement creates a library:
```sas
LIBNAME libref engine "path";
```
Where:
- **libref**: Library reference name (max 8 characters, starts with letter/underscore)
- **engine**: Specifies the type of data (e.g., Base for SAS tables, XLSX for Excel)
- **path**: Physical location of the data (directory path or database connection)
Once defined, you access tables using the `libref.tablename` format.
> [!warning] Library Duration
> Libraries exist only for the duration of the SAS session or until explicitly deleted with the LIBNAME CLEAR statement. The underlying data is not affected when a library is cleared.
### Automatic SAS Libraries
SAS automatically defines several libraries at the start of each session:
- **Work**: Temporary library for working files (deleted at session end)
- Is the default library when no libref is specified
- **Sashelp**: Contains sample tables and other files
## 4. Importing Unstructured Data
Unstructured data must be imported into a structured SAS table before analysis.
### PROC IMPORT for Data Import
The PROC IMPORT procedure converts external files into SAS tables:
```sas
PROC IMPORT DATAFILE="path/filename"
DBMS=filetype
OUT=output_table
REPLACE;
GUESSINGROWS=n;
RUN;
```
Where:
- **DATAFILE=**: Path and filename of the external data
- **DBMS=**: Type of file being imported (CSV, XLSX, etc.)
- **OUT=**: Library and name of the SAS output table to create
- **REPLACE**: (Optional) Overwrites output table if it already exists
- **GUESSINGROWS=**: (Optional) Number of rows SAS examines to determine attributes
> [!note] Column Names Assumption
> SAS assumes column names are in the first row of the file by default.
### Common Import Types
#### Comma-Delimited (CSV) Files
```sas
PROC IMPORT DATAFILE="file.csv"
DBMS=CSV
OUT=output_table
REPLACE;
GUESSINGROWS=n;
RUN;
```
#### Excel (XLSX) Files
```sas
PROC IMPORT DATAFILE="file.xlsx"
DBMS=XLSX
OUT=output_table
REPLACE;
SHEET=sheet_name;
RUN;
```
> [!tip] GUESSINGROWS Option Increase the GUESSINGROWS value or set it to MAX if SAS incorrectly determines column attributes, especially with large files where the first 20 rows (default) aren't representative of all data.
## 5. Accessing Excel Data with SAS
SAS provides two distinct LIBNAME Engine
```sas
/* Set option to enforce SAS naming conventions */
OPTIONS VALIDVARNAME=V7;
/* Create library to Excel file */
LIBNAME myexcel XLSX "path/file.xlsx";
/* Access data */
PROC PRINT DATA=myexcel.worksheet_name;
RUN;
/* Close the connection */
LIBNAME myexcel CLEAR;
```
**Advantages**:
- Provides direct access to the most current data
- Works with Excel data in place without creating copies
- Allows reading and writing to Excel
### Method 2: PROC IMPORT
```sas
PROC IMPORT DATAFILE="path/file.xlsx"
DBMS=XLSX
OUT=output_table
REPLACE;
SHEET=sheet_name;
RUN;
```
**Advantages**:
- Creates a static copy of Excel data at import time
- Simple syntax for one-time imports
- Works without additional SAS/ACCESS licenses
> [!warning] SAS/ACCESS License Requirement
> Both methods for accessing Excel data require the SAS/ACCESS to PC Files license.
### Choosing Between LIBNAME and PROC IMPORT
- Use **LIBNAME** when you need:
- Real-time access to changing Excel data
- To both read and write to Excel
- To work with multiple worksheets in the same file
- Use **PROC IMPORT** when you need:
- A static copy of data at a specific point in time
- Simpler syntax for one-time imports
- To document exactly when data was extracted
> [!example]- Case Application: Data Analyst Workflow
>
> A financial analyst receives daily Excel files containing transaction data. Their workflow demonstrates both access methods:
>
> **Morning Report (static data needed):**
>```sas
> /* Import yesterday's final sales data */
> PROC IMPORT DATAFILE="C:/Reports/Sales_2023_10_25.xlsx"
> DBMS=XLSX
> OUT=work.yesterday_sales
> REPLACE;
> RUN;
>
> /* Create summary report */
> PROC MEANS DATA=work.yesterday_sales;
> VAR Amount;
> CLASS Region;
> RUN;
>
>```
> **Real-time Dashboard (constantly updating data):**
>```sas
> /* Set option for naming */
> OPTIONS VALIDVARNAME=V7;
>
> /* Connect to live sales tracking spreadsheet */
> LIBNAME live XLSX "C:/Dashboard/Live_Sales.xlsx";
>
> /* Create real-time report that refreshes with each run */
> PROC PRINT DATA=live.today_sales;
> WHERE Amount > 10000;
> RUN;
>
> /* Close connection */
> LIBNAME live CLEAR;
>
>```
> This workflow demonstrates how to select the appropriate method based on business needs.
## Summary: Efficient Data Access in SAS
SAS provides flexible methods for accessing various data types:
1. **Understanding data types** is the first step—structured data can be accessed directly, while unstructured data requires importing.
2. **SAS tables** form the foundation of analysis with their descriptor and data portions, requiring proper column attributes (name, type, length).
3. **Libraries** serve as connection points to data sources, with the WORK library providing temporary storage during your session.
4. **Importing data** with PROC IMPORT transforms unstructured data into structured SAS tables for analysis.
5. **Accessing Excel data** can be done either through direct connection (LIBNAME) or by creating static copies (PROC IMPORT).
> [!important] Key Takeaway
> The most important concept to remember is that SAS works most efficiently with structured data, and all data access methods serve to either connect directly to structured data or convert unstructured data into a structured format for analysis.
```mermaid
flowchart TD
A[Data Source] --> B{Structured?}
B -->|Yes| C[Direct Access via LIBNAME]
B -->|No| D[Import via PROC IMPORT]
C --> E[SAS Analysis]
D --> E
subgraph "SAS Environment"
C
D
E
end
```
--
Reference: