<sub>2025-06-01</sub> <sub>#data-science #sas-programming </sub>
<sup>[[maps-of-content|Maps of Content → All Notes]] </sup>
<sup>Series: [[sas-programming-1-essentials|SAS Programming 1: Essentials]]</sup>
<sup>Topic: [[sas-programming-1-essentials#Lesson 2 Accessing Data|Lesson 2: Accessing Data]]</sup>
# SAS Data Import
> [!abstract]- Overview
>
> PROC IMPORT transforms external, unstructured data (CSV, Excel) into structured SAS datasets you can analyze and manipulate.
>
> **Key Concepts**:
>
> - **Source Specification**: Tell SAS where your data lives (DATAFILE, DBMS)
> - **Destination Control**: Define your SAS output table (OUT, REPLACE)
> - **Smart Detection**: Let SAS guess data types or guide the process (GUESSINGROWS)
>
> **Critical Connections**: External files → PROC IMPORT → SAS datasets → Analysis ready
>
> **Must Remember**: PROC IMPORT creates a _snapshot_ of your external data, not a live connection

> [!code]- Syntax Reference
>
>
> |Command/Syntax|Purpose|Example|
> |---|---|---|
> |**Basic Structure**|||
> |`PROC IMPORT DATAFILE="path/file" DBMS=type OUT=table;`|Core import syntax|`PROC IMPORT DATAFILE="data.csv" DBMS=CSV OUT=mydata;`|
> |**Essential Options**|||
> |`DATAFILE="path"`|Specifies source file location|`DATAFILE="C:/data/sales.csv"`|
> |`DBMS=CSV`|Defines file type (CSV included with Base SAS)|`DBMS=CSV`|
> |`DBMS=XLSX`|Excel files (requires SAS/ACCESS license)|`DBMS=XLSX`|
> |`OUT=library.tablename`|Names output SAS dataset|`OUT=work.imported_data`|
> |**Control Options**|||
> |`REPLACE`|Overwrites existing output table|`OUT=mydata REPLACE;`|
> |`GUESSINGROWS=n`|Rows to scan for data types (default=20)|`GUESSINGROWS=100` or `GUESSINGROWS=MAX`|
> |`SHEET="name"`|Excel worksheet to import|`SHEET="Sales_Data";`|
## The Data Import Challenge
**Imagine this scenario**: You have a treasure trove of data sitting in Excel spreadsheets and CSV files, but SAS can only work its analytical magic on structured SAS datasets. It's like having a key that doesn't fit the lock---you need a translator.
**That translator is PROC IMPORT**---your bridge between the external data world and SAS's structured universe.
## Understanding the Core Workflow
The import process follows a simple but powerful pattern:
```plaintext
External File → PROC IMPORT → SAS Dataset → Ready for Analysis
     ↓              ↓              ↓                ↓
 CSV/Excel     Reads & Maps    Structured     Reports, Models,
   Files        Data Types     SAS Table      Visualizations
```
> [!tip] Think of PROC IMPORT as a Smart Translator
> It doesn't just copy data---it reads your external file, makes intelligent guesses about data types, converts formats (like dates), and creates a proper SAS dataset that follows all the rules SAS expects.
## Essential Building Blocks
### The Three Pillars of PROC IMPORT
Every PROC IMPORT statement needs these three foundation pieces:
**1. DATAFILE**: _Where is your data?_
```sas
DATAFILE="C:/Users/YourName/Documents/sales_data.csv"
```
**2. DBMS**: _What type of file is it?_
```sas
DBMS=CSV /* For comma-delimited files */
DBMS=XLSX /* For Excel files */
```
**3. OUT**: _What should we call the SAS table?_
```sas
OUT=work.sales_data /* Creates temporary table */
OUT=mylib.sales_data /* Creates permanent table */
```
### Smart Defaults That Work For You
SAS makes reasonable assumptions to get you started quickly:
- **Column names** come from the first row
- **Data types** determined by scanning the first 20 rows
- **Dates** automatically converted to SAS date values
- **Character data** with commas gets properly handled (when quoted)
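To see those defaults in action, here is a minimal sketch (the file path, column names, and values are hypothetical) showing what PROC IMPORT infers when you supply only the three required pieces:

```sas
/* Suppose sales.csv (hypothetical) contains:
     Region,SaleDate,Amount
     East,01/15/2024,1500.25
     West,02/03/2024,980.00                        */

proc import datafile="C:/data/sales.csv"  /* hypothetical path */
    dbms=csv
    out=work.sales
    replace;
run;

/* Defaults applied automatically:
   - Region, SaleDate, Amount become variable names (taken from row 1)
   - Amount is numeric, Region is character (guessed from the first 20 rows)
   - SaleDate is converted to a SAS date value with a date format attached */
```

Running `PROC CONTENTS DATA=work.sales;` afterward is a quick way to confirm the guessed names, types, and formats.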
## Working with CSV Files
CSV (Comma-Separated Values) files are the workhorses of data exchange. Here's your standard approach:
```sas
proc import datafile="storm_damage.csv"
dbms=csv
out=work.storm_data
replace;
run;
```
> [!note] Why REPLACE Matters
> Without REPLACE, SAS will throw an error if your output table already exists. Adding REPLACE lets you re-run your code without manual cleanup---perfect for iterative data work.
**Real Example Breakdown**:
```sas
proc import datafile="C:/data/storm_damage.csv"
dbms=csv
out=work.storm_damage_import
replace;
guessingrows=100; /* Look at more rows for better type detection */
run;
```
This creates a SAS dataset where:
- Numeric columns (like Cost, Date) become SAS numeric variables
- Text columns (like Event, Summary) become character variables
- Quoted values with commas are handled correctly
## Working with Excel Files
Excel files require a bit more finesse and the right licensing:
```sas
proc import datafile="C:/data/quarterly_sales.xlsx"
dbms=xlsx
out=work.sales_data
replace;
sheet="Q1_Results"; /* Specify which worksheet */
run;
```
> [!warning] License Requirement
> Excel import requires **SAS/ACCESS to PC Files**. If you don't have this license, you'll need to save your Excel files as CSV first.
### Excel vs CSV: When to Use Each
|Scenario|Best Choice|Why|
|---|---|---|
|Simple data, wide compatibility|CSV|Works with Base SAS, smaller files|
|Multiple worksheets in one file|Excel|Access specific sheets without conversion|
|Complex formatting, formulas|CSV|PROC IMPORT reads values, not formulas|
|Regular automated imports|CSV|More reliable, less licensing dependency|
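When a workbook has multiple worksheets and you don't know their exact names, one approach (a sketch, assuming the same SAS/ACCESS to PC Files license and a hypothetical file path) is to point an XLSX libname at the workbook and list its sheets before importing:

```sas
/* List every worksheet in the workbook (hypothetical path) */
libname xl xlsx "C:/data/quarterly_sales.xlsx";

proc contents data=xl._all_ nods;  /* NODS: names only, no variable details */
run;

libname xl clear;  /* release the file when done */
```

This avoids the case-sensitivity trap with the `SHEET=` option: copy the sheet name exactly as PROC CONTENTS reports it.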
## Controlling the Guessing Process
**The Challenge**: SAS looks at only the first 20 rows by default to determine data types. If those rows aren't representative, you might get incorrect types.
**The Solution**: Use GUESSINGROWS to be more thorough:
```sas
proc import datafile="messy_data.csv"
dbms=csv
out=work.clean_data
replace;
guessingrows=max; /* Examine ALL rows - most accurate */
run;
```
> [!tip] Strategic GUESSINGROWS Usage
>
> - **Small files**: Use `GUESSINGROWS=MAX` for perfect accuracy
> - **Large files**: Try `GUESSINGROWS=500` for better detection without huge performance hits
> - **Known clean data**: Stick with default (20) for speed
## Real-World Application: Storm Damage Analysis
> [!example]- Example Scenario
> Let's see this in action with a complete workflow:
>
> ```sas
> /* Step 1: Import the data */
> proc import datafile="storm_damage.csv"
> dbms=csv
> out=work.storms
> replace;
> guessingrows=50;
> run;
>
> /* Step 2: Examine what we imported */
> proc contents data=work.storms;
> run;
>
> /* Step 3: Quick data check */
> proc print data=work.storms (obs=10);
> run;
> ```
>
> **What happens here?**
>
> 1. External CSV becomes structured SAS dataset
> 2. We verify the structure matches our expectations
> 3. We preview the data to catch any import issues early
## PROC IMPORT vs XLSX Engine: Choosing Your Approach
Understanding when to use each method:
|Aspect|PROC IMPORT|XLSX Libname Engine|
|---|---|---|
|**Data Freshness**|Snapshot in time|Always current|
|**Performance**|Fast (one-time copy)|Slower (reads each time)|
|**Best for**|Data that doesn't change often|Live, frequently updated files|
|**Use Case**|"Import once, analyze many times"|"Always use latest version"|
```sas
/* PROC IMPORT - Creates a copy */
proc import datafile="sales.xlsx" dbms=xlsx out=work.sales replace;
run;
/* XLSX Engine - Direct access */
libname xl xlsx "sales.xlsx";
data work.sales;
set xl.sheet1; /* Always reads current file */
run;
```
## Troubleshooting Common Issues
> [!warning] Watch Out For These Gotchas
>
> **Wrong data types**: Increase GUESSINGROWS if SAS incorrectly guesses numeric vs character
>
> **Missing values**: Early rows with missing data can fool SAS into wrong type assumptions
>
> **Path problems**: Use forward slashes (/) or double backslashes (\\) in file paths
>
> **Sheet names**: Excel sheet names are case-sensitive in the SHEET= option
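A defensive import that applies the fixes above might look like this (a sketch with a hypothetical path and table name):

```sas
/* Defensive CSV import (hypothetical path, forward slashes work on Windows too) */
proc import datafile="C:/data/reports/survey.csv"
    dbms=csv
    out=work.survey
    replace;
    guessingrows=max;  /* scan every row so missing values in early rows
                          can't skew the numeric-vs-character guesses */
run;

/* Verify the resulting variable names, types, and formats before analysis */
proc contents data=work.survey;
run;
```

The PROC CONTENTS step is the cheapest insurance: it surfaces wrong type guesses immediately, before they propagate into downstream analysis.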
## Connecting It All Together
PROC IMPORT is your data translator---it doesn't just move data, it intelligently converts external files into SAS-ready datasets by reading structure, guessing types, and handling formatting automatically. The result is a snapshot, independent of the source file: re-run the import whenever the external data changes.
---
Reference:
- SAS Programming 1: Essentials