<sub>2025-06-01</sub> <sub>#data-science #sas-programming</sub> <sup>[[maps-of-content|🌐 Maps of Content — All Notes]]</sup> <sup>Series: [[sas-programming-1-essentials|SAS Programming 1 — Essentials]]</sup> <sup>Topic: [[sas-programming-1-essentials#Lesson 2 Accessing Data|Lesson 2: Accessing Data]]</sup>

# SAS Data Import

> [!abstract]- Overview
>
> PROC IMPORT transforms external data files (CSV, Excel) into structured SAS datasets you can analyze and manipulate.
>
> **Key Concepts**:
>
> - **Source Specification**: Tell SAS where your data lives (DATAFILE, DBMS)
> - **Destination Control**: Define your SAS output table (OUT, REPLACE)
> - **Smart Detection**: Let SAS guess data types or guide the process (GUESSINGROWS)
>
> **Critical Connections**: External files → PROC IMPORT → SAS datasets → Analysis ready
>
> **Must Remember**: PROC IMPORT creates a _snapshot_ of your external data, not a live connection

> [!code]- Syntax Reference
>
> |Command/Syntax|Purpose|Example|
> |---|---|---|
> |**Basic Structure**|||
> |`PROC IMPORT DATAFILE="path/file" DBMS=type OUT=table;`|Core import syntax|`PROC IMPORT DATAFILE="data.csv" DBMS=CSV OUT=mydata;`|
> |**Essential Options**|||
> |`DATAFILE="path"`|Specifies source file location|`DATAFILE="C:/data/sales.csv"`|
> |`DBMS=CSV`|Defines file type (CSV included with Base SAS)|`DBMS=CSV`|
> |`DBMS=XLSX`|Excel files (requires SAS/ACCESS license)|`DBMS=XLSX`|
> |`OUT=library.tablename`|Names output SAS dataset|`OUT=work.imported_data`|
> |**Control Options**|||
> |`REPLACE`|Overwrites existing output table|`OUT=mydata REPLACE;`|
> |`GUESSINGROWS=n`|Rows to scan for data types (default=20)|`GUESSINGROWS=100` or `GUESSINGROWS=MAX`|
> |`SHEET="name"`|Excel worksheet to import|`SHEET="Sales_Data";`|

## The Data Import Challenge

**Imagine this scenario**: You have a treasure trove of data sitting in Excel spreadsheets and CSV files, but SAS can only work its analytical magic on structured SAS datasets.
It's like having a key that doesn't fit the lock---you need a translator. **That translator is PROC IMPORT**---your bridge between the external data world and SAS's structured universe.

## Understanding the Core Workflow

The import process follows a simple but powerful pattern:

```plaintext
External File → PROC IMPORT → SAS Dataset → Ready for Analysis
      ↓              ↓             ↓              ↓
  CSV/Excel    Reads & Maps   Structured   Reports, Models,
    Files       Data Types    SAS Table    Visualizations
```

> [!tip] Think of PROC IMPORT as a Smart Translator
> It doesn't just copy data---it reads your external file, makes intelligent guesses about data types, converts formats (like dates), and creates a proper SAS dataset that follows all the rules SAS expects.

## Essential Building Blocks

### The Three Pillars of PROC IMPORT

Every PROC IMPORT statement needs these three foundation pieces:

**1. DATAFILE**: _Where is your data?_

```sas
DATAFILE="C:/Users/YourName/Documents/sales_data.csv"
```

**2. DBMS**: _What type of file is it?_

```sas
DBMS=CSV   /* For comma-delimited files */
DBMS=XLSX  /* For Excel files */
```

**3. OUT**: _What should we call the SAS table?_

```sas
OUT=work.sales_data   /* Creates temporary table */
OUT=mylib.sales_data  /* Creates permanent table */
```

### Smart Defaults That Work For You

SAS makes reasonable assumptions to get you started quickly:

- **Column names** come from the first row
- **Data types** determined by scanning the first 20 rows
- **Dates** automatically converted to SAS date values
- **Character data** with commas gets properly handled (when quoted)

## Working with CSV Files

CSV (Comma-Separated Values) files are the workhorses of data exchange. Here's your standard approach:

```sas
proc import datafile="storm_damage.csv"
    dbms=csv
    out=work.storm_data
    replace;
run;
```

> [!note] Why REPLACE Matters
> Without REPLACE, SAS will throw an error if your output table already exists. Adding REPLACE lets you re-run your code without manual cleanup---perfect for iterative data work.
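To see what REPLACE saves you from, compare it with the manual alternative: deleting the output table yourself before each re-import. A minimal sketch of both approaches, reusing the table and file names from the snippet above:

```sas
/* Option 1: let PROC IMPORT overwrite the existing table */
proc import datafile="storm_damage.csv"
    dbms=csv
    out=work.storm_data
    replace;
run;

/* Option 2: manual cleanup with PROC DATASETS, then import
   without REPLACE (fails if the table still exists) */
proc datasets library=work nolist;
    delete storm_data;
quit;

proc import datafile="storm_damage.csv"
    dbms=csv
    out=work.storm_data;
run;
```

In iterative work, Option 1 is almost always what you want; Option 2 only earns its keep when you need to guarantee a table is gone even if the import step fails.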
**Real Example Breakdown**:

```sas
proc import datafile="C:/data/storm_damage.csv"
    dbms=csv
    out=work.storm_damage_import
    replace;
    guessingrows=100; /* Look at more rows for better type detection */
run;
```

This creates a SAS dataset where:

- Numeric columns (like Cost, Date) become SAS numeric variables
- Text columns (like Event, Summary) become character variables
- Quoted values with commas are handled correctly

## Working with Excel Files

Excel files require a bit more finesse and the right licensing:

```sas
proc import datafile="C:/data/quarterly_sales.xlsx"
    dbms=xlsx
    out=work.sales_data
    replace;
    sheet="Q1_Results"; /* Specify which worksheet */
run;
```

> [!warning] License Requirement
> Excel import requires **SAS/ACCESS to PC Files**. If you don't have this license, you'll need to save your Excel files as CSV first.

### Excel vs CSV: When to Use Each

|Scenario|Best Choice|Why|
|---|---|---|
|Simple data, wide compatibility|CSV|Works with Base SAS, smaller files|
|Multiple worksheets in one file|Excel|Access specific sheets without conversion|
|Complex formatting, formulas|CSV|PROC IMPORT reads values, not formulas|
|Regular automated imports|CSV|More reliable, less licensing dependency|

## Controlling the Guessing Process

**The Challenge**: SAS looks at only the first 20 rows by default to determine data types. If those rows aren't representative, you might get incorrect types.
**The Solution**: Use GUESSINGROWS to be more thorough:

```sas
proc import datafile="messy_data.csv"
    dbms=csv
    out=work.clean_data
    replace;
    guessingrows=max; /* Examine ALL rows - most accurate */
run;
```

> [!tip] Strategic GUESSINGROWS Usage
>
> - **Small files**: Use `GUESSINGROWS=MAX` for perfect accuracy
> - **Large files**: Try `GUESSINGROWS=500` for better detection without huge performance hits
> - **Known clean data**: Stick with default (20) for speed

## Real-World Application: Storm Damage Analysis

> [!example]- Example Scenario
> Let's see this in action with a complete workflow:
>
> ```sas
> /* Step 1: Import the data */
> proc import datafile="storm_damage.csv"
>     dbms=csv
>     out=work.storms
>     replace;
>     guessingrows=50;
> run;
>
> /* Step 2: Examine what we imported */
> proc contents data=work.storms;
> run;
>
> /* Step 3: Quick data check */
> proc print data=work.storms (obs=10);
> run;
> ```
>
> **What happens here?**
>
> 1. External CSV becomes structured SAS dataset
> 2. We verify the structure matches our expectations
> 3. We preview the data to catch any import issues early

## PROC IMPORT vs XLSX Engine: Choosing Your Approach

Understanding when to use each method:

|Aspect|PROC IMPORT|XLSX Libname Engine|
|---|---|---|
|**Data Freshness**|Snapshot in time|Always current|
|**Performance**|Fast (one-time copy)|Slower (reads each time)|
|**Best for**|Data that doesn't change often|Live, frequently updated files|
|**Use Case**|"Import once, analyze many times"|"Always use latest version"|

```sas
/* PROC IMPORT - Creates a copy */
proc import datafile="sales.xlsx"
    dbms=xlsx
    out=work.sales;
run;

/* XLSX Engine - Direct access */
libname xl xlsx "sales.xlsx";

data work.sales;
    set xl.sheet1; /* Always reads current file */
run;
```

## Troubleshooting Common Issues

> [!warning] Watch Out For These Gotchas
>
> **Wrong data types**: Increase GUESSINGROWS if SAS incorrectly guesses numeric vs character
>
> **Missing values**: Early rows with missing data can fool SAS into wrong type assumptions
>
> **Path problems**: Use forward slashes (/) or double backslashes (\\) in file paths
>
> **Sheet names**: Excel sheet names are case-sensitive in the SHEET= option

## Connecting It All Together

PROC IMPORT is your data translator---it doesn't just move data, it intelligently converts external files into SAS-ready datasets by reading structure, guessing types, and handling formatting automatically. It creates a snapshot that is independent of the source file.

---

Reference:

- SAS Programming 1 — Essentials