<sub>2025-05-27</sub> <sub>#data-visualization #data-management #r-programming #hmp669</sub>
<sup>[[maps-of-content|🌐 Maps of Content — All Notes]] </sup>
<sup>Series: [[hmp669|HMP 669 — Data Management and Visualization]]</sup>
<sup>Topic: [[hmp669#Data Visualization Using R|Data Visualization Using R]]</sup>
# Working with Objects in R: Your Data Storage System
> [!abstract]- Overview
>
> Objects in R are named containers where you store data---like labeled boxes that hold everything from single numbers to entire datasets.
>
> **Key Concepts**:
>
> - Objects store data temporarily during your R session
> - Object shape determines type (1D vectors vs 2D data frames)
> - Indexing with brackets `[]` and dollar signs `$` accesses specific values
>
> **Critical Connections**: Vectors are building blocks → Data frames are collections of vectors → All data manipulation flows through object indexing
>
> **Must Remember**: The assignment operator `<-` creates objects, all elements of a vector must be the same type, and data frames allow mixed types across columns.

> [!info]- Package Requirements
>
> All the functions covered in this section are base R functions that come pre-installed with R. No additional packages are needed.

> [!code]- Syntax Reference
>
>
> |Command/Syntax|Purpose|Example|
> |---|---|---|
> |**Assignment**|||
> |`object <- value`|Create/update object|`x <- 5`|
> |**Vector Creation**|||
> |`c(...)`|Combine values into vector|`c(1, 2, 3)`|
> |`factor(...)`|Create factor vector|`factor(c("A", "B"))`|
> |**Vector Indexing**|||
> |`vector[n]`|Get nth element|`x[3]`|
> |`vector[start:end]`|Get range of elements|`x[2:5]`|
> |`vector[condition]`|Get elements meeting condition|`x[x > 5]`|
> |**Data Frame Indexing**|||
> |`df[row, col]`|Get specific cell|`data[2, 3]`|
> |`df[, col]`|Get entire column by position|`data[, 2]`|
> |`df$column`|Get column by name|`data$age`|
> |**Information**|||
> |`class(object)`|Check object type|`class(x)`|
> |`str(object)`|Show object structure|`str(data)`|
---
## Understanding the Foundation
Think of R objects as **labeled storage containers** in a digital warehouse. Just as you might organize physical items in boxes with clear labels, R organizes data in objects with assigned names. These containers are temporary---they exist only while your R session is active, like items on a desk that get cleared away when you finish work.
**Objects are where your data lives.** They can hold a single number, a list of names, an entire spreadsheet, or even the results of complex analyses. The beauty lies in their flexibility and the systematic way R organizes them.
## The Assignment Operator: Your Data Placement Tool
The **assignment operator** `<-` is your primary tool for putting data into objects. Picture it as an arrow pointing left, showing the direction data flows:
```r
new_object <- 5
```
R reads this **from right to left** across the operator: "Take the value 5 and assign it to new_object." It's like placing an item in a labeled box---the contents (5) go into the container (new_object).
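As a small sketch of the same idea, the lines below overwrite an existing object. The right-hand side is evaluated first, and the result then replaces whatever the object held before.

```r
# Create an object, then update it using its own current value
new_object <- 5
new_object <- new_object + 1   # right side (5 + 1) is evaluated, then stored
new_object                     # prints 6

# List the names of the objects that exist in the current session
ls()
```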
## Object Classification by Shape
R categorizes objects based on their **dimensional structure**---essentially, how the data is arranged in space.
### Vectors: One-Dimensional Lines of Data
**Vectors are ordered sequences** that stretch in one direction, like beads on a string. Each position matters, and every element must be the same type of data.
```r
numbers <- c(6.2, 9.3, 4.1, 8.7)
```
Think of vectors as **specialized containers** that only accept one type of item---you can't mix numbers with text in the same vector.
![[r-objects-1748332599557.webp]]
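A quick way to see the one-dimensional, single-type nature of a vector is to ask R about it. This sketch reuses the `numbers` vector from above:

```r
numbers <- c(6.2, 9.3, 4.1, 8.7)

length(numbers)   # 4 elements in a single line
class(numbers)    # "numeric": one type for the whole vector
numbers * 2       # arithmetic is applied to every element in turn
```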
### Data Frames: Two-Dimensional Spreadsheets
**Data frames are like digital spreadsheets** with rows and columns. They're more flexible than their cousins, matrices, because each column can hold a different type of data.
```r
# A data frame can mix types across columns
study_data <- data.frame(
  age = c(25, 30, 35),                # numeric column
  name = c("Alice", "Bob", "Carol"),  # character column
  enrolled = c(TRUE, TRUE, FALSE)     # logical column
)
```
> [!note] Key Insight
> Data frames are essentially **collections of vectors standing side by side**. Each column is a vector, but different columns can be different vector types.
![[r-objects-1748332650310.webp]]
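To check the "collection of vectors" idea yourself, you can inspect each column's type. This sketch rebuilds `study_data` so it runs on its own. Setting `stringsAsFactors = FALSE` is an addition here: it keeps text columns as character in R versions before 4.0, where the default was to convert them to factors.

```r
study_data <- data.frame(
  age = c(25, 30, 35),
  name = c("Alice", "Bob", "Carol"),
  enrolled = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE   # explicit, so older R versions behave the same
)

dim(study_data)             # 3 rows, 3 columns
sapply(study_data, class)   # one class per column: numeric, character, logical
str(study_data)             # compact summary of the whole structure
```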
## Vector Types by Content
Vectors are further classified by **what kind of information they contain**:
### Numeric Vectors: Pure Numbers
```r
temperatures <- c(98.6, 99.1, 97.8, 100.2)
```
### Character Vectors: Text and Words
```r
colleges <- c("Harvard", "MIT", "Stanford", "Berkeley")
```
**Most flexible** type---can contain any text, including spaces, punctuation, and mixed cases. Each value is wrapped in quotes.
### Factor Vectors: Controlled Categories
```r
satisfaction <- factor(c("high", "medium", "low", "high"))
```
**More rigid than character** vectors, but useful for ensuring consistency. Like having a dropdown menu with preset options.
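The "preset options" of a factor are called its levels. The sketch below shows the default levels, how to set their order yourself, and what happens to a value that is not among the allowed levels. The names `satisfaction_ordered` and `out_of_range` are made up for illustration.

```r
satisfaction <- factor(c("high", "medium", "low", "high"))
levels(satisfaction)    # "high" "low" "medium" (sorted alphabetically by default)
table(satisfaction)     # counts per category

# Set the allowed categories and their order explicitly
satisfaction_ordered <- factor(
  c("high", "medium", "low", "high"),
  levels = c("low", "medium", "high")
)
levels(satisfaction_ordered)   # "low" "medium" "high"

# A value that is not one of the levels becomes NA
out_of_range <- factor(c("high", "very high"), levels = c("low", "medium", "high"))
out_of_range
```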
### Logical Vectors: True/False Questions
```r
completed_survey <- c(TRUE, TRUE, FALSE, TRUE)
```
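Perfect for **yes/no questions** or checking conditions. `T` and `F` work as abbreviations, but they are ordinary variables that can be overwritten, so spelling out `TRUE` and `FALSE` is safer.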
### Date Vectors: Calendar Information
Specialized vectors for calendar dates (class `Date`). Values that also carry a time of day use a separate class, `POSIXct`.
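A minimal sketch of a date vector, using base R's `as.Date()` on text in year-month-day form. The name `visit_dates` and the dates themselves are invented for illustration.

```r
visit_dates <- as.Date(c("2024-01-15", "2024-02-01", "2024-03-10"))

class(visit_dates)                  # "Date"
visit_dates[2] - visit_dates[1]     # Time difference of 17 days
visit_dates > as.Date("2024-01-31") # FALSE TRUE TRUE
```

Because R knows these values are dates rather than text, subtraction and comparison behave like calendar arithmetic.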
> [!warning] Vector Rule
> **All elements in a vector must be the same type.** If you mix types, R silently converts every element to the most flexible type present, following the order logical → integer → numeric → character.
![[r-objects-1748332723752.webp]]
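The conversion rule is easy to observe directly. In this sketch (object names are illustrative), a single piece of text turns the whole vector into character, while mixing only numbers and logicals yields numeric:

```r
# Text present: everything becomes character
mixed <- c(1, "two", TRUE)
class(mixed)   # "character"
mixed          # "1" "two" "TRUE"

# Numbers and logicals only: everything becomes numeric (TRUE = 1, FALSE = 0)
num_logical <- c(1.5, TRUE, FALSE)
class(num_logical)   # "numeric"
num_logical          # 1.5 1.0 0.0
```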
|**Type**|**Contains**|**Example**|**Key Feature**|
|---|---|---|---|
|**Numeric**|Numbers with decimals|`c(3.14, 2.7, 1.0)`|Mathematical operations|
|**Integer**|Whole numbers only|`c(1L, 5L, 10L)`|No decimal points|
|**Character**|Text in quotes|`c("red", "blue", "green")`|Most flexible type|
|**Factor**|Limited, predefined options|`factor(c("low", "med", "high"))`|Controlled vocabulary|
|**Logical**|TRUE/FALSE only|`c(TRUE, FALSE, TRUE)`|Yes/no questions|
|**Date**|Calendar information|`as.Date("2024-01-15")`|Special time formatting|
## Indexing: Accessing Your Stored Data
**Indexing is like giving R an address** to find specific data within your objects. You use square brackets `[]` as your navigation tool.
### Vector Indexing: Position-Based Access
```r
# Get the 4th element
temperatures[4]
# Get elements 2 through 4
temperatures[2:4]
# Get elements that meet a condition
temperatures[temperatures > 99]
```
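For reference, here is what those calls return with the `temperatures` vector defined earlier, plus two extra forms that go beyond the examples above: a negative index (drop an element) and a vector of positions. Note that R counts positions from 1, and that a condition works by first producing a TRUE/FALSE vector.

```r
temperatures <- c(98.6, 99.1, 97.8, 100.2)

temperatures[4]                  # 100.2
temperatures[2:4]                # 99.1 97.8 100.2
temperatures > 99                # FALSE TRUE FALSE TRUE
temperatures[temperatures > 99]  # 99.1 100.2 (keeps positions marked TRUE)

temperatures[-1]                 # everything except the 1st: 99.1 97.8 100.2
temperatures[c(1, 3)]            # 1st and 3rd: 98.6 97.8
```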
### Data Frame Indexing: Row and Column Coordinates
Since data frames have **two dimensions**, you need to specify both row and column positions, separated by a comma:
```r
# Format: dataframe[row, column]
study_data[2, 3] # Row 2, Column 3
study_data[, 2] # All rows, Column 2
study_data[1:2, ] # Rows 1-2, All columns
```
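The shape of the result depends on what you ask for. This sketch rebuilds `study_data` so it runs on its own and shows that a single cell or single column comes back as a plain vector, while a selection of rows stays a data frame. The `drop = FALSE` argument is an extra, shown here for cases where you want a one-column data frame instead.

```r
study_data <- data.frame(
  age = c(25, 30, 35),
  name = c("Alice", "Bob", "Carol"),
  enrolled = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

study_data[2, 3]                  # TRUE (Bob's 'enrolled' value)
study_data[, 2]                   # "Alice" "Bob" "Carol" as a character vector
study_data[1:2, ]                 # still a data frame, with 2 rows
study_data[, 2, drop = FALSE]     # one column, kept as a data frame
```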
> [!tip] Memory Device
> Think "**rows first, columns second**"---like reading a map where you find the street (row) before the house number (column).
### Column Access by Name: The Dollar Sign Method
**More robust than position numbers**, the dollar sign `$` lets you access columns by their names:
```r
study_data$name # Gets the 'name' column
study_data$age # Gets the 'age' column
```
This approach is **less fragile** than position numbers---if columns get rearranged, your code still works.
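Here is a small demonstration of that fragility, assuming the same `study_data` as before. The `reordered` object is made up for illustration: it holds the same columns in a different order.

```r
study_data <- data.frame(
  age = c(25, 30, 35),
  name = c("Alice", "Bob", "Carol"),
  enrolled = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Same data, columns rearranged
reordered <- study_data[, c("name", "enrolled", "age")]

study_data[, 2]       # "Alice" "Bob" "Carol"
reordered[, 2]        # TRUE TRUE FALSE: position 2 now means something else
reordered$name        # "Alice" "Bob" "Carol": the name still finds the column
reordered[, "name"]   # names also work inside brackets
```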
## Object Naming Rules: Your Labeling Guidelines
While you have considerable freedom in naming objects, certain **boundaries keep R functioning smoothly**:
✅ **Allowed**: Letters, numbers (not at start), dots `.`, underscores `_` (not at start)
❌ **Forbidden**: Spaces, special symbols (`!`, `+`, `#`), starting with a number or an underscore
```r
# Good names
participant_data <- 1
study.results <- 2
data2023 <- 3
# Problematic names (each line below is a syntax error if uncommented)
# participant data <- 1   # space
# 2023data <- 2           # starts with a number
# data! <- 3              # special symbol
```
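If you are unsure whether a name is valid, base R's `make.names()` shows how R itself would repair it. A name that comes back unchanged was already valid. The object names below are illustrative.

```r
candidates <- c("participant_data", "participant data", "2023data", "data!")

fixed <- make.names(candidates)
fixed                 # "participant_data" "participant.data" "X2023data" "data."

candidates == fixed   # TRUE FALSE FALSE FALSE: only the first was already valid
```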
> [!warning] Function Override Risk
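> **Avoid naming objects after R functions** (like `mean`, `sum`, `data`). The new object masks the built-in name in your workspace, which makes code confusing to read. Calls such as `mean(x)` usually still work, because R looks for a function when a name is called, but assigning your own *function* to one of these names does replace the built-in until you remove it with `rm()`.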
## Practical Application: Building Your First Data Structure
> [!Example]- Create a simple dataset
>
> ```r
> # Step 1: Create individual vectors
> student_names <- c("Emma", "Liam", "Sophia", "Noah")
> test_scores <- c(87, 92, 78, 95)
> passed_exam <- c(TRUE, TRUE, FALSE, TRUE)
>
> # Step 2: Combine into a data frame
> class_results <- data.frame(
>   name = student_names,
>   score = test_scores,
>   passed = passed_exam
> )
>
> # Step 3: Access specific information
> class_results$name[1] # "Emma"
> class_results[class_results$score > 85, ] # All students with scores > 85
> mean(class_results$score) # Average score: 88
> ```
>
> > [!tip] Start with vectors (simple lists), then combine them into data frames (spreadsheet-like structures). This mirrors how you might organize information naturally.
>
## Connecting the Concepts
**Objects form a hierarchy of organization**: Individual values → Vectors → Data frames → Complex structures. This progression mirrors how we naturally organize information---from single facts to categorized lists to comprehensive databases.
The **indexing system provides surgical precision** for data access, whether you need a single value or complex subsets. Combined with the assignment operator, you have complete control over data storage and retrieval.
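Putting the two tools together: an index on the left side of `<-` tells R where to store a value, not just where to read one. This is a minimal sketch on the `study_data` example. The replacement values and the `age_months` column are invented for illustration.

```r
study_data <- data.frame(
  age = c(25, 30, 35),
  name = c("Alice", "Bob", "Carol"),
  enrolled = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

study_data$age[2] <- 31                        # replace one value in a column
study_data[3, "enrolled"] <- TRUE              # replace one cell by row and column name
study_data$age_months <- study_data$age * 12   # assigning to a new name adds a column

study_data
```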