<sub>2025-05-27</sub> <sub>#data-visualization #data-management #r-programming #hmp669</sub> <sup>[[maps-of-content|🌐 Maps of Content — All Notes]] </sup> <sup>Series: [[hmp669|HMP 669 — Data Management and Visualization]]</sup> <sup>Topic: [[hmp669#Data Visualization Using R|Data Visualization Using R]]</sup>

# Working with Objects in R: Your Data Storage System

> [!abstract]- Overview
>
> Objects in R are named containers where you store data---like labeled boxes that hold everything from single numbers to entire datasets.
>
> **Key Concepts**:
>
> - Objects store data temporarily during your R session
> - Object shape determines type (1D vectors vs 2D data frames)
> - Indexing with brackets `[]` and dollar signs `$` accesses specific values
>
> **Critical Connections**: Vectors are building blocks → Data frames are collections of vectors → All data manipulation flows through object indexing
>
> **Must Remember**: Assignment operator `<-` creates objects, all vector elements must be same type, data frames allow mixed types across columns

> [!info]- Package Requirements
>
> All functions covered in this section are base R functions that come pre-installed with R. No additional packages needed!

> [!code]- Syntax Reference
>
> |Command/Syntax|Purpose|Example|
> |---|---|---|
> |**Assignment**|||
> |`object <- value`|Create/update object|`x <- 5`|
> |**Vector Creation**|||
> |`c(...)`|Combine values into vector|`c(1, 2, 3)`|
> |`factor(...)`|Create factor vector|`factor(c("A", "B"))`|
> |**Vector Indexing**|||
> |`vector[n]`|Get nth element|`x[3]`|
> |`vector[start:end]`|Get range of elements|`x[2:5]`|
> |`vector[condition]`|Get elements meeting condition|`x[x > 5]`|
> |**Data Frame Indexing**|||
> |`df[row, col]`|Get specific cell|`data[2, 3]`|
> |`df[, col]`|Get entire column by position|`data[, 2]`|
> |`df$column`|Get column by name|`data$age`|
> |**Information**|||
> |`class(object)`|Check object type|`class(x)`|
> |`str(object)`|Show object structure|`str(data)`|

---

## Understanding the Foundation

Think of R objects as **labeled storage containers** in a digital warehouse. Just as you might organize physical items in boxes with clear labels, R organizes data in objects with assigned names. These containers are temporary---they exist only while your R session is active, like items on a desk that get cleared away when you finish work.

**Objects are where your data lives.** They can hold a single number, a list of names, an entire spreadsheet, or even the results of complex analyses. The beauty lies in their flexibility and the systematic way R organizes them.

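
To make the container idea concrete, here is a minimal sketch using only base R. The object names (`single_number`, `name_list`, `scratch_value`) are made up for illustration; `exists()` and `rm()` are base R functions for checking and removing objects in the current session.

```r
# Hypothetical object names, chosen only for this illustration
single_number <- 42                       # one value
name_list     <- c("Ana", "Ben", "Cy")    # several values in one object
scratch_value <- 3.5                      # something we will discard

# Typing a name retrieves the stored contents
single_number
name_list

# exists() checks whether a name is currently defined
exists("scratch_value")   # TRUE

# rm() removes an object without waiting for the session to end
rm(scratch_value)
exists("scratch_value")   # FALSE
```

Anything still defined when the session closes is cleared in the same way, unless you choose to save the workspace.
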
## The Assignment Operator: Your Data Placement Tool

The **assignment operator** `<-` is your primary tool for putting data into objects. Picture it as an arrow pointing left, showing the direction data flows:

```r
new_object <- 5
```

R reads this **from right to left** across the operator: "Take the value 5 and assign it to new_object." It's like placing an item in a labeled box---the contents (5) go into the container (new_object).

## Object Classification by Shape

R categorizes objects based on their **dimensional structure**---essentially, how the data is arranged in space.

### Vectors: One-Dimensional Lines of Data

**Vectors are ordered sequences** that stretch in one direction, like beads on a string. Each position matters, and every element must be the same type of data.

```r
numbers <- c(6.2, 9.3, 4.1, 8.7)
```

Think of vectors as **specialized containers** that only accept one type of item---you can't mix numbers with text in the same vector.

![[r-objects-1748332599557.webp]]

### Data Frames: Two-Dimensional Spreadsheets

**Data frames are like digital spreadsheets** with rows and columns. They're more flexible than their cousin (matrices) because each column can contain different types of data.

```r
# A data frame can mix types across columns
study_data <- data.frame(
  age = c(25, 30, 35),                # numeric column
  name = c("Alice", "Bob", "Carol"),  # character column
  enrolled = c(TRUE, TRUE, FALSE)     # logical column
)
```

> [!note] Key Insight
> Data frames are essentially **collections of vectors standing side by side**. Each column is a vector, but different columns can be different vector types.

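
One way to check the "columns are vectors" idea is to pull a column out and inspect it. The sketch below rebuilds `study_data` from the example above; `stringsAsFactors = FALSE` is set explicitly as an assumption, so the text column stays character on R versions before 4.0 as well. The helper name `age_column` is made up for illustration.

```r
study_data <- data.frame(
  age = c(25, 30, 35),
  name = c("Alice", "Bob", "Carol"),
  enrolled = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE  # assumption: keep text as character on older R versions
)

# Each column, taken on its own, is an ordinary vector
age_column <- study_data$age   # $ extracts a column by name
is.vector(age_column)          # TRUE
class(age_column)              # "numeric"

# Different columns can have different types
sapply(study_data, class)

# Every column has the same length: one entry per row
nrow(study_data)               # 3
length(study_data$enrolled)    # 3
```
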
![[r-objects-1748332650310.webp]]

## Vector Types by Content

Vectors are further classified by **what kind of information they contain**:

### Numeric Vectors: Pure Numbers

```r
temperatures <- c(98.6, 99.1, 97.8, 100.2)
```

### Character Vectors: Text and Words

```r
colleges <- c("Harvard", "MIT", "Stanford", "Berkeley")
```

**Most flexible** type---can contain any text, including spaces, punctuation, and mixed cases. Each value is wrapped in quotes.

### Factor Vectors: Controlled Categories

```r
satisfaction <- factor(c("high", "medium", "low", "high"))
```

**More rigid than character** vectors, but useful for ensuring consistency. Like having a dropdown menu with preset options.

### Logical Vectors: True/False Questions

```r
completed_survey <- c(TRUE, TRUE, FALSE, TRUE)
```

Perfect for **yes/no questions** or checking conditions. `TRUE` and `FALSE` can be abbreviated as `T` and `F`, but the full forms are safer because `T` and `F` are ordinary variables that can be overwritten.

### Date Vectors: Calendar Information

Specialized vectors for handling dates and times. For example, `as.Date("2024-01-15")` creates a date value.

> [!warning] Vector Rule
> **All elements in a vector must be the same type.** Mixing types will force R to convert everything to the most flexible type (usually character). For example, `c(1, "a", TRUE)` becomes `c("1", "a", "TRUE")`.

![[r-objects-1748332723752.webp]]

|**Type**|**Contains**|**Example**|**Key Feature**|
|---|---|---|---|
|**Numeric**|Numbers (decimals allowed)|`c(3.14, 2.7, 1.0)`|Mathematical operations|
|**Integer**|Whole numbers only|`c(1L, 5L, 10L)`|No decimal points|
|**Character**|Text in quotes|`c("red", "blue", "green")`|Most flexible type|
|**Factor**|Limited, predefined options|`factor(c("low", "med", "high"))`|Controlled vocabulary|
|**Logical**|TRUE/FALSE only|`c(TRUE, FALSE, TRUE)`|Yes/no questions|
|**Date**|Calendar information|`as.Date("2024-01-15")`|Special time formatting|

## Indexing: Accessing Your Stored Data

**Indexing is like giving R an address** to find specific data within your objects. You use square brackets `[]` as your navigation tool.

### Vector Indexing: Position-Based Access

```r
# Get the 4th element
temperatures[4]

# Get elements 2 through 4
temperatures[2:4]

# Get elements that meet a condition
temperatures[temperatures > 99]
```

### Data Frame Indexing: Row and Column Coordinates

Since data frames have **two dimensions**, you need to specify both row and column positions, separated by a comma:

```r
# Format: dataframe[row, column]
study_data[2, 3]   # Row 2, Column 3
study_data[, 2]    # All rows, Column 2
study_data[1:2, ]  # Rows 1-2, All columns
```

> [!tip] Memory Device
> Think "**rows first, columns second**"---like reading a map where you find the street (row) before the house number (column).

### Column Access by Name: The Dollar Sign Method

**More robust than position numbers**, the dollar sign `$` lets you access columns by their names:

```r
study_data$name  # Gets the 'name' column
study_data$age   # Gets the 'age' column
```

This approach is **less fragile** than position numbers---if columns get rearranged, your code still works.

## Object Naming Rules: Your Labeling Guidelines

While you have considerable freedom in naming objects, certain **boundaries keep R functioning smoothly**:

✅ **Allowed**: Letters, numbers (not at start), dots `.`, underscores `_` (not at start)

❌ **Forbidden**: Spaces, special symbols (`!`, `+`, `#`), starting with numbers

```r
# Good names
participant_data
study.results
data2023

# Problematic names
participant data   # space
2023data           # starts with number
data!              # special symbol
```

> [!warning] Function Override Risk
> **Avoid naming objects after R functions** (like `mean`, `sum`, `data`). The new object masks the built-in name for the rest of your session, which makes code confusing and can break calls if the object you assign is itself a function.

## Practical Application: Building Your First Data Structure

> [!Example]- Create a simple dataset
>
> ```r
> # Step 1: Create individual vectors
> student_names <- c("Emma", "Liam", "Sophia", "Noah")
> test_scores <- c(87, 92, 78, 95)
> passed_exam <- c(TRUE, TRUE, FALSE, TRUE)
>
> # Step 2: Combine into a data frame
> class_results <- data.frame(
>   name = student_names,
>   score = test_scores,
>   passed = passed_exam
> )
>
> # Step 3: Access specific information
> class_results$name[1]                    # "Emma"
> class_results[class_results$score > 85, ] # All students with scores > 85
> mean(class_results$score)                # Average score: 88
> ```
>
> > [!tip] Start with vectors (simple sequences), then combine them into data frames (spreadsheet-like structures). This mirrors how you might organize information naturally.

## Connecting the Concepts

**Objects form a hierarchy of organization**: Individual values → Vectors → Data frames → Complex structures. This progression mirrors how we naturally organize information---from single facts to categorized lists to comprehensive databases.

The **indexing system provides surgical precision** for data access, whether you need a single value or complex subsets. Combined with the assignment operator, you have complete control over data storage and retrieval.
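
As a closing illustration of indexing and assignment working together, the sketch below reuses the class-results example to change stored values in place. The corrected score of 81 and the pass mark of 80 are hypothetical values chosen for this example, not figures from the note, and `stringsAsFactors = FALSE` is an assumption added so the behavior is the same on older R versions.

```r
student_names <- c("Emma", "Liam", "Sophia", "Noah")
test_scores   <- c(87, 92, 78, 95)

# An index on the left of <- overwrites one element of a vector
test_scores[3] <- 81          # hypothetical corrected score for the third student

class_results <- data.frame(
  name  = student_names,
  score = test_scores,
  stringsAsFactors = FALSE    # assumption: keep names as character on older R versions
)

# $ on the left of <- adds (or replaces) a whole column
pass_mark <- 80               # hypothetical threshold
class_results$passed <- class_results$score >= pass_mark

# A row condition plus a column name retrieves a single stored value
class_results[class_results$name == "Sophia", "score"]   # 81
```

The same bracket and dollar-sign syntax is used for reading and for writing; which one happens depends only on which side of `<-` the expression sits.
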