Oct 2025
Suppose we have data we want to “work with”, and our data is arranged like the following:
What can we do with this data?
Dependent variable: SleepTime

| Predictors | Estimates | std. Error | p |
|---|---|---|---|
| (Intercept) | 9.73 | 0.25 | <0.001 |
| StudyTime | -0.30 | 0.04 | <0.001 |
| Rank [So] | -0.74 | 0.27 | 0.007 |
| Rank [Jr] | -0.47 | 0.32 | 0.149 |
| Rank [Sr] | 0.34 | 0.52 | 0.519 |
| Observations | 359 | | |
| R² / R² adjusted | 0.263 / 0.255 | | |
What we really want is to be able to understand our data and communicate our findings to others (in whatever format that may take):
And we want to do these things in a way that’s consistent, transparent, and repeatable:
```r
library(haven)   # read SPSS, Stata, and SAS data files
library(sjPlot)  # formatted tables and plots for model output

#' Read data from SPSS and apply labels
df <- haven::read_spss("./data/Sample_Dataset_2019.sav")
df$Rank <- factor(df$Rank,
                  levels = 1:4,
                  labels = c("Fr", "So", "Jr", "Sr"))

#' Fit linear regression model
mylm <- lm(SleepTime ~ StudyTime + Rank, data = df)

#' Generate nice table with parameter estimates and p-values
sjPlot::tab_model(mylm)
```

R is:

- A programming language designed for statistical data analysis
- An open source statistical software program
- A community of data scientists and practitioners
- Available for Windows, Mac, and Linux
- Free!
R can do a lot, but new analytic methods are coming out all the time. That’s where R packages, and the power of the R community, come in.
Option 2: Run R/RStudio in the Cloud
Unlike other statistical software, R syntax requires us to name both our dataset objects (called dataframes) and the variables inside those datasets (each stored as a vector).
Here, we have a dataframe called professions, which contains two variables: job and salary. If we want to access the individual variables within the dataframe, we give the dataframe name followed by the $ operator, then the variable name:
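Because the original example isn’t reproduced here, the sketch below builds a small `professions` dataframe with the two variables `job` and `salary`. The job titles and salary figures are made-up placeholders, not data from the tutorial.

```r
# Hypothetical example data: values are placeholders for illustration only
professions <- data.frame(
  job    = c("Teacher", "Nurse", "Engineer"),
  salary = c(52000, 61000, 85000)
)

# Access individual variables with dataframe$variable
professions$job      # a character vector
professions$salary   # a numeric vector

# A column extracted with $ is an ordinary vector, so vector functions work on it
mean(professions$salary)   # 66000
```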
For most actions, we will use functions: named “actions” that take some input(s) and return some output(s).
Some functions take more than one argument. Arguments are the inputs to a function, and additional arguments let us change how the function behaves.
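As a small illustration of an optional argument changing a function’s behavior, base R’s `round()` takes a number as its first argument and an optional `digits` argument; the value of `x` is arbitrary.

```r
x <- 3.14159

# One argument: round() rounds to a whole number by default
round(x)              # 3

# A second argument, digits, changes how many decimal places are kept
round(x, digits = 2)  # 3.14
```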
Tutorial sample data:
Research questions we’ll consider today:
For CSV or other plaintext files, we can use base R’s `read.table` or `read.csv`.

What if the data isn’t in CSV or plaintext format?
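We can use add-on packages: `readxl` for Excel files, or `haven` for SPSS, Stata, and SAS files. To use a package:

1. Install it by running `install.packages(...)` in the console (we only need to do this once).
2. Load it with the `library(...)` function at the start of each R session.

How do we know if our import was successful?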
The following functions take a dataframe as an argument and return a preview or summary of it:

- `View(mydata)`: opens a spreadsheet-style viewer
- `str(mydata)` (str is short for structure)
- `summary(mydata)`
- `head(mydata)`: the first rows
- `tail(mydata)`: the last rows
- `names(mydata)`: the column names

When we need to access a column of a dataframe as a vector, we use the $ operator to extract it:
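The import-then-check workflow can be tried end to end without the tutorial’s data file. This sketch writes a tiny CSV to a temporary file (the column names echo the tutorial’s variables, but the values are invented), reads it with `read.csv()`, previews it, and then extracts one column with `$`. `View()` is left out because it opens an interactive viewer rather than printing output.

```r
# Write a small CSV to a temporary file so the example is self-contained.
# Values are hypothetical placeholders, not the tutorial's sample data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Rank,StudyTime,SleepTime",
             "1,10,7",
             "2,12,6.5",
             "2,8,8",
             "3,15,5.5",
             "4,5,7.5",
             "1,9,8",
             "3,11,6"), csv_path)

# Import: read.csv() expects a header row and comma-separated values
mydata <- read.csv(csv_path)

# Check the import
str(mydata)      # number of rows, column names, types, and first few values
summary(mydata)  # minimum, quartiles, median, mean, and maximum of each column
head(mydata)     # first 6 rows
tail(mydata)     # last 6 rows
names(mydata)    # column names

# Extract one column as a vector with the $ operator
sleep <- mydata$SleepTime
sleep
```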
Usually, when we want to access specific rows of a dataframe, what we really want is to filter our data by some condition. We can do this with the `subset()` function:
The first argument to `subset()` is the dataframe. The second argument is a logical condition specifying which rows to keep (here, `Rank==1`, i.e., keep only freshmen).
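A minimal sketch of row filtering with `subset()`, assuming (as in the tutorial) that `Rank` is coded 1 for freshmen. The small `mydata` dataframe here is hypothetical.

```r
# Hypothetical data; Rank 1 = freshman
mydata <- data.frame(
  Rank      = c(1, 2, 2, 3, 4, 1, 3),
  SleepTime = c(7, 6.5, 8, 5.5, 7.5, 8, 6)
)

# Keep only rows where Rank equals 1; note == (comparison), not = (assignment)
freshmen <- subset(mydata, Rank == 1)

# Conditions can be combined with & (and) or | (or)
rested_freshmen <- subset(mydata, Rank == 1 & SleepTime >= 8)
```

Inside the condition, column names such as `Rank` can be used directly, without the `mydata$` prefix.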
If we instead want to keep or drop specific columns of our dataset, we can still use `subset()`, but with its `select` argument:
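A short sketch of the `select` argument on a hypothetical three-column dataframe: listing names keeps those columns, a minus sign drops a column, and a row condition can be combined with `select` in the same call.

```r
# Hypothetical data for illustration
mydata <- data.frame(
  Rank      = c(1, 2, 3),
  StudyTime = c(10, 12, 8),
  SleepTime = c(7, 6.5, 8)
)

# Keep only the named columns
sleep_study <- subset(mydata, select = c(StudyTime, SleepTime))

# A minus sign drops a column instead
no_rank <- subset(mydata, select = -Rank)

# Filter rows and pick columns in one call
fr_sleep <- subset(mydata, Rank == 1, select = SleepTime)
```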
Summary statistics for numeric variables: `mean()`, `sd()`, `min()`, `max()`, `median()`, `sum()`
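Each of these functions takes a numeric vector. One behavior worth seeing up front: if the vector contains a missing value (`NA`), the result is `NA` unless `na.rm = TRUE` is supplied. The sleep times below are hypothetical.

```r
# Hypothetical sleep times in hours; NA marks a missing response
sleep <- c(7, 6.5, NA, 8, 5.5)

mean(sleep)                  # NA: missing values propagate by default
mean(sleep, na.rm = TRUE)    # 6.75, computed from the 4 observed values
sd(sleep, na.rm = TRUE)
median(sleep, na.rm = TRUE)  # 6.75
min(sleep, na.rm = TRUE)     # 5.5
max(sleep, na.rm = TRUE)     # 8
sum(sleep, na.rm = TRUE)     # 27
```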
If a variable contains missing values, add the `na.rm=TRUE` argument to these functions.

- Graphs: `hist()`, `boxplot()`
- Frequency tables: `table()`, `addmargins()`, `prop.table()`; bar charts with `barplot(table(...))`
- Correlations: `cor()`
- Linear regression: `lm()`

```
 setting  value
 version  R version 4.5.1 (2025-06-13 ucrt)
 os       Windows 11 x64 (build 26100)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_United States.utf8
 ctype    English_United States.utf8
 tz       America/New_York
 date     2025-10-28
 pandoc   3.6.3 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
 quarto   NA @ C:\\Users\\kyeager4\\AppData\\Local\\Programs\\Quarto\\bin\\quarto.exe
```

| | package | ondiskversion | loadedversion | path | loadedpath | attached | is_base | date | source | md5ok | library |
|---|---|---|---|---|---|---|---|---|---|---|---|
| haven | haven | 2.5.5 | 2.5.5 | C:/Users/kyeager4/AppData/Local/R/win-library/4.5/haven | C:/Users/kyeager4/AppData/Local/R/win-library/4.5/haven | TRUE | FALSE | 2025-05-30 | CRAN (R 4.5.1) | TRUE | C:/Users/kyeager4/AppData/Local/R/win-library/4.5 |