In this tutorial, you will learn:
What is a Data Frame?
How to Create a Data Frame
Append a Column to Data Frame
Select a Column of a Data Frame
Subset a Data Frame
How to Create a Data Frame
We can create a data frame in R by passing the variables a, b, c, d into the data.frame() function. We can then name the columns with names() by specifying a name for each variable.
data.frame(df, stringsAsFactors = TRUE)
Arguments:
- df: a matrix to convert to a data frame, or a collection of variables to join
- stringsAsFactors: whether to convert strings to factors (TRUE by default in R versions before 4.0.0, FALSE by default since)
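The first argument can also be a matrix, as noted above. A minimal sketch of that case, where m and df_m are hypothetical names chosen for illustration:

```r
# A small numeric matrix with 3 rows and 2 columns
m <- matrix(1:6, nrow = 3, ncol = 2)

# data.frame() turns each matrix column into a data frame column
df_m <- data.frame(m)

dim(df_m)    # 3 2
names(df_m)  # "X1" "X2"
```

Because the matrix has no column names, R generates X1 and X2; they can be replaced with names() in the same way as for any other data frame.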
Create a, b, c, d variables
a <- c(10, 20, 30, 40)
b <- c('book', 'pen', 'textbook', 'pencil_case')
c <- c(TRUE, FALSE, TRUE, FALSE)
d <- c(2.5, 8, 10, 7)
Join the variables to create a data frame
df <- data.frame(a, b, c, d)
df
Output:
a b c d
1 10 book TRUE 2.5
2 20 pen FALSE 8.0
3 30 textbook TRUE 10.0
4 40 pencil_case FALSE 7.0
Name the columns of the data frame
names(df) <- c('ID', 'items', 'store', 'price')
df
Output:
ID items store price
1 10 book TRUE 2.5
2 20 pen FALSE 8.0
3 30 textbook TRUE 10.0
4 40 pencil_case FALSE 7.0
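The columns can also be named when the data frame is created, which avoids the separate names() step. A short sketch with the same values as above; df_named is a hypothetical name used only here:

```r
# Supply name = vector pairs directly to data.frame()
df_named <- data.frame(
  ID    = c(10, 20, 30, 40),
  items = c("book", "pen", "textbook", "pencil_case"),
  store = c(TRUE, FALSE, TRUE, FALSE),
  price = c(2.5, 8, 10, 7)
)

names(df_named)  # "ID" "items" "store" "price"
```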
Print the structure
str(df)
'data.frame': 4 obs. of 4 variables:
$ ID : num 10 20 30 40
$ items: Factor w/ 4 levels "book","pen","pencil_case",..: 1 2 4 3
$ store: logi TRUE FALSE TRUE FALSE
$ price: num 2.5 8 10 7
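In R versions before 4.0.0, data.frame() converts string variables to factors by default, as the output above shows. Since R 4.0.0 the default is stringsAsFactors = FALSE, and items would appear as chr instead.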
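To get the same column type regardless of the R version in use, the argument can be set explicitly. A minimal sketch, where df_factor and df_char are hypothetical names:

```r
b <- c("book", "pen", "textbook", "pencil_case")

# Strings converted to a factor
df_factor <- data.frame(items = b, stringsAsFactors = TRUE)

# Strings kept as character values
df_char <- data.frame(items = b, stringsAsFactors = FALSE)

class(df_factor$items)  # "factor"
class(df_char$items)    # "character"
```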
Slice Data Frame
It is possible to slice values of a data frame. We put the rows and columns to return inside brackets preceded by the name of the data frame. A data frame is composed of rows and columns, df[A, B], where A represents the rows and B the columns. We can slice by specifying the rows, the columns, or both. In picture 1, the left part represents the rows, and the right part the columns. Note that the symbol : means "to". For instance, 1:3 selects the values from 1 to 3.
The diagram below shows how to access different selections of the data frame:
- The yellow arrow selects row 1 in column 2
- The green arrow selects rows 1 to 2
- The red arrow selects column 1
- The blue arrow selects rows 1 to 3 and columns 3 to 4
Note that if we leave the left part blank, R selects all the rows. By analogy, if we leave the right part blank, R selects all the columns.
We can run the code in the console:
Select row 1 in column 2
df[1,2]
Output:
[1] book
Levels: book pen pencil_case textbook
Select Rows 1 to 2
df[1:2,]
Output:
ID items store price
1 10 book TRUE 2.5
2 20 pen FALSE 8.0
Select column 1
df[,1]
Output:
[1] 10 20 30 40
Select Rows 1 to 3 and columns 3 to 4
df[1:3, 3:4]
Output:
store price
1 TRUE 2.5
2 FALSE 8.0
3 TRUE 10.0
It is also possible to select the columns with their names. For instance, the code below extracts two columns: ID and store.
Slice with column names
df[, c('ID', 'store')]
Output:
ID store
1 10 TRUE
2 20 FALSE
3 30 TRUE
4 40 FALSE
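One detail worth knowing when slicing: selecting a single column with df[, 1] returns a plain vector, not a data frame. The sketch below, which uses a cut-down two-column df for brevity, shows how the drop = FALSE argument keeps the result as a data frame:

```r
df <- data.frame(ID = c(10, 20, 30, 40), price = c(2.5, 8, 10, 7))

# A single column is simplified to a vector by default
col_vec <- df[, 1]

# drop = FALSE keeps the one-column data frame
col_df <- df[, 1, drop = FALSE]

is.data.frame(col_vec)  # FALSE
is.data.frame(col_df)   # TRUE
```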
Append a Column to Data Frame
You can also append a column to a data frame. Use the $ symbol followed by the new column's name and assign a vector to it.
Create a new vector
quantity <- c(10, 35, 40, 5)
Add quantity to the df data frame
df$quantity <- quantity
df
Output:
ID items store price quantity
1 10 book TRUE 2.5 10
2 20 pen FALSE 8.0 35
3 30 textbook TRUE 10.0 40
4 40 pencil_case FALSE 7.0 5
Note: The number of elements in the vector must match the number of rows in the data frame. (A shorter vector is accepted only when R can recycle it evenly, for example a single value.) Executing the following statements:
quantity <- c(10, 35, 40)
Add quantity to the df data frame
df$quantity <- quantity
Gives error:
Error in `$<-.data.frame`(`*tmp*`, quantity, value = c(10, 35, 40)) :
replacement has 3 rows, data has 4
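One way to avoid this error is to compare the vector length with the number of rows before assigning. A minimal sketch, using a cut-down df that holds only the ID column:

```r
df <- data.frame(ID = c(10, 20, 30, 40))
quantity <- c(10, 35, 40)  # one value short

if (length(quantity) == nrow(df)) {
  # Lengths agree, so the column can be appended safely
  df$quantity <- quantity
} else {
  # Report the mismatch instead of letting the assignment fail
  message("quantity has ", length(quantity),
          " values, but df has ", nrow(df), " rows")
}

names(df)  # "ID"
```

The data frame is left unchanged here because the check fails; with a four-element vector the column would be added as before.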
Select a Column of a Data Frame
Sometimes we need to store a column of a data frame for future use or perform an operation on a column. We can use the $ sign to select the column from a data frame.
Select the column ID
df$ID
Output:
[1] 10 20 30 40
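As an illustration of storing a column and operating on it, the sketch below saves ID in a variable and computes the mean of price. It uses a cut-down df, and ids and avg_price are hypothetical names:

```r
df <- data.frame(ID = c(10, 20, 30, 40), price = c(2.5, 8, 10, 7))

# Store a column for later use
ids <- df$ID

# Perform an operation on a column
avg_price <- mean(df$price)
avg_price  # 6.875

# Double brackets with the column name select the same column
identical(df$ID, df[["ID"]])  # TRUE
```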
Subset a Data Frame
In the previous section, we selected an entire column without condition. It is possible to subset based on whether or not a certain condition was true. We use the subset() function.
subset(x, condition) arguments:
- x: data frame used to perform the subset
- condition: define the conditional statement
To return only the items with a price above 5, we can do:
Select price above 5
subset(df, subset = price > 5)
Output:
ID items store price
2 20 pen FALSE 8
3 30 textbook TRUE 10
4 40 pencil_case FALSE 7
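Conditions can be combined, and subset() also accepts a select argument to keep only some columns. The sketch below rebuilds the four-column df so that it runs on its own; pricey and in_store are hypothetical names:

```r
df <- data.frame(
  ID    = c(10, 20, 30, 40),
  items = c("book", "pen", "textbook", "pencil_case"),
  store = c(TRUE, FALSE, TRUE, FALSE),
  price = c(2.5, 8, 10, 7)
)

# Same filter as above, written with logical indexing in brackets
pricey <- df[df$price > 5, ]

# Two conditions joined with &, keeping only two columns
in_store <- subset(df, subset = price > 5 & store, select = c(items, price))

pricey$ID        # 20 30 40
nrow(in_store)   # 1
```

Only the textbook row satisfies both conditions, so in_store has a single row.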