data.table vs. base vs. dplyr
May 23, 2025
This page presents a side-by-side comparison of common data manipulation operations in R in three idioms: data.table, base, and dplyr. This allows you to compare syntax and understand how to accomplish tasks across these popular frameworks.
This reference guide covers everything from basic filtering and sorting to advanced operations like joins and reshaping data. Many of these examples were originally crafted by Atrebas. They were then reorganized and augmented with base examples by a team of contributors.
To begin, we create example data. The base R data frame is called DF, the data.table table is called DT, and the dplyr tibble is called TB. Data creation is wrapped in a refresh_data() function, which is called periodically throughout the document to ensure that the data is reset after modifications.
{
DT <<-
DF <<-
TB <<-
}
When using the let() and set*() functions or := operator modifies a data.table βin place,β which means that it does not copy the object at all. This is more efficient than re-assigning the entire data set. However, when modified in place, the data table will not be printed immediately to the console after modification. You must call the object again to see the changes.
Filter
Integer index
Positive indices keep the specified rows.
DT
> V1 V2 V3 V4
>
> 1: 1 3 1.5 C
> 2: 2 4 0.5 A
Negative indices exclude the specified rows.
DT
> V1 V2 V3 V4
>
> 1: 1 1 0.5 A
> 2: 2 2 1.0 B
> 3: 2 8 1.0 B
> 4: 1 9 1.5 C
Logical index
Filter rows using a logical vector. Keep the rows where the condition is TRUE. %chin% is a fast version of %in%, optimized for strings. %like% is a convenient operator for regular expression matches.
DT
> V1 V2 V3 V4
>
> 1: 2 6 1.5 C
> 2: 1 7 0.5 A
> 3: 2 8 1.0 B
> 4: 1 9 1.5 C
DT # faster than %in% for strings
> V1 V2 V3 V4
>
> 1: 1 1 0.5 A
> 2: 1 3 1.5 C
> 3: 2 4 0.5 A
> 4: 2 6 1.5 C
> 5: 1 7 0.5 A
> 6: 1 9 1.5 C
DT # regular expressions
> V1 V2 V3 V4
>
> 1: 1 1 0.5 A
> 2: 1 3 1.5 C
> 3: 2 4 0.5 A
> 4: 2 6 1.5 C
> 5: 1 7 0.5 A
> 6: 1 9 1.5 C
Filter rows based on multiple conditions.
DT
> V1 V2 V3 V4
>
> 1: 1 1 0.5 A
> 2: 1 7 0.5 A
Unique
> Indices: ,
> V1 V2 V3 V4
>
> 1: 1 1 0.5 A
> 2: 2 2 1.0 B
> 3: 1 3 1.5 C
> 4: 2 4 0.5 A
> 5: 1 5 1.0 B
> 6: 2 6 1.5 C
> 7: 1 7 0.5 A
> 8: 2 8 1.0 B
> 9: 1 9 1.5 C
> V1 V2 V3 V4
>
> 1: 1 1 0.5 A
> 2: 2 2 1.0 B
> 3: 1 3 1.5 C
> 4: 2 4 0.5 A
> 5: 1 5 1.0 B
> 6: 2 6 1.5 C
Missing values
Drop rows with missing values in specified columns.
> Indices: ,
> V1 V2 V3 V4
>
> 1: 1 1 0.5 A
> 2: 2 2 1.0 B
> 3: 1 3 1.5 C
> 4: 2 4 0.5 A
> 5: 1 5 1.0 B
> 6: 2 6 1.5 C
> 7: 1 7 0.5 A
> 8: 2 8 1.0 B
> 9: 1 9 1.5 C
Random sample
Draw a random sample of rows.
DT
> V1 V2 V3 V4
>
> 1: 1 7 0.5 A
> 2: 1 3 1.5 C
> 3: 2 4 0.5 A
DT
> V1 V2 V3 V4
>
> 1: 1 1 0.5 A
> 2: 1 5 1.0 B
> 3: 2 6 1.5 C
> 4: 1 7 0.5 A
Other
DT
DT
DT
Sort
Rows
Sort rows in ascending order.
DT
> V1 V2 V3 V4
>
> 1: 1 1 0.5 A
> 2: 2 4 0.5 A
> 3: 1 7 0.5 A
> 4: 2 2 1.0 B
> 5: 1 5 1.0 B
> 6: 2 8 1.0 B
> 7: 1 3 1.5 C
> 8: 2 6 1.5 C
> 9: 1 9 1.5 C
Sort rows in decreasing order.
DT
> V1 V2 V3 V4
>
> 1: 1 3 1.5 C
> 2: 2 6 1.5 C
> 3: 1 9 1.5 C
> 4: 2 2 1.0 B
> 5: 1 5 1.0 B
> 6: 2 8 1.0 B
> 7: 1 1 0.5 A
> 8: 2 4 0.5 A
> 9: 1 7 0.5 A
Sort rows by multiple columns.
DT
> V1 V2 V3 V4
>
> 1: 1 9 1.5 C
> 2: 1 7 0.5 A
> 3: 1 5 1.0 B
> 4: 1 3 1.5 C
> 5: 1 1 0.5 A
> 6: 2 8 1.0 B
> 7: 2 6 1.5 C
> 8: 2 4 0.5 A
> 9: 2 2 1.0 B
Sort dataset and the column passed in argument becomes key. Output is an object the same type as the object indexed. This code is not executed to avoid sorting the original data set in place.
Columns
Reorders the columns of a dataset. This code is not executed to avoid reordering columns in the original data set.
DT
> V4 V1 V2 V3
>
> 1: A 1 1 0.5
> 2: B 2 2 1.0
> 3: C 1 3 1.5
> 4: A 2 4 0.5
> 5: B 1 5 1.0
> 6: C 2 6 1.5
> 7: A 1 7 0.5
> 8: B 2 8 1.0
> 9: C 1 9 1.5
Select
Keep
Extract one column as a vector.
# DT[["V3"]]
# DT[, V3]
DT
> [1] 0.5 1.0 1.5 0.5 1.0 1.5 0.5 1.0 1.5
Extract one column as a data frame.
# DT[, .SD, .SDcols = "V3"]
DT
> V3
>
> 1: 0.5
> 2: 1.0
> 3: 1.5
> 4: 0.5
> 5: 1.0
> 6: 1.5
> 7: 0.5
> 8: 1.0
> 9: 1.5
Select several columns by column names.
DT
DT
DT
DT
cols =
DT
Drop
Exclude several columns by column name.
# DT[, .SD, .SDcols = !c("V2", "V3")]
DT
> V1 V4
>
> 1: 1 A
> 2: 2 B
> 3: 1 C
> 4: 2 A
> 5: 1 B
> 6: 2 C
> 7: 1 A
> 8: 2 B
> 9: 1 C
Remove a column from the data set. Using let() is efficient because it modifies the data set in place. This code is not executed because the V5 column is not present in the data set.
DT
> V2 V3 V4
>
> 1: 1 0.5 A
> 2: 2 1.0 B
> 3: 3 1.5 C
> 4: 4 0.5 A
> 5: 5 1.0 B
> 6: 6 1.5 C
> 7: 7 0.5 A
> 8: 8 1.0 B
> 9: 9 1.5 C
DT
> V2 V3 V4
>
> 1: 1 0.5 A
> 2: 2 1.0 B
> 3: 3 1.5 C
> 4: 4 0.5 A
> 5: 5 1.0 B
> 6: 6 1.5 C
> 7: 7 0.5 A
> 8: 8 1.0 B
> 9: 9 1.5 C
Remove several columns from the data set. Using := is efficient because it modifies the data set in place. This code is not executed because the V6 and V7 columns are not present in the data set.
cols =
DT
> V4
>
> 1: A
> 2: B
> 3: C
> 4: A
> 5: B
> 6: C
> 7: A
> 8: B
> 9: C
DT
> V4
>
> 1: A
> 2: B
> 3: C
> 4: A
> 5: B
> 6: C
> 7: A
> 8: B
> 9: C
Rename
Select and rename.
DT
> X1 X2
>
> 1: 1 1
> 2: 2 2
> 3: 1 3
> 4: 2 4
> 5: 1 5
> 6: 2 6
> 7: 1 7
> 8: 2 8
> 9: 1 9
Using the data.table::setnames() to rename columns is efficient because it renames column in place. This code is not executed to avoid renaming columns in the original data set.
DT
> X1 X2 V3 V4
>
> 1: 1 1 0.5 A
> 2: 2 2 1.0 B
> 3: 1 3 1.5 C
> 4: 2 4 0.5 A
> 5: 1 5 1.0 B
> 6: 2 6 1.5 C
> 7: 1 7 0.5 A
> 8: 2 8 1.0 B
> 9: 1 9 1.5 C
Advanced selections
Complex selections using regular expressions or dedicated functions.
DT
DT
DT
DT
DT
DT
DT
Summarize
Single columns
Create a new data frame with a single row and a single column, summarizing the information of one column. Named or unnamed results.
DT
> [1] 13
DT
> sumV1
>
> 1: 13
Create a new data frame with a single row and two columns, summarizing the information of two manually specified columns.
DT
> sumV1 sdV3
>
> 1: 13 0.4330127
Multiple columns
Apply a function to each column.
DT
> V1 V2 V3 V4
>
> 1: 1 1 0.5 A
Apply a function to each column that matches a specific type.
DT
> V1 V2 V3
>
> 1: 1.444444 5 1
By group
Count the number of observation by group.
DT
> V4 N
>
> 1: A 3
> 2: B 3
> 3: C 3
Multiple named summaries
DT
> V4 nobs meanV1
>
> 1: A 3 1.333333
> 2: B 3 1.666667
> 3: C 3 1.333333
Apply a function to the full data frame in each group. Here, we return the first row in each group using the head() function.
DT
> V4 V1 V2 V3
>
> 1: A 1 1 0.5
> 2: B 2 2 1.0
> 3: C 1 3 1.5
Get the row number of first (and last) observation by group. In data.table, we use the .I operator, which reports the row number.
DT
> V4 V1
>
> 1: A 1
> 2: B 2
> 3: C 3
DT
> V4 V1
>
> 1: A 1
> 2: A 7
> 3: B 2
> 4: B 8
> 5: C 3
> 6: C 9
List-columns are columns where each element is a vector, data frame, or other object.
DT
> V4 V1
>
> 1: A 1,2,1
> 2: B 2,1,2
> 3: C 1,2,1
DT
> V4 V1
>
> 1: A
> 2: B
> 3: C
Modify
Single columns
Create or modify a column.
DT
> V1 V2 V3 V4 V5
>
> 1: 1 1 0.5 A 1
> 2: 2 2 1.0 B 4
> 3: 1 3 1.5 C 1
> 4: 2 4 0.5 A 4
> 5: 1 5 1.0 B 1
> 6: 2 6 1.5 C 4
> 7: 1 7 0.5 A 1
> 8: 2 8 1.0 B 4
> 9: 1 9 1.5 C 1
DT
> V1 V2 V3 V4 V5
>
> 1: 1 1 0.5 A 1
> 2: 2 2 1.0 B 4
> 3: 1 3 1.5 C 1
> 4: 2 4 0.5 A 4
> 5: 1 5 1.0 B 1
> 6: 2 6 1.5 C 4
> 7: 1 7 0.5 A 1
> 8: 2 8 1.0 B 4
> 9: 1 9 1.5 C 1
Create several new columns, each named explicitly.
DT
> V1 V2 V3 V4 V5 V6
>
> 1: 1 1 0.5 A 1.000000 X
> 2: 2 2 1.0 B 1.414214 X
> 3: 1 3 1.5 C 1.000000 X
> 4: 2 4 0.5 A 1.414214 X
> 5: 1 5 1.0 B 1.000000 X
> 6: 2 6 1.5 C 1.414214 X
> 7: 1 7 0.5 A 1.000000 X
> 8: 2 8 1.0 B 1.414214 X
> 9: 1 9 1.5 C 1.000000 X
DT
> V1 V2 V3 V4 V5 V6
>
> 1: 1 1 0.5 A 1.000000 X
> 2: 2 2 1.0 B 1.414214 X
> 3: 1 3 1.5 C 1.000000 X
> 4: 2 4 0.5 A 1.414214 X
> 5: 1 5 1.0 B 1.000000 X
> 6: 2 6 1.5 C 1.414214 X
> 7: 1 7 0.5 A 1.000000 X
> 8: 2 8 1.0 B 1.414214 X
> 9: 1 9 1.5 C 1.000000 X
Multiple columns
Take the square of every numeric column.
DT
> V1 V2 V3 V4 V5 V6
>
> 1: 1 1 0.25 A 1 X
> 2: 4 4 1.00 B 2 X
> 3: 1 9 2.25 C 1 X
> 4: 4 16 0.25 A 2 X
> 5: 1 25 1.00 B 1 X
> 6: 4 36 2.25 C 2 X
> 7: 1 49 0.25 A 1 X
> 8: 4 64 1.00 B 2 X
> 9: 1 81 2.25 C 1 X
DT
> V1 V2 V3 V4 V5 V6
>
> 1: 1 1 0.25 A 1 X
> 2: 4 4 1.00 B 2 X
> 3: 1 9 2.25 C 1 X
> 4: 4 16 0.25 A 2 X
> 5: 1 25 1.00 B 1 X
> 6: 4 36 2.25 C 2 X
> 7: 1 49 0.25 A 1 X
> 8: 4 64 1.00 B 2 X
> 9: 1 81 2.25 C 1 X
Replace
Replace values in rows that match a condition applied to a column.
DT
> V1 V2 V3 V4
>
> 1: 1 0 0.5 A
> 2: 2 0 1.0 B
> 3: 1 0 1.5 C
> 4: 2 0 0.5 A
> 5: 1 5 1.0 B
> 6: 2 6 1.5 C
> 7: 1 7 0.5 A
> 8: 2 8 1.0 B
> 9: 1 9 1.5 C
DT
> V1 V2 V3 V4
>
> 1: 1 0 0.5 A
> 2: 2 0 1.0 B
> 3: 1 0 1.5 C
> 4: 2 0 0.5 A
> 5: 1 5 1.0 B
> 6: 2 6 1.5 C
> 7: 1 7 0.5 A
> 8: 2 8 1.0 B
> 9: 1 9 1.5 C
If, else, case
Fast version of ifelse(). Return a vector the same length as the vector tested, with specific value for in case where the condition is TRUE, where it is FALSE, and where it is NA.
x =
> [1] "neg" "neg" "neg" "pos" "pos" "pos" "pos" "NA"
Recode several cases at once, based on a vector of conditions.
x = 1:10
> [1] "1" "fizz" "buzz" "fizz" "5" "fizz buzz"
> [7] "7" "fizz" "buzz" "fizz"
Modify by group
Group summaries of a column in categories.
DT
> V4 sumV2
>
> 1: A 12
> 2: B 15
> 3: C 18
Group values of a column in groups while applying a function to each category.
DT
> tolower sumV1
>
> 1: a 4
> 2: b 5
> 3: c 4
Group values of a column in two categories, TRUE (for rows matching the condition) and FALSE (For rows not matching the condition).
DT
> Key:
> V4 V1
>
> 1: FALSE 9
> 2: TRUE 4
Group values of a column in several categories with some of the rows of the initial dataset removed.
DT
> V4 sumV1
>
> 1: A 3
> 2: B 3
> 3: C 1
Add a new column with the number of observations per group.
DT
> V1 V2 V3 V4 n
>
> 1: 1 1 0.5 A 5
> 2: 2 2 1.0 B 4
> 3: 1 3 1.5 C 5
> 4: 2 4 0.5 A 4
> 5: 1 5 1.0 B 5
> 6: 2 6 1.5 C 4
> 7: 1 7 0.5 A 5
> 8: 2 8 1.0 B 4
> 9: 1 9 1.5 C 5
DT
> V1 V2 V3 V4 n
>
> 1: 1 1 0.5 A 5
> 2: 2 2 1.0 B 4
> 3: 1 3 1.5 C 5
> 4: 2 4 0.5 A 4
> 5: 1 5 1.0 B 5
> 6: 2 6 1.5 C 4
> 7: 1 7 0.5 A 5
> 8: 2 8 1.0 B 4
> 9: 1 9 1.5 C 5
Advanced
Summarise all the columns, typically using an aggregation function.
DT
> V1 V2 V3 V4
>
> 1: 2 9 1.5 C
Summarise several columns, typically using an aggregation function.
DT
> V1 V2
>
> 1: 1.444444 5
Summarise several columns by groups, typically using an aggregation function.
DT
> V4 V1 V2
>
> 1: A 1.333333 4
> 2: B 1.666667 5
> 3: C 1.333333 6
DT
> V4 V1 V2
>
> 1: A 1.333333 4
> 2: B 1.666667 5
> 3: C 1.333333 6
Summarise several columns by group using multiple aggregation functions, grouping by one or more variables.
DT
> V4 V1 V2 V3 V1 V2 V3
>
> 1: A 4 12 1.5 1.333333 4 0.5
> 2: B 5 15 3.0 1.666667 5 1.0
> 3: C 4 18 4.5 1.333333 6 1.5
Summarise a subset of columns by column type or condition.
DT
> V1 V2 V3
>
> 1: 1.444444 5 1
{ && > 3}
DT
> V2
>
> 1: 5
Modify all the columns using the same function.
DT
> V1 V2 V3 V4
>
> 1: 1 9 1.5 C
> 2: 2 8 1.0 B
> 3: 1 7 0.5 A
> 4: 2 6 1.5 C
> 5: 1 5 1.0 B
> 6: 2 4 0.5 A
> 7: 1 3 1.5 C
> 8: 2 2 1.0 B
> 9: 1 1 0.5 A
Apply a transformation to each element of the variables selected.
DT
> V1 V2
>
> 1: 1.000000 1.000000
> 2: 1.414214 1.414214
> 3: 1.000000 1.732051
> 4: 1.414214 2.000000
> 5: 1.000000 2.236068
> 6: 1.414214 2.449490
> 7: 1.000000 2.645751
> 8: 1.414214 2.828427
> 9: 1.000000 3.000000
Apply a transformation for each element of the variables selected.
DT
> V1 V2 V3 V4
>
> 1: 1.000000 1.000000 0.7071068 A
> 2: 1.414214 1.414214 1.0000000 B
> 3: 1.000000 1.732051 1.2247449 C
> 4: 1.414214 2.000000 0.7071068 A
> 5: 1.000000 2.236068 1.0000000 B
> 6: 1.414214 2.449490 1.2247449 C
> 7: 1.000000 2.645751 0.7071068 A
> 8: 1.414214 2.828427 1.0000000 B
> 9: 1.000000 3.000000 1.2247449 C
DT
> V1 V2 V3 V4
>
> 1: 1.000000 1.000000 0.7071068 A
> 2: 1.414214 1.414214 1.0000000 B
> 3: 1.000000 1.732051 1.2247449 C
> 4: 1.414214 2.000000 0.7071068 A
> 5: 1.000000 2.236068 1.0000000 B
> 6: 1.414214 2.449490 1.2247449 C
> 7: 1.000000 2.645751 0.7071068 A
> 8: 1.414214 2.828427 1.0000000 B
> 9: 1.000000 3.000000 1.2247449 C
Apply a transformation to each element of the variables selected by condition.
DT
> V1 V2 V3
>
> 1: 0.0000000 0.0000000 -0.2928932
> 2: 0.4142136 0.4142136 0.0000000
> 3: 0.0000000 0.7320508 0.2247449
> 4: 0.4142136 1.0000000 -0.2928932
> 5: 0.0000000 1.2360680 0.0000000
> 6: 0.4142136 1.4494897 0.2247449
> 7: 0.0000000 1.6457513 -0.2928932
> 8: 0.4142136 1.8284271 0.0000000
> 9: 0.0000000 2.0000000 0.2247449
Apply a transformation to each element of the variables selected by condition.
DT
> V1 V2 V3 V4
>
> 1: 1 1 0 A
> 2: 1 1 1 B
> 3: 1 1 1 C
> 4: 1 2 0 A
> 5: 1 2 1 B
> 6: 1 2 1 C
> 7: 1 2 0 A
> 8: 1 2 1 B
> 9: 1 3 1 C
DT
> V1 V2 V3 V4
>
> 1: 1 1 0 A
> 2: 1 1 1 B
> 3: 1 1 1 C
> 4: 1 2 0 A
> 5: 1 2 1 B
> 6: 1 2 1 C
> 7: 1 2 0 A
> 8: 1 2 1 B
> 9: 1 3 1 C
Combine multiple functions in a single statement.
DT
> V4 V1 V2
>
> 1: A 1 X
> 2: A 1 X
> 3: B 1 X
> 4: B 1 X
> 5: C 1 X
> 6: C 1 X
Chain (pipe)
Expression chaining allows you to perform multiple operations in sequence without creating intermediate objects.
# Chain operations using [][]
DT
> V4 V1sum
>
> 1: B 5
# Sort results in descending order
DT
> V4 V1sum
>
> 1: B 5
> 2: A 4
> 3: C 4
Join
First, letβs create example datasets for demonstrating joins.
x =
y =
Basic Joins
There are two basic ways to perform joins in data.table. First, we can use the standard merge() function that should be familiar to most R users.
Second, we can use the square bracket notation with the on argument. Here, the logic is that we select every row from the table outside that match rows from the table inside the bracket. The table inside the brackets is used as an βindexβ, so x[y] essentially means that we are merging x into y. The benefit of this syntax is that we can efficiently combine it with other operations like summaries, sorts, etc.
Note that, in data.table, the on argument specifies the joining columns, and the i. prefix refers to columns from the right table, while the x. prefix refers to columns from the left table.
# Left join (keep all rows from x)
# merge(x, y, all.x = TRUE, by = "Id")
y
> Id Y1 XY X1 i.XY
>
> 1: A 1 y1 1 x2
> 2: B 3 y3 3 x4
> 3: B 5 y5 3 x4
> 4: C NA 5 x6
> 5: C NA 7 x8
# Right join (keep all rows from y)
# merge(x, y, all.y = TRUE, by = "Id")
x
> Id X1 XY Y1 i.XY
>
> 1: A 1 x2 1 y1
> 2: B 3 x4 3 y3
> 3: B 3 x4 5 y5
> 4: D NA 7 y7
# Inner join (keep only matching rows)
# merge(x, y, by = "Id"
x
> Id X1 XY Y1 i.XY
>
> 1: A 1 x2 1 y1
> 2: B 3 x4 3 y3
> 3: B 3 x4 5 y5
# Anti join (keep rows from x with no match in y)
x
> Id X1 XY
>
> 1: C 5 x6
> 2: C 7 x8
# Full join (keep all rows)
> Key:
> Id X1 XY.x Y1 XY.y
>
> 1: A 1 x2 1 y1
> 2: B 3 x4 3 y3
> 3: B 3 x4 5 y5
> 4: C 5 x6 NA
> 5: C 7 x8 NA
> 6: D NA 7 y7
Reshape
Wide to long
Converting data from wide format (where variables are in columns) to long format (where variables become rows).
> V4 Variable Value
>
> 1: A V1 1.0
> 2: B V1 2.0
> 3: C V1 1.0
> 4: A V1 2.0
> 5: B V1 1.0
> 6: C V1 2.0
> 7: A V1 1.0
> 8: B V1 2.0
> 9: C V1 1.0
> 10: A V2 1.0
> 11: B V2 2.0
> 12: C V2 3.0
> 13: A V2 4.0
> 14: B V2 5.0
> 15: C V2 6.0
> 16: A V2 7.0
> 17: B V2 8.0
> 18: C V2 9.0
> 19: A V3 0.5
> 20: B V3 1.0
> 21: C V3 1.5
> 22: A V3 0.5
> 23: B V3 1.0
> 24: C V3 1.5
> 25: A V3 0.5
> 26: B V3 1.0
> 27: C V3 1.5
> V4 Variable Value
>
'measure.vars' [V1, V2, V3] are not all of the same type. By order of hierarchy, the molten data value column will be of type 'double'. All measure variables not of type 'double' will be coerced too. Check DETAILS in ?melt.data.table for more on coercion.
Long to wide
Converting data from long format back to wide format.
# Create example long data
long =
long
> Key:
> a b c d
>
> 1: 1 1 x -0.8425213
> 2: 1 1 y -0.0431396
> 3: 1 2 x -0.4273492
> 4: 1 2 y -0.3644490
> 5: 2 1 x -0.4764893
> 6: 2 1 y 0.2829335
> 7: 2 2 x 0.1844807
> 8: 2 2 y -0.5859007
> Key:
> a b x y
>
> 1: 1 1 -0.8425213 -0.0431396
> 2: 1 2 -0.4273492 -0.3644490
> 3: 2 1 -0.4764893 0.2829335
> 4: 2 2 0.1844807 -0.5859007
Split rows
Separating data into groups based on a factor.
> $A
> V1 V2 V3 V4
>
> 1: 1 1 0.5 A
> 2: 2 4 0.5 A
> 3: 1 7 0.5 A
>
> $B
> V1 V2 V3 V4
>
> 1: 2 2 1 B
> 2: 1 5 1 B
> 3: 2 8 1 B
>
> $C
> V1 V2 V3 V4
>
> 1: 1 3 1.5 C
> 2: 2 6 1.5 C
> 3: 1 9 1.5 C
Split columns
Separating a single column into multiple columns based on a delimiter.
# Create example data
tmp =
tmp
> a w z
>
> 1: A:a A a
> 2: B:b B b
> 3: C:c C c
tmp
> a w z
>
> 1: A:a A a
> 2: B:b B b
> 3: C:c C c
Bind rows
Combining multiple datasets by rows. To begin, we create example data.
x =
y =
z =
# Simple row bind
> V1
>
> 1: 1
> 2: 2
> 3: 3
> 4: 4
> 5: 5
> 6: 6
# Simple row bind
> V1
>
> 1: 1
> 2: 2
> 3: 3
> 4: 4
> 5: 5
> 6: 6
> 7: 1
> 8: 2
> 9: 3
# Row bind with different columns
> V1 V2
>
> 1: 1 NA
> 2: 2 NA
> 3: 3 NA
> 4: 7 0
> 5: 8 0
> 6: 9 0
# Bind list of data.tables with ID column
> .id V1
>
> 1: 1 1
> 2: 1 2
> 3: 1 3
> 4: 2 4
> 5: 2 5
> 6: 2 6
Bind columns
# Column bind
base::
> V1 V1
>
> 1: 1 4
> 2: 2 5
> 3: 3 6
Set Operations
Set operations treat datasets as sets and perform mathematical set operations on them.
x <-
y <-
# Intersection (rows present in both x and y)
> V1
>
> 1: 2
> 2: 3
# Set difference (rows in x but not in y)
> V1
>
> 1: 1
# Include duplicates
> V1
>
> 1: 1
> 2: 3
# Union (combine unique rows)
> V1
>
> 1: 1
> 2: 2
> 3: 3
> 4: 4
# Include duplicates
> V1
>
> 1: 1
> 2: 2
> 3: 2
> 4: 3
> 5: 3
> 6: 2
> 7: 2
> 8: 3
> 9: 4
> 10: 4
# Test for equality
> [1] TRUE
> [1] TRUE
Read and write
# Write to CSV
# Write tab-delimited
# Read CSV or tab-delimited
DT1 =
DT2 =
# Read specific columns
DT3 =
DT4 =
# Combine multiple files
> V1 V2 V3 V4
>
> 1: 1 1 0.5 A
> 2: 2 2 1.0 B
> 3: 1 3 1.5 C
> 4: 2 4 0.5 A
> 5: 1 5 1.0 B
> 6: 2 6 1.5 C
> 7: 1 7 0.5 A
> 8: 2 8 1.0 B
> 9: 1 9 1.5 C
> 10: 1 1 0.5 A
> 11: 2 2 1.0 B
> 12: 1 3 1.5 C
> 13: 2 4 0.5 A
> 14: 1 5 1.0 B
> 15: 2 6 1.5 C
> 16: 1 7 0.5 A
> 17: 2 8 1.0 B
> 18: 1 9 1.5 C
Clean up local files.
> [1] TRUE TRUE TRUE TRUE TRUE TRUE
Loading source...