# Learn Tidy Evaluation from Daniel Chen

## Peng Chen

March 2, 2021

Below is a summary of Daniel Chen’s recent talk “Learning Tidy Evaluation by Reimplementing dplyr”. The goal is to re-implement the dplyr::select() function, which we reach in four attempts.

To make these notes as concise as possible, we load

``````library(tidyverse)
library(rlang)``````

and will use the iris dataset for testing.

``iris %>% head()``
``````##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa``````

# Attempt 1

## Function definition

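``````select_1 <- function(data, col) {
  # Find the position of the requested column name among the data's columns
  col_position <- match(as.character(col), names(data))
  # drop = FALSE keeps the result a data frame even for a single column
  data[, col_position, drop = FALSE]
}``````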

To understand match(), see

``match(c("c", "a"), c("a", "b", "c"))``
``## [1] 3 1``

This is useful for selecting multiple columns because it preserves the selection order. In that respect it is better than the which() function, which returns positions in the order of the data's columns:

``which(c("a", "b", "c") %in% c("c", "a"))``
``## [1] 1 3``
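The difference between the two approaches can be checked directly. The sketch below uses made-up name vectors (`available` and `requested` are illustrative names, not part of the talk) and also shows that match() reports an unknown name as NA, which is what later produces the "undefined columns selected" error.

```r
# Hypothetical column names, for illustration only.
available <- c("a", "b", "c")
requested <- c("c", "a")

# match() returns positions in the order of `requested`.
pos_match <- match(requested, available)

# which() with %in% returns positions in the order of `available`.
pos_which <- which(available %in% requested)

# A name that is not present is reported as NA by match().
pos_missing <- match(c("c", "z"), available)

pos_match
pos_which
pos_missing
```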

## Test 1

``select_1(iris, "Species") %>% head(3)``
``````##   Species
## 1  setosa
## 2  setosa
## 3  setosa``````

## Test 2

``select_1(iris, Species) %>% head(3)``
``## Error in select_1(iris, Species): object 'Species' not found``

The second test fails because R tries to evaluate the argument Species as an ordinary variable, and no object with that name exists. In other words

``as.character(Species)``

cannot be evaluated, unlike

``as.character("Species")``
``## [1] "Species"``
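It may help to see that the error appears only at the moment the argument is actually used, because R evaluates function arguments lazily. The two helpers below are hypothetical and exist only for this illustration; the sketch assumes no object named Species is defined in your session.

```r
# Hypothetical helper: `col` is never touched, so it is never evaluated.
never_uses_col <- function(data, col) {
  ncol(data)
}

# Hypothetical helper: as.character() forces `col`, so R looks up the
# symbol the caller typed and fails if no such object exists.
uses_col <- function(data, col) {
  as.character(col)
}

# Works even though Species is not defined anywhere.
n_columns <- never_uses_col(iris, Species)

# Fails; the error message is captured instead of stopping the script.
msg <- tryCatch(
  uses_col(iris, Species),
  error = function(e) conditionMessage(e)
)

n_columns
msg
```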

# Attempt 2

The solution is to capture your code as an expression (without evaluating it), which can be manipulated later, such as being converted to a string.

## Function definition

``````select_2 <- function(data, col) {
  # Capture what the caller typed for `col` without evaluating it
  col <- enexpr(col)
  col_position <- match(as.character(col), names(data))
  data[, col_position, drop = FALSE]
}``````

## Test

``select_2(iris, Species) %>% head(3)``
``````##   Species
## 1  setosa
## 2  setosa
## 3  setosa``````

It works because the following code can be evaluated.

``as.character(expr(Species))``
``## [1] "Species"``
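Note that the function uses enexpr() while the check above uses expr(). A minimal sketch of the difference, with two hypothetical one-line helpers: expr() quotes what the function author typed, whereas enexpr() quotes what the caller supplied for the argument.

```r
library(rlang)

# expr() quotes the code written here, i.e. the symbol `col` itself.
capture_with_expr <- function(col) expr(col)

# enexpr() quotes whatever the caller passed as `col`.
capture_with_enexpr <- function(col) enexpr(col)

author_side <- capture_with_expr(Species)
caller_side <- capture_with_enexpr(Species)

author_side  # the symbol `col`
caller_side  # the symbol `Species`
```

This is why expr() is convenient for experimenting at the console, while enexpr() is the one to use inside a function.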

Next, we can generalize the function to select multiple columns using dot-dot-dot.

# Attempt 3

## Function definition

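``````select_3 <- function(data, ...) {
  # Capture every argument in ... as an unevaluated expression
  cols <- rlang::enexprs(...)
  # Convert the list of symbols to a character vector of names
  cols_char <- as.vector(cols, mode = "character")
  cols_positions <- match(cols_char, names(data))
  data[, cols_positions, drop = FALSE]
}``````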

## Test 1

``select_3(iris, Species, Sepal.Width, Petal.Length) %>% head(3)``
``````##   Species Sepal.Width Petal.Length
## 1  setosa         3.5          1.4
## 2  setosa         3.0          1.4
## 3  setosa         3.2          1.3``````

## Test 2

``````col_name <- "Species"

select_3(iris, col_name, Sepal.Width, Petal.Length) %>% head(3)``````
``## Error in `[.data.frame`(data, , cols_positions, drop = FALSE): undefined columns selected``

This time, the second test fails because

``````cols <- exprs(col_name, Sepal.Width, Petal.Length)
cols``````
``````## [[1]]
## col_name
##
## [[2]]
## Sepal.Width
##
## [[3]]
## Petal.Length``````
``as.vector(cols, mode = "character")``
``## [1] "col_name"     "Sepal.Width"  "Petal.Length"``

The code col_name is captured as the expression col_name, which is just a symbol; it never reaches the value we actually want, the string “Species”.

Even if we evaluated the expression col_name inside the function, the lookup would succeed only when col_name happens to be reachable from the function's own environment (for example, when both live in the global environment); it would fail for a variable that is local to some other calling function. Therefore, the solution is to capture the dot-dot-dot as quosures (expressions + their environments) and then evaluate the quosures.
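The role of the environment can be illustrated with a small sketch. All function names below (`capture_expr`, `capture_quo`, `caller`) are hypothetical and not from the talk; the point is only that a bare expression forgets where its variable lives, while a quosure remembers.

```r
library(rlang)

# One helper keeps only the expression, the other keeps the expression
# together with the environment it was written in.
capture_expr <- function(x) enexpr(x)
capture_quo  <- function(x) enquo(x)

# A caller with a local variable that exists nowhere else.
caller <- function() {
  local_name <- "Species"
  list(expr = capture_expr(local_name), quo = capture_quo(local_name))
}

captured <- caller()

# The bare expression cannot be evaluated from the global environment,
# because `local_name` is not defined there.
expr_result <- tryCatch(
  eval(captured$expr, globalenv()),
  error = function(e) "not found"
)

# The quosure carries caller()'s environment, so evaluation succeeds.
quo_result <- eval_tidy(captured$quo)

expr_result
quo_result
```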

# Attempt 4

## Function definition

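``````select_4 <- function(data, ...) {
  # Capture each argument as a quosure (expression + environment)
  cols <- enquos(...)
  # Named list mapping each column name to its position; used as the data mask
  vars <- set_names(seq_along(data), names(data)) %>% as.list()
  # Evaluate each quosure: column names resolve to positions via the mask
  col_char_num <- map(cols, eval_tidy, vars)
  # Translate any remaining strings to positions
  cols_positions <- map_int(
    col_char_num,
    function(x) ifelse(is.character(x), vars[[x]], x)
  )
  data[, cols_positions, drop = FALSE]
}``````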

There are quite a few changes in the new function, but they are easy to follow with the help of the test below.

## Test

``select_4(iris, col_name, Sepal.Length, "Petal.Width") %>% head(3)``
``````##   Species Sepal.Length Petal.Width
## 1  setosa          5.1         0.2
## 2  setosa          4.9         0.2
## 3  setosa          4.7         0.2``````

Here we have it, a pretty robust re-implementation of the dplyr::select() function.

To simulate what is happening inside the function, see

``(cols <- quos(col_name, Sepal.Length, "Petal.Width"))``
``````## <list_of<quosure>>
##
## [[1]]
## <quosure>
## expr: ^col_name
## env:  global
##
## [[2]]
## <quosure>
## expr: ^Sepal.Length
## env:  global
##
## [[3]]
## <quosure>
## expr: ^"Petal.Width"
## env:  empty``````

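Notice that each expression is captured together with its environment. A constant such as "Petal.Width" needs no environment to be evaluated, which is why its environment is shown as empty.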

``````data <- iris
(vars <- set_names(seq_along(data), names(data)) %>% as.list())``````
``````## $Sepal.Length
## [1] 1
##
## $Sepal.Width
## [1] 2
##
## $Petal.Length
## [1] 3
##
## $Petal.Width
## [1] 4
##
## $Species
## [1] 5``````
``(col_char_num <- map(cols, eval_tidy, data = vars))``
``````## [[1]]
## [1] "Species"
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] "Petal.Width"``````

Function eval_tidy() evaluates a quosure (an expression bundled with an environment) and takes an additional argument, data. If data is supplied, objects in the data mask always take precedence over the quosure environment, i.e. the data masks the environment. When eval_tidy() is applied to quo(col_name), it first looks up the name col_name in the list vars, finds no match, and then falls back to the quosure environment, where col_name = “Species”.

``eval_tidy(quo(col_name), data = vars)``
``## [1] "Species"``

When eval_tidy() is applied to quo(Sepal.Length), it looks up the name Sepal.Length in the list vars and finds a match with value 1, so the environment is never consulted.

``eval_tidy(quo(Sepal.Length), data = vars)``
``## [1] 1``

Lastly, the string value “Petal.Width” is always evaluated as itself.

``eval_tidy(quo("Petal.Width"), data = vars)``
``## [1] "Petal.Width"``
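The precedence rule matters most when the same name exists in both places. The following sketch uses a made-up variable `width` and a made-up mask; it also uses the `.data` and `.env` pronouns that eval_tidy() provides, which the notes above do not cover, so treat this as a supplementary illustration.

```r
library(rlang)

# `width` exists both in the surrounding environment and in the data mask.
width <- "from the environment"
mask <- list(width = 2L)

# Without a pronoun, the data mask takes precedence.
masked <- eval_tidy(quo(width), data = mask)

# The pronouns state the intended source explicitly.
from_data <- eval_tidy(quo(.data$width), data = mask)
from_env  <- eval_tidy(quo(.env$width), data = mask)

masked
from_data
from_env
```

For select_4() this means that a variable sharing its name with a column would be resolved as the column, not as the variable.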

Based on the elements in col_char_num, it is not difficult to understand how the following code finds the correct column positions.

``````(
  cols_positions <- map_int(
    col_char_num,
    function(x) ifelse(is.character(x), vars[[x]], x)
  )
)``````
``````##
## 5 1 4``````
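Because the arguments are captured as quosures, select_4() should also work when it is called from inside another function with a local variable. The sketch below repeats the definition of select_4() so that it runs on its own; the wrapper `first_rows_of` and its variable `wanted` are hypothetical names chosen for this example.

```r
library(rlang)
library(purrr)

# select_4() repeated from the definition above so this block is self-contained.
select_4 <- function(data, ...) {
  cols <- enquos(...)
  vars <- as.list(set_names(seq_along(data), names(data)))
  col_char_num <- map(cols, eval_tidy, vars)
  cols_positions <- map_int(
    col_char_num,
    function(x) ifelse(is.character(x), vars[[x]], x)
  )
  data[, cols_positions, drop = FALSE]
}

# Hypothetical wrapper: `wanted` exists only inside this function.
first_rows_of <- function(data, n = 3) {
  wanted <- "Species"
  head(select_4(data, wanted, Sepal.Length), n)
}

result <- first_rows_of(iris)
names(result)
```

The quosure for `wanted` carries the wrapper's environment, so its value is found even though no such variable exists in the global environment.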

# Five big ideas of tidy evaluation

This summary omits many details of tidy evaluation. If you are new to these ideas, I strongly recommend Hadley’s “5 big ideas of tidy evaluation” video, which is only five minutes long. Below are the big five 😀

1. R code is a tree

2. Capture the tree by quoting

3. Unquoting makes it easy to build trees

4. Quote + unquote

5. Quosures capture expression & environment
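As a rough companion to the list above, here is one small rlang snippet per idea. The mapping between snippets and ideas is my own reading rather than something taken from the video, and `summarise_col` is a hypothetical helper written for this sketch.

```r
library(rlang)

# 1. R code is a tree: a call is made of a function and its arguments,
#    and an argument can itself be another call.
tree <- expr(f(x, g(y)))
as.list(tree)

# 2. Capture the tree by quoting instead of evaluating.
quoted <- expr(mean(Sepal.Length))

# 3. Unquoting with !! inserts one tree into another.
column <- sym("Sepal.Width")
built <- expr(mean(!!column))

# 4. Quote + unquote: capture the caller's input, then build new code with it.
summarise_col <- function(data, col) {
  col <- enexpr(col)
  eval_tidy(expr(mean(!!col)), data = data)
}
petal_mean <- summarise_col(iris, Petal.Length)

# 5. A quosure captures the expression and its environment.
q <- quo(Sepal.Length * 2)
quo_get_expr(q)
quo_get_env(q)
```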