In this session, we will see how to use R’s vectorised operations to perform calculations more efficiently, and introduce functions in R and how to write them.

1 Vectorized operations

1.1 Introduction

  • We often need to process sets of values rather than single values. Such sets can be stored inside a vector. In many programming languages, operations on vectors are typically performed element by element in a loop:
for (i=0; i < length; i++) {
  z[i] = x[i] + y[i];
}
  • In interpreted languages such as R, this would be computationally inefficient, as the code within the loop is interpreted again for each iteration.
    However, interpreted languages often provide common vectorised operations. A vectorised operation is simply an operation that is done on an entire vector rather than on single values, removing the need for interpretation of code at each iteration. For example, the loop above would be replaced by:
z <- x + y
  • Vectorised operations are typically much faster to compute than looping single-value operations, and also easier to read. Consequently, vectorised operations are essential for efficient R programming.

  • If you feel the impulse of writing a loop, think twice before doing it! Indeed, R is particularly geared towards vectorised operations.

    • On the one hand, most things in R are already vectors (or composed of vectors):

      • a single number is a (length-one) vector;
      • matrices are internally represented as vectors.
    • On the other hand, R also includes default behaviours to help work with vectors:

      • if vectors in an operation are not all of the same length, the shorter vector is recycled (repeated) as needed.
      • many basic operations (adding, multiplying, calculating means, …) are vectorised.
  • NB: matrices (arrays) are often not recycled with the same rules, and often must have the same dimensions, without which warnings or errors are thrown. See also: R documentation about recycling

1.2 Simple mathematical operations

1.2.1 Vector and single number

  • add, subtract, divide, multiply, modulo …
  • in other programming languages (e.g. C, Perl, Python…), we would use a ‘for’ loop in order to modify each element of the vector x:

Python code:

x = [0,1,2,3,4,5]
x = [item+1 for item in x]
x
## [1, 2, 3, 4, 5, 6]

However, in R code we can write this:

x <- 0:5
x + 1
## [1] 1 2 3 4 5 6
  • How does this work?
  1. The expression x + 1 is the addition of two vectors of lengths 6 and 1. For this to work, the vectors need to have the same length and R recycles (repeats) the shorter vector (1) to match the longer vector (x).
  2. Now that both vectors are of the same length, additions are performed element-wise.
  3. Print result to screen.

The shorter vector is always recycled to match the longer one.

1.2.2 Two vectors

  • Similarly, we can add elements of two vectors:
y <- 5:0
z <- x + y
z
## [1] 5 5 5 5 5 5
  • \(z_i\), the element at index \(i\) of the result \(z\), is the sum of the elements \(x_i\) and \(y_i\): \(z_i = x_i + y_i\)
  • Remember: R performs this operation element-wise

Can you guess and explain the results in the following cases?

  • Exercise 1
r <- c(1, 1, 1, 1)
s <- c(1, 3)
r + s
## [1] 2 4 2 4

=> R recycles the shorter vector along the long one and perform the addition element-wise. The expression c(1,1,1,1) + c(1,3) is equivalent to c(1,1,1,1) + c(1,3,1,3)

  • Exercise 2
v <- c(1, 2, 3)
r + v
## Warning in r + v: longer object length is not a multiple of shorter object length
## [1] 2 3 4 2

R still recycles the shorter vector, but in this case the two vectors do not have compatible lengths, and R throws a warning.

  • Exercise 3
r + 5L
## [1] 6 6 6 6

=> This is the addition of a numerical vector (r) to an integer number (5L). In this case, R converts the integer vector also to a numerical vector and recycles it as above. Compare class(r) and class(5L) to class(r + 5L).

1.2.3 Matrix and single number

  • In R, a matrix is internally represented as a vector.
    • Values can be looked up using 2-dimensional coordinates (rows, columns)
    • Values can be looked up using the underlying 1-dimensional index.
    • The length of a matrix M (length(M)) is the product of its dimensions (prod(dim(M))).
  • For example, a 2-by-3 matrix has length 6:
m <- matrix(11:16, nrow = 2, ncol = 3)
m  # matrix printed as a matrix
##      [,1] [,2] [,3]
## [1,]   11   13   15
## [2,]   12   14   16
m[2, 2]  # extracting a single value from a matrix with 2D coordinates
## [1] 14
m[4]  # extracting the same value from the matrix using the internal vector's index
## [1] 14
m[1:length(m)]  # extracting the underlying vector
## [1] 11 12 13 14 15 16

Note: notice that (by default) matrices are filled column by column!

  • When performing vector operations on matrices, the shorter vectors are recycled to that length. Hence, in the example below, R recycles a single number to match the dimensions of a matrix, and again performs calculations element-wise.
m1 <- matrix(1:4, nrow = 2, ncol = 2)
m1
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
# add 1
m1 + 1
##      [,1] [,2]
## [1,]    2    4
## [2,]    3    5

1.2.4 Matrix and vector

Examples:

m1 + c(1:2)
##      [,1] [,2]
## [1,]    2    4
## [2,]    4    6
m1 + c(1:4)
##      [,1] [,2]
## [1,]    2    6
## [2,]    4    8

Can you guess what would happen by evaluating this statement?

  • Exercise 4
m1 + c(1:3)
## Warning in m1 + c(1:3): longer object length is not a multiple of shorter object length
##      [,1] [,2]
## [1,]    2    6
## [2,]    4    5

=> R again recycles the shorter vector and performs the addition element-wise, with a warning.

Interestingly, if the vector were longer, R throws an error about incompatible dimensions (see also R documentation about recycling):

m1 + 1:8
Error: dims [product 4] do not match the length of object [8]

1.2.5 Two matrices

  • As before, all simple mathematical operations are applied element-wise.
  • Important: The expression m1 * m2 is not to be confused with matrix multiplication
    (see below how to do that in R).
m2 <- matrix(5:8, nrow = 2, ncol = 2)
m1 + m2  # element-wise addition
##      [,1] [,2]
## [1,]    6   10
## [2,]    8   12

Can you guess what would happen in this case?

  • Exercise 5
m3 <- matrix(1:6, nrow = 3, ncol = 2)
m1 + m3
Error in m1 + m3 : non-conformable arrays

=> In operations with more than one matrix or array, all the matrices (arrays) must have the same dim attribute (see also R documentation about recycling). R does not automatically recycle non-conforming matrices and throws an error about incompatible dimensions.

1.3 Linear algebra

1.3.1 Simple multiplication

  • Simple multiplication ("*") is element-wise multiplication (similar to addition).
  • Matrices need to have identical dimensions.
m1 * m2
##      [,1] [,2]
## [1,]    5   21
## [2,]   12   32

1.3.2 Matrix product of two matrices

  • If we want to calculate the matrix-product of two matrices, we need to use the %*% operator instead of the * operator.
  • In this case, the calculation is based on the formula below, illustrated
    in the figure.

\[ C_{ij} = \sum_{k=1}^n A_{ik}*B_{kj} \]

NB: It is also obvious that the number of columns in first matrix must match the number of rows in second matrix.

Example:

m1 %*% m2
##      [,1] [,2]
## [1,]   23   31
## [2,]   34   46
  • Matrix multiplication is used in many statistical approaches (for example linear models).

1.3.3 Matrix product of one matrix and a vector

  • We can also calculate the product of matrix and a vector. The vector’s length must match one of the matrices dimensions, depending on which side of the operation the vector is.
    • Left: row vector of length equal to height of matrix
    • Right: column vector of length equal to width of matrix
m3 <- matrix(1:6, nrow = 3, ncol = 2)
m3
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
c(1, 2, 3) %*% m3
##      [,1] [,2]
## [1,]   14   32
m3 %*% c(2, 2)
##      [,1]
## [1,]   10
## [2,]   14
## [3,]   18

1.4 More exercises

  • Exercise 6
    Sum all odd numbers between 1 and 1000, using vectorized arithmetic.
# with logical operator
x <- 1:1000
sum(x[c(TRUE, FALSE)])
## [1] 250000
# or with seq
sum(seq(1, 1000, 2))
## [1] 250000
  • Exercise 7
    Convert the following set of temperatures measured in degrees of Celsius to Fahrenheit, -20°C, 0°C, 100°C, using vectorized arithmetic.

Hint: \((°C * 9/5) + 32 = °F\).

x <- c(-20, 0, 100)
(x * 9/5) + 32
## [1]  -4  32 212
  • Exercise 8
    Given a matrix m m <- matrix(1:20, nrow=4, ncol=5), what are the results of:
  1. m ^ 0.5
  2. m * c(0,1,2,3)
  3. m > c(100, 0)
  4. m * m
  5. m %*% m
m <- matrix(1:20, nrow = 4, ncol = 5)
# a)
m^0.5  # or sqrt(m)
##          [,1]     [,2]     [,3]     [,4]     [,5]
## [1,] 1.000000 2.236068 3.000000 3.605551 4.123106
## [2,] 1.414214 2.449490 3.162278 3.741657 4.242641
## [3,] 1.732051 2.645751 3.316625 3.872983 4.358899
## [4,] 2.000000 2.828427 3.464102 4.000000 4.472136
# b)
m * c(0, 1, 2, 3)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    0    0    0    0
## [2,]    2    6   10   14   18
## [3,]    6   14   22   30   38
## [4,]   12   24   36   48   60
# c)
m > c(100, 0)
##       [,1]  [,2]  [,3]  [,4]  [,5]
## [1,] FALSE FALSE FALSE FALSE FALSE
## [2,]  TRUE  TRUE  TRUE  TRUE  TRUE
## [3,] FALSE FALSE FALSE FALSE FALSE
## [4,]  TRUE  TRUE  TRUE  TRUE  TRUE
# d)
m * m  # or m^2
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1   25   81  169  289
## [2,]    4   36  100  196  324
## [3,]    9   49  121  225  361
## [4,]   16   64  144  256  400
# e) m %*% m # returns error: non-conformable arguments, we need to transpose
# one of the matrices
t(m) %*% m
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   30   70  110  150  190
## [2,]   70  174  278  382  486
## [3,]  110  278  446  614  782
## [4,]  150  382  614  846 1078
## [5,]  190  486  782 1078 1374
  • Exercise 9
    Calculate the inverse sum of squares: \(x = 1/(1^2) + 1/(2^2) + 1/(3^2) + ... + 1/(n^2)\) for values:
  1. 1:3
  2. 1:100
# a)
1/1 + 1/(2^2) + 1/(3^2)  # non-vectorised
## [1] 1.361111
# b)
x <- 1:100
sum(1/(x^2))  # vectorised
## [1] 1.634984

2 Functions

2.1 Introduction

  • In R, functions are pieces of R code that are kept inside R objects of class "function". They can take input via other R objects passed as arguments (e.g. some integers, vectors, character strings…), process that input, and then return some output in the form of another R object or something else (e.g. a matrix, a figure, some text, a file…).

  • Many functions in R are written themselves in R, and also call other R functions. R can show you the code inside a function.
    To view the code implementation of a function, type the name of the function without the parentheses “(” and “)”. Here is an example:

# R's implementation for set intersection
intersect
## function (x, y) 
## {
##     if (is.null(x) || is.null(y)) 
##         return(NULL)
##     u <- as.vector(x)
##     v <- as.vector(y)
##     c(u[!duplicated(unclass(u)) & (match(u, v, 0L) > 0L)], v[numeric()])
## }
## <bytecode: 0x7f9a9d9930a8>
## <environment: namespace:base>
# R's implementation for standard deviation (a one-line function)
sd
## function (x, na.rm = FALSE) 
## sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), 
##     na.rm = na.rm))
## <bytecode: 0x7f9abc79ee80>
## <environment: namespace:stats>
  • A great many functions are provided in base R, and many more can be added to your session by loading packages.

  • Finally, it is easy in R to create and work with user-defined functions, which is what we will explore here. Creating and using your own functions helps keep your code tidy and readable, avoids copy-pasting code multiple times, and also helps with debugging, especially within an IDE such as RStudio.

2.2 Function structure

2.2.1 Code structure

  • You create your own functions using function(...).
    The basic code structure of a function is given below.
    myFunction <- function(arg1, arg2, ... ){
      statements
      return(object)
    }

2.2.2 Naming convention

  • Functions are typically named (myFunction in the example above). Naming rules for functions are identical to rules for other R objects (names should start with a letter, should be descriptive, etc.).
  • Some functions have special names (e.g. the addition function +) and can be called using `+` (function name enclosed in backticks).
  • In special cases, we can use anonymous functions (without names, which are also called “lambda functions” in other programming languages), which we will see in the next session.

2.2.3 Internal parts of a function object

  • Internally, a function has three parts, which can be viewed using the following functions:

    • the formals() (or args()): the list of arguments which define how you call the function
    • the body(): the statements/code executed by the function
    • the environment(): the “map” of the location of the function’s variables.

Let’s consider a concrete example:

# R's implementation of standard deviation
formals(sd)
## $x
## 
## 
## $na.rm
## [1] FALSE
args(sd)  # more human-readable version of formals()
## function (x, na.rm = FALSE) 
## NULL
body(sd)
## sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), 
##     na.rm = na.rm))
environment(sd)
## <environment: namespace:stats>
sd
## function (x, na.rm = FALSE) 
## sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), 
##     na.rm = na.rm))
## <bytecode: 0x7f9abc79ee80>
## <environment: namespace:stats>

2.2.4 A note on primitive functions

  • Exceptions to the structure presented above are “primitive” (internally implemented) functions (e.g. sum, “+”, “[”, sin, cos, etc.). For efficiency, they directly call internal code by using the .Primitive function, and do not have these three components.
# R's implementation of sum
formals(sum)
## NULL
body(sum)
## NULL
environment(sum)
## NULL
sum
## function (..., na.rm = FALSE)  .Primitive("sum")
  • Operators (“+”, “<-”, “=”, etc.) are in fact special functions that take one or two arguments, and that can be written without parantheses.

2.2.5 A note on generic functions

  • Some functions are known as “generics” (e.g. mean, summary, print) and work by calling specific methods depending on the class of the arguments provided. Covering these is beyond the scope of this session, however, here is how to recognise one (beyond it being mentioned in the help page!):
# R's generic implementation of mean
body(mean)
## UseMethod("mean")
# => body is made up entirely of a call to a method dispatcher.

2.3 Calling functions

  • To call a function is simply to run that function’s code on some input arguments. We have already done this many times in this course:
function name + round parentheses + arguments inside parentheses

For example:

formals(sd)  # view the arguments of the sd function
## $x
## 
## 
## $na.rm
## [1] FALSE
x <- rnorm(1000)  # randomly draw a vector of size 1000 from a normal distribution
sd(x)  # calculate the sd of the previous vector
## [1] 1.018551

2.3.1 A note on invisible returns

  • Most functions return their results in a way that allows them to be printed to the standard output, like here. Some functions, typically those designed to return very large results, or those that generate other side-effects (e.g. creating a plot, a file…) might hide the result using the invisible() function on their output.
# quick example
makeBigMatrix <- function() {
    invisible(matrix(rnorm(1000 * 1000), nrow = 1000, ncol = 1000))
}
makeBigMatrix()  # matrix returned is invisible: no printing to screen
m <- makeBigMatrix()
m[1:5, 1:5]  # can still retrieve and work with the returned matrix
##             [,1]        [,2]       [,3]       [,4]         [,5]
## [1,]  0.05303895  1.33551256  1.0401955 -0.6881449 -0.333609352
## [2,] -0.86962082 -0.01549526 -1.1265505  0.2115803  1.368839533
## [3,] -0.36781456  0.47784175  1.6702737 -0.4286871  0.829452797
## [4,]  0.48709974 -0.69124100  0.6730936  1.2433947  1.587073860
## [5,] -0.54262259 -0.97226524 -0.4663152 -1.3929860 -0.007958047
  • Note: if you do not explicitly specify a return() statement, the function will return what was returned by its last executed statement.
myadd <- function(a, b) {
    a + b
}
myadd(3, 2)
## [1] 5
# => still returns a value

2.3.2 Arguments

  • The arguments can be defined in two ways:

    • mandatory arguments, without a default value (in the case of sd() the x)
    • optional arguments, with a default value (in the case of sd(), the na.rm argument has a default value of FALSE)
  • When calling a function, the values in the call are matched to arguments either by name or by position.

For example, the following call does probably not what you intended (and depending on your R version may even throw an error):

# x <- rnorm(1000) # reminder
sd(TRUE, x)  # incorrect, expects first `x` and second `na.rm`

In the above call, arguments were matched by position, equivalent to sd(x = TRUE, na.rm = x).

You can pass arguments in an arbitrary order by using their names, in which case R will first match all named arguments, and then the remaining unnamed ones by position:

# ... which can make the code less readable
sd(na.rm = TRUE, x = x)
## [1] 1.018551
sd(na.rm = TRUE, x)
## [1] 1.018551
sd(TRUE, x = x)
## [1] 1.018551
  • Exercise 10
    How are the arguments matched in the calls below?
y <- c(x, NA)
sd(y)
## [1] NA
sd(y, na.rm = TRUE)
## [1] 1.018551
sd(y, TRUE)
## [1] 1.018551
sd(TRUE, x = y)
## [1] 1.018551
  • sd(y): by position (using the default value of na.rm = FALSE)
  • sd(y, na.rm = TRUE): by name (na.rm) and by position (y)
  • sd(y, TRUE): by position
  • sd(TRUE, x = y): by name (y) and by position (TRUE)
  • Exercise 11
    Write a function that calculates the (element-wise) addition of two vectors a and b.
add <- function(a, b) {
    a + b
}
add(1:3, 1:3)
## [1] 2 4 6
  • Exercise 12
    Write a function that calculates the (element-wise) addition of two vectors a and b, but that can also be called with just a. In that case, it will calculate the addition of a with itself.
# using first argument as default value
add2 <- function(a, b = a) {
    return(a + b)
}
add2(1:3, 1:3)
## [1] 2 4 6
add2(1:3)
## [1] 2 4 6
# using NULL as default value
add3 <- function(a, b = NULL) {
    if (is.null(b)) {
        b = a
    }
    return(a + b)
}
add3(2:4, 2:4)
## [1] 4 6 8
add3(2:4)
## [1] 4 6 8
  • Exercise 13
    Write a function that takes to arguments x and n and multiplies the elements of x with n. Check that x is a numerical vector and n is a single numerical value, and throw an error if this is not the case (hint: see ?stop or ?stopifnot for information of how to throw an error).
# using if
multn <- function(x, n) {
    if (!is.numeric(x) || !is.numeric(n) || length(n) != 1L) {
        stop("Error in arguments")
    }
    return(x * n)
}
multn(1:3, 2)
## [1] 2 4 6
# multn(1:3, c(2,2))
# using stopifnot
multn2 <- function(x, n) {
    stopifnot(exprs = {
        is.numeric(x)
        is.numeric(n)
        length(n) == 1L
    })
    return(x * n)
}
multn2(2:4, 2)
## [1] 4 6 8
# multn2(1:3, c(2,2))

2.3.3 Dot-dot-dot operator ...

  • A function can only handle as many arguments as specified in formals, unless the function uses the special ... argument (e.g. the plot function).

  • The special argument ... (dot-dot-dot, or ellipsis) can accept any number of named/unnamed values. Here is an illustrative example:

count_args <- function(...) {
    arglist <- list(...)
    message("you called me using ", length(arglist), " arguments")
}
count_args()
## you called me using 0 arguments
count_args(1:10)
## you called me using 1 arguments
count_args(1, 2, 3, 4, 5)
## you called me using 5 arguments
  • The ellipsis argument will be assigned all arguments that could not be name- nor position-matched.

  • Exercise 14
    How many arguments are “eaten up” by the ellipsis (...) in the following examples:

count_args(1:2)
count_args(1, 2)
count_args(1:2, 3:4)
count_args(1:4)
  • count_args(1:2): 1 (a single vector with two elements)
  • count_args(1, 2): 2 (two length-one vectors)
  • count_args(1:2, 3:4): 2 (two length-two vectors)
  • count_args(1:4): 1 (a single vector with four elements)

2.3.4 Note on calling a function programmatically with do.call

  • We have seen above that the arguments of a function (e.g. returned by formals()) are, in fact, a list object, and that functions with an ellipsis argument can handle any number of arguments. What can we do when we don’t know in advance how many arguments we will want to pass to such a function?

  • It is possible to create an argument list as an R object, and then execute the function call on it using do.call; the first argument to do.call is the function to be called, and the second is the list with the arguments. For example:

inputList <- list(y, na.rm = TRUE)
do.call(sd, inputList)
## [1] 1.018551
  • This mechanism can be useful in an R program where the exact number of arguments may not be known in advance:
r <- list(row1 = 1:3, row2 = 4:6)
# assuming we don't know how many elements there are in r
do.call(rbind, r)
##      [,1] [,2] [,3]
## row1    1    2    3
## row2    4    5    6

2.4 Environments and variable scoping

  • An environment is a place in R’s memory where R objects can be stored, including values, datasets, functions, other environments, and more.

  • By default, when working from the command line, you are working within the global environment.

  • The scope of a variable defines where a variable “lives”. It is the region of code in which a variable is visible and can be used in expressions.

  • The arguments of a function have a scope that is limited to the body of that function, e.g. myarg does not exist in the global environment, but it does exist inside of the environment of myfunction(myarg = 1):

# no myarg in the current environment
exists("myarg")
## [1] FALSE
# but myarg exists inside of myfunction
myfunction <- function(myarg = 1) {
    return(exists("myarg"))
}
myfunction()
## [1] TRUE
  • You can however use variables inside a function that “live” outside, i.e. in the environment from which the function was called (e.g. a parent function, or the global environment). This works due to R’s variable lookup rules, described in more detail below.

2.4.1 Where to look-up for variables and functions (lexical lookup)

  • By default, R first tries to find variables or functions in the environment of the function itself (remember that each function gets its own environment, one of the three parts of a function).
  • If it cannot find an object there, it tries to find the object in the parent environment, from which the function was called.
  • This is continued until it reaches the global environment.
  • If no object is found with that name by then, R throws an error.

Here is an illustration:

f <- function() {
    x <- 1
    y <- 2
    c(x, y)
}
f()
## [1] 1 2
rm(f)  # clean-up

If a variable is not defined inside a function, R will look one level up, in its parent environment:

x <- 2
g <- function() {
    y <- 1
    c(x, y)
}
g()
## [1] 2 1
rm(x, g)  # clean-up
  • This behavior might be problematic if there are already many variables in an R session, and a function might find one that by chance has a matching name. In most cases, this is not a desired behavior, as it may lead to different results depending on the current state of the global environment (i.e. the value of the variable). It is good practice to always test your code and functions inside a fresh R session. This is particularly important if you develop code in the global environment first, before moving it to an encapsulating function, as the objects you created will remain in the global environment.

2.4.2 When to look-up for variables / functions (dynamic look-up)

  • If the function refers to variable defined outside of its body, it uses the value at the time of the function call, not at the time when we define the function (the lookup happens during the function call).
x <- 2
g <- function() {
    y <- 1
    c(x, y)
}
x <- 3
# g() uses the value of `x` at the time of the call to g()
g()
## [1] 3 1
rm(x, g)

2.4.3 Same name – function vs. variable

  • Be careful when choosing the names for the objects and functions. If an object with the same name already exists in an environment, it will be overwritten with a new value. The only exception is for functions defined in base R or in packages, which are just “masked” by the newly created object, but can be unmasked again when it is removed.
  • In addition, R recognizes when you intend to call a function (round brackets ()) and skips non-function objects when searching:
sum  # the function
## function (..., na.rm = FALSE)  .Primitive("sum")
sum <- 15
sum  # the variable, not the function
## [1] 15
sum(c(1, 2, 3))  # still works
## [1] 6
sum
## [1] 15
help(sum)  # still works
rm(sum)  # removing the variable...
sum  # ...unmasks the function
## function (..., na.rm = FALSE)  .Primitive("sum")

2.4.4 A fresh start with every function call

With every function call, the environment of the function is freshly created, which means that functions have no automatic memory.

The function below checks if a exists. If not, it creates a with a value of zero, otherwise it adds 1 to the already existing a.

Can you guess what would the output of two subsequent calls of fresh function?

fresh <- function() {
    if (!exists("a")) {
        a <- 0
    } else {
        a <- a + 1
    }
    a
}
exists("a")  # check it does not exist globally
## [1] FALSE
fresh()
## [1] 0
fresh()
## [1] 0

As the variable a does not exist at the time we call the function (neither in the function itself nor in the global space), the fresh function returns 0 in both calls. a temporarily lives in the environment of the function fresh, but every time fresh is called, it will get a newly created environment.

For more details see Hadley Wickham’s book Advanced R

2.5 More exercises

  • Exercise 15a
    What is the output of sum(1,2,3,4,5) ?
sum(1, 2, 3, 4, 5)
## [1] 15
  • Exercise 15b
    What is the output of mean(1,2,3,4,5) ?
mean(1, 2, 3, 4, 5)
## [1] 1
  • Exercise 15c
    What is the reason for the observations from the two previous exercises?
# check the help page of both functions
`?`(sum)
`?`(mean)

=> We can see that sum uses the dot-dot-dot ... argument, which means that all arguments passed without a specific name are just summed-up together. Even if we provide logical value! The only way how to tell that the logical is a na.rm argument is to directly name it.

sum(1, 2, 3, 4, 5, TRUE)
## [1] 16
sum(1, 2, 3, 4, 5, na.rm = TRUE)
## [1] 15

=> The function mean on the other hand calculates the mean only from first argument. The others are passed to internal functions which do the calculation. It also accepts a na.rm argument. It can also accept trim argument for calculation of a trimmed mean.

  • Exercise 15d
    How can we achieve the desired result?

Provide the numbers in a single vector.

sum(c(1, 2, 3, 4, 5))
## [1] 15
mean(c(1, 2, 3, 4, 5))
## [1] 3
  • Exercise 16
    Write a function that converts temperatures in degrees Fahrenheit to degrees Celsius. C.F. previous exercise on the subject.
C_to_F <- function(x) {
    (x * 9/5) + 32
}
C_to_F(c(-20, 0, 100))
## [1]  -4  32 212
  • Exercise 17
    Can you predict what the code below does? Can you explain the resulting behaviour?
gb_val <- 10
edit_global_val <- function(x) {
    gb_val <- gb_val + x
    return(gb_val)
}
# apply the edit...
edit_global_val(5)
# ...but did the global value actually change?
gb_val
## [1] 15
## [1] 10

=> The function returns the edited global value… But the actual global value has not changed! This is because the moment we assigned a value to an object called gb_val inside the function using gb_val <- gb_val + x, we actually created a copy of gb_val that is limited to the scope of the function. As such, the assignment only changes the local copy, not the global one.

Is there a way around this? Beyond the scope of this session, but…

gb_val <- 10
edit_global_val <- function(x) {
    # force the assignment to occur within the global environment
    assign("gb_val", gb_val + x, envir = parent.env(environment()))
    return(gb_val)
}
# apply the edit...
edit_global_val(5)
## [1] 15
# ...but did the global value actually change?
gb_val
## [1] 15

Loose rule: avoid editing values in other environments from within functions.