R is free, open-source, and runs on all platforms
Very comprehensive and versatile
High quality plotting and data visualization
One of the most widely used languages for data science, along with Python.
Large community, increasing over the years: https://www.techrepublic.com/article/r-programming-language-continues-to-grow-in-popularity/, https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010372#sec009
More than 20,000 packages on the R central repository (CRAN, https://cran.r-project.org/web/packages/ ).
For biologists: 2,300 packages dedicated to analysis of genomics data (the Bioconductor project, http://bioconductor.org/ ).
RStudio is an integrated development environment (IDE). It is probably the most ergonomic option for learning and using R nowadays.
Get familiar with the main RStudio panes and their roles: Console, Help, Files, Plots, Environment, History
The “root”: on Linux or Mac, this is represented by a /
What is the absolute path to the file paper.doc?
/projectX/paper/paper.doc
What is the absolute path to the file CONF.txt?
/projectX/analysis/data/CONF.txt
From the paper directory, what is the relative path to the file CONF.txt?
../analysis/data/CONF.txt
From the paper directory, what is the relative path to the file paper.doc?
./paper.doc
getwd()
## [1] "/Users/rouxj/work/Teaching/introductionToR"
setwd("/Users/rouxj/work/Teaching/introductionToR/01-introduction/")
setwd("01-introduction/")
setwd("../introductionToR/01-introduction/")
The first option, because it works wherever you are currently located on the computer. However, sharing your code with someone else may be problematic, since the path on a different computer will likely be different. In this case, a relative path is safer.
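As a minimal sketch of the difference between absolute and relative paths, the code below recreates the projectX layout used above inside a temporary directory and resolves the same file both ways. Using tempdir() as a stand-in for the root is an assumption made so that the example runs on any computer.

```r
## Recreate the example layout in a temporary directory
## (assumption: tempdir() stands in for the root "/")
root <- file.path(tempdir(), "projectX")
dir.create(file.path(root, "paper"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "analysis", "data"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(root, "analysis", "data", "CONF.txt")))

## Move into the paper directory; setwd() returns the previous working directory
old_wd <- setwd(file.path(root, "paper"))

## Relative path: only valid from the current working directory
rel_path <- file.path("..", "analysis", "data", "CONF.txt")
found_relative <- file.exists(rel_path)

## Absolute path: valid from anywhere on this computer
abs_path <- normalizePath(rel_path)

## Go back to the starting directory and check the absolute path again
setwd(old_wd)
found_absolute <- file.exists(abs_path)

found_relative
found_absolute
basename(abs_path)
```

Building paths with file.path() rather than pasting strings by hand avoids mistakes with separators.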
You can store values or results in variables for later use. There are two ways of assigning a value to a variable:
= (single equal sign)
<- (arrow pointing to the left)
x <- 10
y = 10
## Display the value of a variable
x
y
## A syntactically valid name consists of letters, numbers and the . or _ characters
x1 <- 10
x_1 <- 10
x.1 <- 10
## But a variable name cannot start with a digit
1x <- 10 ## does not work
## Variables can be assigned something other than numbers
h <- "hello"
Look at the Environment pane in RStudio. Do you see these new variables and their values?
NOTE: variables are most often referred to as “objects” in R, and in the course the two words are used as synonyms (for more information see https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Objects)
a <- 2
A <- 3
Commands are executed when you press the return key (Enter). To run multiple commands at once, separate them with a semicolon (;).
a <- 2; A <- 3
Spaces play no role as long as the code is not ambiguous. Use spacing to make your code more readable.
b <- 2
b<-2 ## Same result, but less readable
b < - 2 ## This has a different meaning!
## [1] FALSE
Variables can be used for basic calculations using basic operators
1 * 3
a * 3
a * A
a / A
# Assign result to a new variable
b <- A + a
b
## Of course this has to make sense!
h
h + 1
## Beware: R sometimes performs weird operations between data types without any warning or error!
## testing equality with double equal sign
a == A
a == h
## testing inequality
a != A
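The warning above about silent operations between data types can be illustrated with a short sketch. The values used here are arbitrary, and class() is the base R function that reports the type of an object.

```r
a <- 2

## A number is silently converted to a character string before the comparison
a == "2"      # TRUE
a == "2.0"    # FALSE: the strings "2" and "2.0" are compared

## Logical values are silently converted to numbers in arithmetic
TRUE + 1      # 2

## class() shows the data type of an object
class(a)
class("2")
class(TRUE)
```

When a comparison gives a surprising result, checking the class of both sides is often a good first step.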
R includes numerous built-in functions, which ask R to perform a specific task using pre-written code. Most functions need some input in order to run. These inputs are given inside the parentheses and are called “arguments”.
## other built-in functions
sqrt(a)
a^2
cos(1)
abs(-2)
## Some functions can take more than one argument
rep(1, 5) # repeat 1, 5 times
## For clarity, it is better to pass arguments by name using "="
rep(x=1, times=5)
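As a small illustration of why naming arguments helps, the sketch below contrasts positional and named matching. It uses only the base functions rep() and seq(); the values are arbitrary.

```r
## Positional matching: the order of the arguments matters
rep(1, 5)    # 1 repeated 5 times
rep(5, 1)    # 5 repeated once

## Named matching: the order no longer matters
by_position <- rep(1, 5)
by_name <- rep(times = 5, x = 1)
identical(by_position, by_name)

## Named arguments also make the intent of a call easier to read
steps <- seq(from = 0, to = 1, by = 0.25)
steps

## When only "from" and "to" are given, the step is 1
seq(from = 0, to = 4)
```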
Whenever you encounter a new function, it is a good idea to check its help page
help("sqrt")
?sqrt ## Same thing
help("*")
?"*"
?`*`
Look at the different sections of the help pages. The Examples section is usually useful to understand the behavior of a function.
What is this doing?
?help
Can you list the arguments of the help function?
It shows the help page for the help() function!
The arguments topic, package, lib.loc, verbose, try.all.packages or help_type can be used.
If the name of the function is not known, you can (usually) find it with:
help.search("chi square")
??"chi square" ## Same thing
Last resort: search on your favorite search engine!
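Beyond help.search(), a few other base R helpers can be handy when looking for a function. The following is a minimal sketch; the pattern "chisq" is chosen only as an example.

```r
## apropos() lists the objects in attached packages whose name contains a pattern
chisq_functions <- apropos("chisq")
chisq_functions

## args() displays the arguments of a function and their default values
args(help.search)

## example() runs the code of the Examples section of a help page
## example("sqrt")   ## not run here
```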
Auto-completion is a very useful feature, especially for long function or object names. Type the beginning of the object or function name, and press the TAB key.
## Typing the first few letters of a function
chis ## Press TAB: it's magic!
thisIsaLoooooooongAndWeirdObjectName <- 10
this ## Press TAB: it's magic!
## Typing the first few letters of a function (not unique to a single function)
## Beware, the Rstudio and R Console behaviors differ a bit!
help.
## - Rstudio will open a suggestion box: choose with the arrows up and down
## - On the console you can press TAB to see which functions start with "help."
## then add the few letters that discriminate a unique name,
## and press TAB again to autocomplete
help.se ## Press TAB: it autocompletes!
This saves a lot of time and avoids typos! It may feel unintuitive at first, and you might have to force yourself to use it at the beginning, but it is worth it.
RStudio now offers even smarter code completion features. For example, symbols shown next to each suggestion are used to disambiguate the various completions.
Auto-completion is now “Always On” by default (this can be changed in the Global Options): for example, the possible completions are displayed after a period of typing inactivity.
Create the following variable, and play a bit with auto-completion:
supercalifragilisticexpialidocious <- 1
Parentheses, braces or brackets need to match. If they do not match, the next line will start with a + instead of the usual >. Close the parenthesis on the new line, or cancel the command with Ctrl+c or ESC.
(a + b)*2
(a +
b)*2
(a +
NULL represents the null object in R: it is a reserved word. It is often returned when an expression or function results in an undefined value.
NA stands for “Not Available” and is a logical constant of length 1 which contains a missing value indicator. Be careful, this is not equivalent to the quoted "NA", which is a character string and has no special meaning in R.
The behavior of these 2 reserved words can be complex, and the details are too advanced to discuss in this course. Some other reserved words that may be easier to understand are:
NaN stands for “Not a Number” and is typically displayed when an invalid computation was conducted.
Inf and -Inf stand for infinity and negative infinity, and are often the result of a division by 0.
pi / 0 ## A non-zero number divided by zero creates infinity
0 / 0 ## NaN
1/0 + 1/0 # Inf
1/0 - 1/0 # NaN
cos(Inf) # NaN
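Without going into the advanced details, the following sketch shows how these special values can be detected with the base functions is.na(), is.nan(), is.finite() and is.null(). The object name x is arbitrary.

```r
x <- NA

## Testing for a missing value: use is.na(), not ==
is.na(x)        # TRUE
x == NA         # NA: comparing with a missing value gives a missing value
is.na("NA")     # FALSE: the quoted "NA" is an ordinary character string

## Missing values propagate through calculations
NA + 1          # NA

## Detecting NaN and infinite values
is.nan(0 / 0)      # TRUE
is.finite(1 / 0)   # FALSE

## NULL has length 0, whereas NA has length 1
is.null(NULL)   # TRUE
length(NULL)    # 0
length(NA)      # 1
```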
In addition to the built-in functions (the base R functions), you can also use functions from packages developed by others
kde2d() ## Not found
library(MASS)
kde2d() ## Now we get an error message but the function was found :)
The packages sometimes need to be installed:
## Through CRAN
install.packages("RColorBrewer")
## Through Bioconductor, see http://www.bioconductor.org/install/
install.packages("BiocManager")
BiocManager::install("limma")
library(limma)
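A possible pattern, shown here as a sketch rather than a recommendation from the course, is to check whether a package is installed before trying to use or install it. requireNamespace() and the :: operator are base R features; the random input data are made up for the example.

```r
## TRUE if the package is installed; this does not attach the package
has_mass <- requireNamespace("MASS", quietly = TRUE)
has_mass

if (has_mass) {
  ## The :: operator calls a function from a package without library()
  set.seed(1)   ## arbitrary seed, so that the random numbers are reproducible
  dens <- MASS::kde2d(x = rnorm(100), y = rnorm(100), n = 25)
  dim(dens$z)   ## the density is estimated on a 25 x 25 grid
}

## Install a CRAN package only if it is missing (not run here)
## if (!requireNamespace("RColorBrewer", quietly = TRUE)) {
##   install.packages("RColorBrewer")
## }
```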
When some piece of code is needed multiple times, it is often a good idea to write your own function (we will see this later in the course)
ls() # list the objects in your workspace
rm(h) # remove object h from workspace
Or look at the Environment pane in RStudio!
Individual objects can be stored in a compressed file (.rds format) with:
saveRDS(a, file="a.rds")
## Load this object again
rm(a)
a <- readRDS(file="a.rds")
However, this is not generally recommended (unless the objects took a very long time to create). It is better to rerun the full analysis and recreate the objects every time.
Write your code in a .R script in your working directory. Your colleagues, collaborators, and your “future self” will thank you! (https://xkcd.com/1421/)
source("myscript.R")
In RStudio, the lines of a script can be run with Ctrl+Enter or Cmd+Enter.
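To make the role of source() concrete, here is a minimal sketch that writes a tiny script and then runs it. The script content and the object names z and z_squared are hypothetical, and the script is written to a temporary file so that nothing is created in your working directory.

```r
## Write a two-line script to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("z <- 2 + 3",
             "z_squared <- z^2"),
           script)

## source() runs all the commands of the script, in order
source(script)

## The objects created by the script are now in the workspace
z_squared
```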
R Markdown provides a framework to combine your code, its results, and your comments. The resulting output can be an HTML page, a PDF document, a Word file, a slideshow, etc.
To create a new R Markdown document, go to File -> New File -> R Markdown. Choose a title and an output format, and save it to a specific folder. This creates a .Rmd file, including a header (surrounded by ---), code chunks (surrounded by three backticks), and simple text with headers (#) and formatting (e.g., _italics_).
R code chunks can be run by clicking the “Run” button, or using Ctrl+Enter or Cmd+Enter, just like R scripts.
To produce a complete output, click the “Knit” button.
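To show what such a file contains, the sketch below writes a minimal .Rmd document from R. The title, text and file name are made up for the example. The commented rmarkdown::render() call is one way to knit from the console, assuming the rmarkdown package is installed.

```r
## Three backticks delimit a code chunk; they are built with strrep()
## only to keep this example easy to read
fence <- strrep("`", 3)

rmd <- c("---",
         "title: \"My first report\"",
         "output: html_document",
         "---",
         "",
         "# A header",
         "",
         "Some _italic_ text.",
         "",
         paste0(fence, "{r}"),
         "a <- 2",
         "a * 3",
         fence)

## Save the document in a temporary directory (hypothetical file name)
rmd_file <- file.path(tempdir(), "example.Rmd")
writeLines(rmd, rmd_file)

## Knit from the console, similar to clicking the "Knit" button (not run here)
## rmarkdown::render(rmd_file)
```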
q()
quit()
## Save workspace image? [y/n/c]:
R will ask you if you want to save your workspace and history each time you quit R. We recommend answering n (for “no”), provided that you have saved the important objects in .rds format or, better, saved a script with all the commands necessary to regenerate the results (see below).
Create a .R or .Rmd script to write all the code for the exercise and save it to a specific directory you created for the course.
Set the working directory with setwd(...) to the directory you created for the course.
Create two objects a and b and assign them a numeric value (you choose).
a <- 10
b <- 5
## You can of course also use the = sign
a = 10
b = 5
Calculate the value of y for x = 1 using the equation y = a*x + b. Repeat for x = -1.
x <- 1
y <- a*x + b
y
## [1] 15
x <- -1
y <- a*x + b # This step is needed again, y is not stored as a formula
y
## [1] -5
How can 582 seconds be converted to minutes and seconds? Which arithmetic operators could be useful? Check the help page: ?Arithmetic.
582 %/% 60
## [1] 9
582 %% 60
## [1] 42
## 9 minutes and 42 seconds
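The same two operators extend naturally to hours. The following sketch uses an arbitrary duration of 3725 seconds, chosen only for illustration.

```r
total_seconds <- 3725   ## hypothetical duration

## Integer division gives the number of complete hours
hours <- total_seconds %/% 3600

## The remainder is what is left after removing the complete hours
remaining <- total_seconds %% 3600

## Repeat the same logic for the minutes
minutes <- remaining %/% 60
seconds <- remaining %% 60

paste(hours, "h", minutes, "min", seconds, "s")
## [1] "1 h 2 min 5 s"

## Sanity check: the parts add up to the original duration
hours * 3600 + minutes * 60 + seconds == total_seconds
```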
Add 2 to a and multiply everything by 4 to the power of 3 (all in one line)
(a + 2)*4^3
## [1] 768
What are alternative ways of entering the value of a millionth (i.e., one over a million) into R?
1/1000000
## [1] 1e-06
10^-6
## [1] 1e-06
1e-6
## [1] 1e-06
Create 2 objects: rational equal to 1/3 and decimal equal to 0.33. Are these objects equal? Which one is bigger?
rational <- 1/3
decimal <- 0.33
rational == decimal
## [1] FALSE
rational > decimal
## [1] TRUE
Update decimal to 0.3333333333333333 (sixteen 3s) and redo the test. What is going on?
decimal <- 0.3333333333333333
rational == decimal
## [1] TRUE
## R stores an approximation of 1/3, with a limited number of significant digits
## Bonus: other ways to compare two numbers:
identical(rational, decimal)
## [1] TRUE
all.equal(rational, decimal) ## allows testing "near equality"
## [1] TRUE
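The same limited precision explains a classic surprise with decimal numbers. This short sketch, with values chosen for illustration, shows why all.equal() is usually the safer choice when comparing the results of calculations.

```r
## Neither 0.1, 0.2 nor 0.3 can be stored exactly
0.1 + 0.2 == 0.3
## [1] FALSE

## Printing more digits reveals the tiny difference
print(0.1 + 0.2, digits = 17)
print(0.3, digits = 17)

## all.equal() tolerates such tiny differences
## (wrap it in isTRUE() when the result is used as a condition)
isTRUE(all.equal(0.1 + 0.2, 0.3))
## [1] TRUE
```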
There is a built-in constant in R for \(\pi\). Can you find how to get it?
Can you round it to 2 decimals? Tip: look at the help page
What is the default number of decimals used?
help.search("pi")
help.search("round")
?round
round(pi, digits=2)
## [1] 3.14
The help page indicates round(x, digits = 0), so no decimals are kept by default (i.e., if the function is called like `round(pi)`)
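A few variations on rounding, as a quick sketch using only base R functions described on the same help page (?round):

```r
## Default: digits = 0, so no decimals are kept
round(pi)
## [1] 3

## Keep 4 decimals
round(pi, digits = 4)
## [1] 3.1416

## A negative number of digits rounds to tens, hundreds, etc.
round(1234.567, digits = -2)
## [1] 1200

## signif() keeps a number of significant digits instead of decimals
signif(pi, digits = 3)
## [1] 3.14

## ceiling() and floor() round up and down to the nearest integer
ceiling(pi)
floor(pi)
```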
In chemistry, the \(pH\) scale is used as a measure of the acidity of a substance. Substances with a \(pH\) less than 7 are acidic, and substances with a \(pH\) greater than 7 are alkaline. The \(pH\) is a measure of the number of active positive hydrogen ions in the solution, defined by the following formula, where \([H+]\) is the concentration of hydrogen ions in the solution:
\(pH=-\log_{10}([H+])\)
\(-\log_{10}(10^{-4}) = \log_{10}(10^{4}) = 4\)
\(pH_{after} = -\log_{10}(C*2) = -\log_{10}(C) -\log_{10}(2) = pH_{before} - 0.3010\)
\(-\log_{10}(C) = 7.5\)
\(\log(C)/\log(10) = -7.5\)
\(\log(C) = -7.5 * \log(10)\)
\(C = e^{-7.5 * \log(10)} = 3.162278 \times 10^{-8}\)
The concentration of hydrogen ions is thus multiplied by \(e^{-7.5 * \log(10)}/10^{-7} = 0.3162278\)
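The derivations above can be checked numerically in R. The helper function pH() and the object names below are hypothetical, introduced only for this sketch. The starting concentration of 1e-4 is arbitrary, since the shift in pH after doubling does not depend on it.

```r
## Hypothetical helper implementing pH = -log10([H+])
pH <- function(conc) -log10(conc)

## pH for a concentration of 10^-4
pH(1e-4)
## [1] 4

## Doubling the concentration lowers the pH by log10(2), about 0.3010
conc_before <- 1e-4
pH_shift <- pH(2 * conc_before) - pH(conc_before)
pH_shift

## Concentration corresponding to a pH of 7.5, computed in two equivalent ways
conc_7.5 <- 10^(-7.5)
conc_7.5
exp(-7.5 * log(10))

## Ratio to the concentration at pH 7
conc_7.5 / 1e-7
```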
Create a .R or .Rmd script in your working directory to store all your commands and comments. Think of back-ups!
Set your working directory (setwd), or create an R project from RStudio, which will ensure that the working directory is located where the .Rproj file is located. See https://martinctc.github.io/blog/rstudio-projects-and-working-directories-a-beginner%27s-guide/
Use ls() to check that the workspace is clean.
library(...)
Use sessionInfo() to let people know what system you are using and which versions of R and packages. Many problems are solved by updating to the latest versions of the packages.
These simple sheets could be useful at the beginning:
4.4 Comments
The # character starts a comment. Commented lines are ignored by R.