Data Visualization

Before starting this workshop, we recommend that you download the following two files:

Data visualisation in RStudio

Before starting

As covered in the introduction, there are several things you need to do for all the practicals to work:

+ Install R
+ Install RStudio
+ Install all the necessary packages

Introduction

RStudio is a must-know tool for everyone who works with the R programming language. It’s used in data analysis to import, access, transform, explore, plot, and model data, and even for machine learning to make predictions on data. RStudio is a flexible and multifunctional open-source IDE (integrated development environment) that is extensively used as a graphical front-end to work with the R programming language, giving it a user-friendly interface that makes programming more accessible.

If you’re just getting started with learning R, RStudio is definitely for you. So let’s get started with some basic functionalities of RStudio.

Creating a data frame

What is a data frame? A data frame is the most common way of storing data in R and, generally, is the data structure most often used for data analyses. It works much like a table; however, in addition to storing your data, you can also perform operations on it and visualize it.

In our case, we are going to use information on bird observations in a specific region in the Netherlands, including the names of the species that were spotted and the number of times they were seen.
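
The code that defines these two variables is not shown here. Judging from the data frame printed further down, it would look roughly like the sketch below. This is a reconstruction from that output, not necessarily the original code.

```r
# Reconstructed from the printed data frame; the original definitions may differ.
# Species names are text, so each value is quoted.
Name <- c("C. ciconia", "P. porzana", "C. pugnax", "L. marinus", "G. stellata")

# Observation counts are numeric, so no quotes are needed.
Observations <- c(12, 4, 16, 6, 10)
```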

The two variables we just defined are vectors. A vector is an ordered sequence of values of the same type, and it is the simplest data structure in R.

Note that it’s necessary to place quotes around text (for the values under the “Name” column), but it’s not required to use quotes around numeric values (for the values under the “Observations” column).

df <- data.frame(Name, Observations)
print(df)
##          Name Observations
## 1  C. ciconia           12
## 2  P. porzana            4
## 3   C. pugnax           16
## 4  L. marinus            6
## 5 G. stellata           10
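
Once a data frame exists, a few base R functions help to check its structure before working with it. The sketch below rebuilds the same data frame (with the values taken from the output above) and inspects it; the variable name `frequent` is only used for illustration.

```r
# Rebuild the data frame so that this example runs on its own
Name <- c("C. ciconia", "P. porzana", "C. pugnax", "L. marinus", "G. stellata")
Observations <- c(12, 4, 16, 6, 10)
df <- data.frame(Name, Observations)

str(df)    # compact overview: column names, types and first values
nrow(df)   # number of rows (one per species)
ncol(df)   # number of columns (variables)
df$Name    # extract a single column as a vector

# Keep only the species that were seen more than 8 times
frequent <- df[df$Observations > 8, ]
print(frequent)
```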

You can also create data frames by importing data into R. (Suggestion: save your Excel files as .csv before doing this.)

df_from_excel <- read.csv("data_vis.csv", sep = ";")
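
The `sep = ";"` argument tells `read.csv()` that the columns are separated by semicolons, which is what Excel produces in many European locales. Since the contents of `data_vis.csv` are not shown here, the sketch below writes the bird data to a temporary file first and then reads it back; the temporary file and the name `df_back` are assumptions made only for this illustration.

```r
# Small example data frame (values taken from the bird observations above)
df <- data.frame(
  Name = c("C. ciconia", "P. porzana", "C. pugnax", "L. marinus", "G. stellata"),
  Observations = c(12, 4, 16, 6, 10)
)

# Write a semicolon-separated file to a temporary location
csv_path <- tempfile(fileext = ".csv")
write.table(df, file = csv_path, sep = ";", row.names = FALSE)

# Read it back, telling R that the separator is a semicolon
df_back <- read.csv(csv_path, sep = ";")
print(df_back)
```

If the separator does not match the file, R will usually read everything into a single column, so checking the result with `str()` or `head()` after importing is a good habit.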

Operations with our data frame

Now that the data is loaded, we can perform operations on it or plot it. Which one you choose depends on what you aim to obtain from the analysis.

For numerical operations, there are already several built-in functions in R. Let’s try to use some.

We could check how many observations there were in total, without discriminating between species:

total_Obs <- sum(df$Observations)
print(total_Obs)
## [1] 48
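
Other built-in functions work in the same way as `sum()`. As a brief sketch (the variable names below are chosen only for this example), we could look at the average, the extremes, and the most frequently observed species:

```r
# Rebuild the data frame so that this example runs on its own
Name <- c("C. ciconia", "P. porzana", "C. pugnax", "L. marinus", "G. stellata")
Observations <- c(12, 4, 16, 6, 10)
df <- data.frame(Name, Observations)

mean_obs <- mean(df$Observations)   # average number of observations per species
max_obs  <- max(df$Observations)    # highest count
min_obs  <- min(df$Observations)    # lowest count

# which.max() returns the position of the largest value,
# which we can use to look up the matching species name
most_seen <- df$Name[which.max(df$Observations)]

print(mean_obs)
print(most_seen)
```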

Now that we have a total, we can express each count as a percentage of all observations and add that information to our table as a new column:

df$Percentage <- df$Observations * 100 / total_Obs

After the operation, we can print our data frame to check that the column was actually added:

print(df)
##          Name Observations Percentage
## 1  C. ciconia           12  25.000000
## 2  P. porzana            4   8.333333
## 3   C. pugnax           16  33.333333
## 4  L. marinus            6  12.500000
## 5 G. stellata           10  20.833333

Question

If you wanted to check that the Percentage column sums up to 100% (which would indicate the calculation was done correctly), how would you do it?
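
If you want to compare your answer, one possible approach is sketched below. Because percentages are stored as floating-point numbers, the sketch uses `all.equal()` rather than `==`, which tolerates tiny rounding differences.

```r
# Rebuild the data frame with the Percentage column
Name <- c("C. ciconia", "P. porzana", "C. pugnax", "L. marinus", "G. stellata")
Observations <- c(12, 4, 16, 6, 10)
df <- data.frame(Name, Observations)
df$Percentage <- df$Observations * 100 / sum(df$Observations)

# Add up the column and compare it with 100, allowing for rounding error
total_pct <- sum(df$Percentage)
print(total_pct)
print(isTRUE(all.equal(total_pct, 100)))
```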

Making plots with our data

Many times, we prefer to summarize and present our findings in a visual way, be it for presentations at conferences or for publishing an article. Luckily, we can also do this in an easy way with RStudio.

Let’s try to plot the data we are working with now. For the bird observations, perhaps a bar plot would be a good way to start:

plot_bars <- barplot(df$Observations, names.arg = df$Name, main = "Bar plot of observed birds")
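
`barplot()` returns the x positions of the bar midpoints, which is why its result can be stored in a variable. The sketch below uses those positions to write the counts above the bars, after sorting the species from most to least observed. The axis label and the extra headroom in `ylim` are choices made for this example only.

```r
# Rebuild the data frame so that this example runs on its own
Name <- c("C. ciconia", "P. porzana", "C. pugnax", "L. marinus", "G. stellata")
Observations <- c(12, 4, 16, 6, 10)
df <- data.frame(Name, Observations)

# Sort the rows from the most to the least observed species
sorted <- df[order(df$Observations, decreasing = TRUE), ]

# Draw the bars; leave some space at the top for the labels
mids <- barplot(
  sorted$Observations,
  names.arg = sorted$Name,
  main = "Bar plot of observed birds",
  ylab = "Number of observations",
  ylim = c(0, max(sorted$Observations) * 1.15)
)

# Write each count just above its bar, using the returned midpoints
text(x = mids, y = sorted$Observations, labels = sorted$Observations, pos = 3)
```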

To summarize our results further, we could also make a pie chart:

pie(df$Observations, labels = df$Name, main = "Pie chart of observed birds")
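
Pie slices are easier to read when the labels include the percentages. As a sketch, the labels can be built with `paste0()` and `round()` before calling `pie()`; the names `pct` and `slice_labels` are chosen only for this example.

```r
# Rebuild the data frame so that this example runs on its own
Name <- c("C. ciconia", "P. porzana", "C. pugnax", "L. marinus", "G. stellata")
Observations <- c(12, 4, 16, 6, 10)
df <- data.frame(Name, Observations)

# Percentage of all observations per species, rounded to one decimal
pct <- round(df$Observations * 100 / sum(df$Observations), 1)

# Combine the species name and its percentage into one label per slice
slice_labels <- paste0(df$Name, " (", pct, "%)")

pie(df$Observations, labels = slice_labels, main = "Pie chart of observed birds")
```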

But here we are merely scratching the surface. So keep learning and trying new things with RStudio!

Example: Boxplot (https://r-graph-gallery.com/boxplot.html)

# Libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(viridis)
## Loading required package: viridisLite
# create a dataset
data <- data.frame(
  name=c( rep("A",500), rep("B",500), rep("B",500), rep("C",20), rep('D', 100)  ),
  value=c( rnorm(500, 10, 5), rnorm(500, 13, 1), rnorm(500, 18, 1), rnorm(20, 25, 4), rnorm(100, 12, 1) )
)

# Boxplot basic
data %>%
  ggplot( aes(x=name, y=value, fill=name)) +
    geom_boxplot() +
    scale_fill_viridis(discrete = TRUE, alpha=0.6, option="A") +
    theme(
      legend.position="none",
      plot.title = element_text(size=11)
    ) +
    ggtitle("Basic boxplot") +
    xlab("")
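
The same ggplot2 approach can be applied to the bird observations from earlier in this workshop. The sketch below assumes that the ggplot2 package is installed, and it draws one bar per species with `geom_col()`. The data frame name `birds` and the axis labels are chosen only for this example.

```r
library(ggplot2)

# Bird observations from earlier in the workshop
birds <- data.frame(
  Name = c("C. ciconia", "P. porzana", "C. pugnax", "L. marinus", "G. stellata"),
  Observations = c(12, 4, 16, 6, 10)
)

# reorder() sorts the species by count (largest first) on the x axis
p <- ggplot(birds, aes(x = reorder(Name, -Observations), y = Observations)) +
  geom_col() +
  ggtitle("Bar plot of observed birds") +
  xlab("") +
  ylab("Number of observations")

print(p)
```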

We can also perform some basic operations on numbers and save the results directly in variables: