library(islay)
data(islay_lithics)
Visualising relationships
In this tutorial we will continue to learn about visualisation as a tool for exploratory data analysis. We will look at ways of visualising the relationship between two or more variables using bar and column plots, scatterplots, additional aesthetics and facets.
Objectives
By the end of this tutorial you should:
- Be able to generate questions about the relationship between two or more variables
- Know how to produce bar plots, multiple density plots, stacked bar plots, and scatter plots in R using ggplot2
- Be able to refine plots for communication and export them from R
Prerequisites
- Edward Tufte, The Visual Display of Quantitative Information (2nd edition), pp. 91–138:
  - Chapter 4, “Data-Ink and Graphical Redesign”
  - Chapter 5, “Chartjunk: Vibrations, Grids, and Ducks”
  - Chapter 6, “Data-Ink Maximization and Graphical Design”
Generating questions about relationships
Last week we looked at using visualisation to answer questions about the variation of a variable (its distribution). Although essential for describing and understanding the nature of your dataset, questions about a single variable have a fundamentally limited explanatory value.
This week we will start looking at the covariation between two (or more) variables: in plain terms, the relationship between them. This is where we can start to gain insights into causality. In statistics, we say that there is a correlation between two variables if one can measurably predict the other. This is not a statement about causality, merely practicality: if you knew that two variables were correlated, knowing the value of one would let you make a good guess about the value of the other.
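To make the idea of prediction concrete, here is a minimal sketch using simulated data. The variables x and y are hypothetical and are not part of any dataset in this tutorial. cor() measures the strength of a linear relationship. A linear model fitted with lm() then uses that relationship to predict one variable from the other.

```r
# Hypothetical simulated data: y depends linearly on x, plus random noise
set.seed(42)
x <- rnorm(100)
y <- 2 * x + rnorm(100, sd = 0.5)

# Pearson's correlation coefficient ranges from -1 to 1
r <- cor(x, y)
print(r)

# Because x and y are correlated, a linear model fitted to them
# lets us predict y for a new value of x
model <- lm(y ~ x)
pred <- predict(model, newdata = data.frame(x = 1))
print(pred)
```

The closer r is to 1 (or -1), the less scatter there is around the fitted line, and the better the prediction.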
This leads to the well-known adage, “correlation is not causation”. But equally, we should be aware that correlation can be a good hint about causation!
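The adage can also be illustrated with simulated data. In this hypothetical sketch, a third variable z (a confounder) drives both x and y, while x has no effect on y at all. Even so, x and y are strongly correlated. Adding z to a linear model makes the apparent effect of x all but disappear.

```r
# Hypothetical simulated data: z influences both x and y,
# but x has no direct effect on y
set.seed(1)
z <- rnorm(500)
x <- z + rnorm(500, sd = 0.5)
y <- z + rnorm(500, sd = 0.5)

# x and y are strongly correlated (about 0.8 in expectation)...
r_xy <- cor(x, y)
print(r_xy)

# ...but once z is accounted for, the coefficient for x is close to zero
b_x <- coef(lm(y ~ x + z))[["x"]]
print(b_x)
```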
Exercises
Given a dataset on a burial ground, with the following variables:
- sex of the individual
- age of the individual
- age of the burial (i.e. a radiocarbon date)
- number of grave goods
- number of metal objects amongst the grave goods
- What questions of covariation could we ask of the dataset?
- If there were a correlation between the age of the individual and the number of grave goods, could that imply causation?
- What about a correlation between the number of grave goods and the number of metal objects?
Visualising relationships
Work through sections 2.5 and 2.6 of *R for Data Science* (2nd ed.).
You will then apply these techniques to an archaeological dataset.
Lithic assemblages from Islay
Load the islay_lithics dataset from islay:
library(islay)
data(islay_lithics)
We can use the head() function to get a quick preview of the data frame:
head(islay_lithics)
  site_code          region                         period   area flakes blades
1      LGM1 Loch Gorm South Mesolithic & Later Prehistoric 102450    159     15
2      LGM2 Loch Gorm South Mesolithic & Later Prehistoric  62497    125      6
3      LMG4 Loch Gorm South                           <NA>  37480     12      0
4      LGM5 Loch Gorm South                     Mesolithic  52473    128     18
5      LGM6 Loch Gorm South              Later Prehistoric  54971     56      4
6      LGM8 Loch Gorm South                           <NA>  49974     29      1
  chunks cores pebbles retouched total
1     16    24       0        15   229
2     11    20       4        16   182
3      1     1       6         3    23
4     17    27       7         5   202
5      8    18      12        10   108
6     20     3       0         5    58
Because this is a built-in dataset of the islay package, you can also enter ?islay_lithics to open its help page, which describes the dataset in more detail.
As with the last dataset, it will be useful to turn the period column into a factor now, with its levels in chronological order, so that the periods appear in that order (rather than alphabetically) in subsequent plots:
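# Chronological order of the periods
periods <- c("Mesolithic", "Mesolithic & Later Prehistoric", "Later Prehistoric")
# Convert period to a factor whose levels follow that order
islay_lithics$period <- factor(islay_lithics$period, levels = periods)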
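Why set the levels explicitly? A small base-R sketch shows the difference. It uses a hypothetical vector of period labels, including a missing value and a deliberate misspelling, "Mesolthic". Without explicit levels, the levels are ordered alphabetically; with levels = periods, they follow chronological order. There is one caveat: any value not listed in levels is silently converted to NA.

```r
# Chronological order of the periods, as above
periods <- c("Mesolithic", "Mesolithic & Later Prehistoric", "Later Prehistoric")

# Hypothetical labels, including a missing value and a misspelling
x <- c("Later Prehistoric", "Mesolithic", NA, "Mesolthic")

# Default: levels are sorted alphabetically
print(levels(factor(x)))

# Explicit levels: chronological order
f <- factor(x, levels = periods)
print(levels(f))

# The misspelled label has silently become NA alongside the genuine NA
print(table(f, useNA = "ifany"))
```

On the real data, you can compare table(islay_lithics$period, useNA = "ifany") before and after the conversion to check that no labels were lost.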
Exercises
- Generate a plot showing the relationship between period and the number of retouched pieces. Is there a correlation? What could explain this?
- Try with two other types of lithics. Does it change your answer?
- Generate a plot showing the relationship between the counts of two types of lithics.
- Add an aesthetic showing a categorical variable.
- Export the plot.
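The exercises use the same ggplot2 building blocks as *R for Data Science*. The sketch below assumes the islay and ggplot2 packages are installed. It deliberately uses different variables from the exercises (total, area and region) so that it does not give the answers away. It builds a multiple density plot, a stacked bar plot and a scatterplot coloured by a categorical variable, then exports the scatterplot with ggsave(). The file name area-vs-total.png and the plot size are arbitrary choices.

```r
library(islay)
library(ggplot2)

data(islay_lithics)
# Order the periods chronologically, as earlier in this tutorial
periods <- c("Mesolithic", "Mesolithic & Later Prehistoric", "Later Prehistoric")
islay_lithics$period <- factor(islay_lithics$period, levels = periods)

# Multiple density plot: one curve of total lithics per period
# (sites without a period are drawn as a separate NA group)
p_density <- ggplot(islay_lithics, aes(x = total, colour = period)) +
  geom_density()

# Stacked bar plot: number of sites in each region, filled by period
p_bar <- ggplot(islay_lithics, aes(x = region, fill = period)) +
  geom_bar()

# Scatterplot of site area against total lithics, with period as colour
p_scatter <- ggplot(islay_lithics, aes(x = area, y = total, colour = period)) +
  geom_point() +
  labs(x = "Area", y = "Total lithics", colour = "Period")

print(p_density)
print(p_bar)
print(p_scatter)

# Export the scatterplot to a PNG file in the working directory
# (width and height are in inches by default)
ggsave("area-vs-total.png", plot = p_scatter, width = 6, height = 4)
```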