--- title: "Controlling y-axis Plotting" author: "Tom Kelly" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{vioplot: Controlling y-axis Plotting} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- While boxplots have become the _de facto_ standard for plotting the distribution of data this is a vast oversimplification and may not show everything needed to evaluate the variation of data. This is particularly important for datasets which do not form a Gaussian "Normal" distribution that most researchers have become accustomed to. While density plots are helpful in this regard, they can be less aesthetically pleasing than boxplots and harder to interpret for those familiar with boxplots. Often the only ways to compare multiple data types with density use slices of the data with faceting the plotting panes or overlaying density curves with colours and a legend. This approach is jarring for new users and leads to cluttered plots difficult to present to a wider audience. ##Violin Plots Therefore violin plots are a powerful tool to assist researchers to visualise data, particularly in the quality checking and exploratory parts of an analysis. Violin plots have many benefits: - Greater flexibility for plotting variation than boxplots - More familiarity to boxplot users than density plots - Easier to directly compare data types than existing plots As shown below for the `iris` dataset, violin plots show distribution information that the boxplot is unable to. ```{r} library("vioplot") ``` ```{r, message=FALSE} data(iris) boxplot(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="versicolor"], iris$Sepal.Length[iris$Species=="virginica"], names=c("setosa", "versicolor", "virginica")) vioplot(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="versicolor"], iris$Sepal.Length[iris$Species=="virginica"], names=c("setosa", "versicolor", "virginica")) ``` ##Violin y-axis ###Logarithmic scale However the existing violin plot packages (such as \code{\link[vioplot]{vioplot}}) do not support log-scale of the y-axis. This has been amended with the `ylog` argument. ```{r} vioplot(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="versicolor"], iris$Sepal.Length[iris$Species=="virginica"], names=c("setosa", "versicolor", "virginica"), main="Sepal Length", ylog = T, ylim=c(log(1), log(10))) ``` This can also be invoked with the `log="y"` argument compatible with `boxplot`: ```{r} vioplot(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="versicolor"], iris$Sepal.Length[iris$Species=="virginica"], names=c("setosa", "versicolor", "virginica"), main="Sepal Length", log = T, ylim=c(log(1), log(10))) vioplot(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="versicolor"], iris$Sepal.Length[iris$Species=="virginica"], names=c("setosa", "versicolor", "virginica"), main="Sepal Length", log = "y", ylim=c(log(1), log(10))) ``` ###custom y-axes The y-axes can also be removed with `yaxt="n"` to enable customised y-axes: ```{r} vioplot(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="versicolor"], iris$Sepal.Length[iris$Species=="virginica"], names=c("setosa", "versicolor", "virginica"), main="Sepal Length", yaxt="n") vioplot(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="versicolor"], iris$Sepal.Length[iris$Species=="virginica"], names=c("setosa", "versicolor", "virginica"), main="Sepal Length", ylog = T, yaxt="n", ylim=c(log(1), log(10))) ``` Thus custom axes can be added to violin plots. As shown on a linear scale: ```{r} vioplot(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="versicolor"], iris$Sepal.Length[iris$Species=="virginica"], names=c("setosa", "versicolor", "virginica"), main="Sepal Length", yaxt="n") axis(2, at=1:10, labels=1:10) ``` As well as for on a log scale: ```{r} vioplot(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="versicolor"], iris$Sepal.Length[iris$Species=="virginica"], names=c("setosa", "versicolor", "virginica"), main="Sepal Length", yaxt="n", log="y", ylim=c(log(4), log(9))) axis(2, at=log(1:10), labels=1:10) ```