Using gplate to wrangle plate data • gplate

library(gplate)

Introduction

If you work in a lab or with lab data, you’ve probably had to deal with data that look like this:

protein_quant
#>        [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9]  [,10]
#> [1,] 0.0691 0.0801 0.0978 0.1212 0.1731 0.2395 0.3812 0.2402 0.2593 0.2525
#> [2,] 0.0693 0.0810 0.0966 0.1247 0.1732 0.2454 0.3988 0.2527 0.2636 0.2636
#> [3,] 0.0711 0.0827 0.1011 0.1256 0.1855 0.2466 0.3967 0.2515 0.2580 0.2602
#> [4,] 0.2735 0.2725 0.2583 0.2708 0.2693 0.2749 0.2610 0.0739 0.0718 0.0715
#> [5,] 0.2501 0.2634 0.2559 0.2630 0.2650 0.2629 0.2548 0.0696 0.0667 0.0646
#> [6,] 0.2549 0.2699 0.2513 0.2578 0.2588 0.2624 0.2463 0.0726 0.0727 0.0725
#> [7,] 0.0799 0.0951 0.0805 0.0796 0.0768 0.0792 0.0774 0.0762 0.0766 0.0767
#> [8,] 0.0456 0.0456 0.0505 0.0469 0.0469 0.0476 0.0474 0.0457 0.0456 0.0474
#>       [,11]  [,12]
#> [1,] 0.2371 0.2572
#> [2,] 0.2419 0.2616
#> [3,] 0.2422 0.2608
#> [4,] 0.0682 0.0651
#> [5,] 0.0621 0.0622
#> [6,] 0.0710 0.0708
#> [7,] 0.0760 0.0784
#> [8,] 0.0467 0.0457

Just looking at this sucks. You can use some tidyverse magic or use plater to help you at least make this matrix into a tidy data.frame, but annotating the wells as to which rows and which columns are what can be soul-rending (heavens forbid you have a non-standard plate layout). In this vignette, we’ll not only tidy these data, but annotate them in a fairly painless process.

The data

The data shown above represent absorbance values at 562nm. gplate lets us plot our data quickly:

gp(8, 12, protein_quant) |>
  gp_plot(value)

Allow me to describe what you’re seeing. This has the added benefit that by describing it, we are also tidying it - more on that later.

Each sample is in triplicate, and each triplicate stands next to one another moving from left to right, wrapping around to the next ‘band’ of rows when it hits an edge. Or, more simply:

gp(8, 12) |> 
  gp_sec("samples", nrow = 3, ncol = 1) |> 
  gp_plot(samples) +
  ggplot2::theme(legend.position = "none")

However, there are some wells that have sample in them, and some that are empty. I want to specify the difference between the two:

gp(8, 12) |> 
  gp_sec("has_sample", nrow = 3, ncol = 19, wrap = TRUE, labels = c("sample")) |> 
  gp_plot(has_sample)

Notice the wrap = TRUE - this allows for sections that are bigger than the ‘parent section’ (here the plate) by wrapping them around to the next ‘band’.

Now say I want to label each one as a number of a triplicate - the top sample is 1, the middle is 2, and the bottom is 3. In the above sentence, I mentioned ‘parent section’ because any section can also have sections of it’s own. We’re going to use this idea to label our replicates:

gp(8, 12) |> 
  gp_sec("has_sample", nrow = 3, ncol = 19, wrap = TRUE, labels = c("sample")) |> 
  gp_sec("replicate", nrow = 1) |> 
  gp_plot(replicate)

Notice here how I didn’t specify ncol. This is because by default, a section will take up the maximum space possible (here 19).

Some of these samples make up a standard curve, while others make up ‘unknowns’. I’m going to label which is which:

gp(8, 12) |> 
  gp_sec("has_sample", nrow = 3, ncol = 19, wrap = TRUE, labels = c("sample")) |> 
  gp_sec("replicate", nrow = 1, advance = F) |>
  gp_sec("type", nrow = 3, ncol = c(7, 12), labels = c("standard", "sample")) |>
  gp_plot(type)

Note the addition of the argument advance = F in the previous section. This ensures that the next section - type - will be a sibling of replicate, rather than its child. That is, we continue to annotate relative to has_sample rather than annotating relative to replicate.

Finally, I’m going to give an index for each sample:

gp(8, 12) |> 
  gp_sec("has_sample", nrow = 3, ncol = 19, wrap = TRUE, labels = c("sample")) |> 
  gp_sec("replicate", nrow = 1, advance = F) |>
  gp_sec("type", nrow = 3, ncol = c(7, 12), labels = c("standard", "sample")) |>
  gp_sec("sample", ncol = 1) |> 
  gp_plot(sample) +
  ggplot2::theme(legend.position = "none") # Too many samples - clutters the plot

Now, the fun part: Since we described our data so well, tidying it is very easy. First, we supply our data as the third argument of gp:

my_plate <- gp(8, 12, protein_quant) |> 
  gp_sec("has_sample", nrow = 3, ncol = 19, wrap = TRUE, labels = c("sample")) |> 
  gp_sec("replicate", nrow = 1, advance = F) |>
  gp_sec("type", nrow = 3, ncol = c(7, 12), labels = c("standard", "sample")) |>
  gp_sec("sample", ncol = 1)

And now we use gp_serve:

gp_serve(my_plate) |> 
  dplyr::arrange(.row, .col) |> 
  head(20) |> 
  knitr::kable()

.row	.col	value	has_sample	replicate	type	sample
1	1	0.0691	sample	1	standard	1
1	2	0.0801	sample	1	standard	2
1	3	0.0978	sample	1	standard	3
1	4	0.1212	sample	1	standard	4
1	5	0.1731	sample	1	standard	5
1	6	0.2395	sample	1	standard	6
1	7	0.3812	sample	1	standard	7
1	8	0.2402	sample	1	sample	1
1	9	0.2593	sample	1	sample	2
1	10	0.2525	sample	1	sample	3
1	11	0.2371	sample	1	sample	4
1	12	0.2572	sample	1	sample	5
2	1	0.0693	sample	2	standard	1
2	2	0.0810	sample	2	standard	2
2	3	0.0966	sample	2	standard	3
2	4	0.1247	sample	2	standard	4
2	5	0.1732	sample	2	standard	5
2	6	0.2454	sample	2	standard	6
2	7	0.3988	sample	2	standard	7
2	8	0.2527	sample	2	sample	1

I don’t know about you, but I think that’s pretty cool.