Data from lab instruments are often consistently untidy, and can therefore be programmatically tidied. This package seeks to tidy lab data much like how broom tidies statistical objects, with a couple of important differences:
- Each data type must be read in, and has its own read_* function. This can’t really be avoided: there is no practical way to provide a single function that can accurately and consistently determine the source of a given file.
- Outputs from tidy_lab are not data.frames. They are instead objects that contain tidy data. This allows packages to interact with these downstream objects in unique ways depending on the data type (i.e., by writing generic functions that use these objects). If you would prefer a tidy data.frame (or more specifically, a tibble), run scrub on the object.
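To illustrate why returning classed objects rather than plain data.frames can be useful, here is a minimal base R sketch of S3 dispatch. Every name in it (new_lab_object, describe, nanodrop_sketch, plate_sketch) is hypothetical and chosen for illustration; it does not reflect how mop structures its objects internally.

```r
# Hypothetical constructor: bundle a data.frame with metadata under a class.
# This is NOT mop's internal structure, only an illustration of the idea.
new_lab_object <- function(data, class_name, is_tidy = FALSE) {
  structure(
    list(data = data, is_tidy = is_tidy),
    class = c(class_name, "lab_object")
  )
}

# Hypothetical generic: each data type can supply its own method.
describe <- function(x, ...) UseMethod("describe")

# Method used for objects whose first class is "nanodrop_sketch".
describe.nanodrop_sketch <- function(x, ...) {
  paste0("nanodrop reading with ", nrow(x$data), " samples")
}

# Fallback method for any other "lab_object".
describe.lab_object <- function(x, ...) {
  paste0("lab object with ", nrow(x$data), " rows")
}

nano_sketch <- new_lab_object(
  data.frame(sample_name = c("Sample 1", "Sample 2"), conc = c(420, 450)),
  class_name = "nanodrop_sketch"
)
other_sketch <- new_lab_object(data.frame(x = 1:3), class_name = "plate_sketch")

describe(nano_sketch)   # dispatches to describe.nanodrop_sketch
describe(other_sketch)  # falls back to describe.lab_object
```

A plain data.frame would carry no class for describe to dispatch on, which is the trade-off the second bullet describes.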
Installation
This package can be downloaded from GitHub with:
# install.packages("devtools")
devtools::install_github("KaiAragaki/mop")
Basic Workflow
Using mop generally begins with reading in your lab data with its respective function. For example, a nanodrop file:
library(mop)
nano <- system.file("extdata", "nanodrop.csv", package = "mop") |>
read_nanodrop(nucleotide = "RNA", date = "2021-08-14")
nano
#> <nanodrop[5]>
#> # A tibble: 18 × 24
#> Date Sample.Name Nucleic.Acid.ng.… A260.A280 A260.A230 A260 A280
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 8/14/2021 8:26… Sample 1 420. 2.03 2.29 10.5 5.16
#> 2 8/14/2021 8:27… Sample 2 450. 2.06 2.26 11.2 5.46
#> 3 8/14/2021 8:28… Sample 3 498. 2.06 2.28 12.4 6.03
#> 4 8/14/2021 8:28… Sample 4 449. 2.05 2.25 11.2 5.48
#> 5 8/14/2021 8:29… Sample 5 474. 2.03 2.29 11.9 5.85
#> 6 8/14/2021 8:29… Sample 6 543. 2.00 2.17 13.6 6.80
#> 7 8/14/2021 8:30… Sample 7 483. 2.07 2.24 12.1 5.84
#> 8 8/14/2021 8:31… Sample 8 588. 2.07 1.97 14.7 7.08
#> 9 8/14/2021 8:31… Sample 9 490. 2.07 2.25 12.2 5.91
#> 10 8/14/2021 8:32… Sample 10 256. 2.03 2.27 6.40 3.15
#> 11 8/14/2021 8:32… Sample 11 225. 2.03 2.27 5.62 2.77
#> 12 8/14/2021 8:33… Sample 12 429. 2.06 2.26 10.7 5.22
#> 13 8/14/2021 8:35… Sample 13 216. 2.02 2.27 5.39 2.66
#> 14 8/14/2021 8:35… Sample 14 218. 2.03 2.29 5.44 2.68
#> 15 8/14/2021 8:36… Sample 15 206. 2.03 2.24 5.15 2.54
#> 16 8/14/2021 8:37… Sample 16 426. 2.07 2.25 10.7 5.15
#> 17 8/14/2021 8:37… Sample 17 389. 2.06 2.28 9.74 4.73
#> 18 8/14/2021 8:38… Sample 18 560. 2.04 2.19 14.0 6.88
#> # … with 17 more variables: Nucleic.Acid.Factor <dbl>,
#> # Baseline.Correction..nm. <int>, Baseline.Absorbance <dbl>,
#> # Corrected..ng.uL. <lgl>, Corrected..CV <lgl>, Impurity.1 <lgl>,
#> # Impurity.1.A260 <lgl>, Impurity.1..CV <lgl>, Impurity.1.mM <lgl>,
#> # Impurity.2 <lgl>, Impurity.2.A260 <lgl>, Impurity.2..CV <lgl>,
#> # Impurity.2.mM <lgl>, Impurity.3 <lgl>, Impurity.3.A260 <lgl>,
#> # Impurity.3..CV <lgl>, Impurity.3.mM <lgl>
#> # Nucelotide: RNA
#> # Is tidy: FALSE
#> # Date: 2021-08-14
Some things to note:
- nano is currently NOT tidy
- nano is a nanodrop object
Additionally, read_nanodrop will normally try to extract nucleotide and date information from the file name, but they were supplied manually here because the example file is named for readability.
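As a rough picture of what extracting metadata from a file name could look like, the base R sketch below pulls a date and a nucleotide type out of a path. The helper parse_file_name and the naming pattern it expects (an ISO date plus "rna" or "dna" somewhere in the name) are assumptions made for illustration; the pattern read_nanodrop actually recognizes is not stated here, so consult its documentation.

```r
# Hypothetical helper: pull a date and nucleotide type out of a file name.
# The naming pattern (a "YYYY-MM-DD" date plus "rna" or "dna" in the name)
# is an assumption for illustration, not read_nanodrop's documented behavior.
parse_file_name <- function(path) {
  name <- tolower(basename(path))

  # regmatches() returns character(0) when regexpr() finds no match.
  date_match <- regmatches(name, regexpr("[0-9]{4}-[0-9]{2}-[0-9]{2}", name))
  # Naive substring search; a real parser would need stricter boundaries.
  nuc_match <- regmatches(name, regexpr("rna|dna", name))

  list(
    date = if (length(date_match) == 1) as.Date(date_match) else as.Date(NA),
    nucleotide = if (length(nuc_match) == 1) toupper(nuc_match) else NA_character_
  )
}

parse_file_name("data/2021-08-14_rna_nanodrop.csv")  # date and nucleotide found
parse_file_name("nanodrop.csv")                      # both fields are NA
```

When the name carries no such information, as with the second call, the fields come back as NA, which is the situation where supplying the values manually makes sense.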
To tidy any lab object, pass it to tidy_lab:
nano_tidy <- tidy_lab(nano)
nano_tidy
#> <nanodrop[5]>
#> # A tibble: 18 × 24
#> date sample_name conc a260_280 a260_230 a260 a280 nucleic_acid_fac…
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2021-08-14… Sample 1 420. 2.03 2.29 10.5 5.16 40
#> 2 2021-08-14… Sample 2 450. 2.06 2.26 11.2 5.46 40
#> 3 2021-08-14… Sample 3 498. 2.06 2.28 12.4 6.03 40
#> 4 2021-08-14… Sample 4 449. 2.05 2.25 11.2 5.48 40
#> 5 2021-08-14… Sample 5 474. 2.03 2.29 11.9 5.85 40
#> 6 2021-08-14… Sample 6 543. 2.00 2.17 13.6 6.80 40
#> 7 2021-08-14… Sample 7 483. 2.07 2.24 12.1 5.84 40
#> 8 2021-08-14… Sample 8 588. 2.07 1.97 14.7 7.08 40
#> 9 2021-08-14… Sample 9 490. 2.07 2.25 12.2 5.91 40
#> 10 2021-08-14… Sample 10 256. 2.03 2.27 6.40 3.15 40
#> 11 2021-08-14… Sample 11 225. 2.03 2.27 5.62 2.77 40
#> 12 2021-08-14… Sample 12 429. 2.06 2.26 10.7 5.22 40
#> 13 2021-08-14… Sample 13 216. 2.02 2.27 5.39 2.66 40
#> 14 2021-08-14… Sample 14 218. 2.03 2.29 5.44 2.68 40
#> 15 2021-08-14… Sample 15 206. 2.03 2.24 5.15 2.54 40
#> 16 2021-08-14… Sample 16 426. 2.07 2.25 10.7 5.15 40
#> 17 2021-08-14… Sample 17 389. 2.06 2.28 9.74 4.73 40
#> 18 2021-08-14… Sample 18 560. 2.04 2.19 14.0 6.88 40
#> # … with 16 more variables: baseline_correction_nm <int>,
#> # baseline_absorbance <dbl>, corrected_ngul <lgl>, corrected_cv <lgl>,
#> # impurity_1 <lgl>, impurity_1_a260 <lgl>, impurity_1_cv <lgl>,
#> # impurity_1_m_m <lgl>, impurity_2 <lgl>, impurity_2_a260 <lgl>,
#> # impurity_2_cv <lgl>, impurity_2_m_m <lgl>, impurity_3 <lgl>,
#> # impurity_3_a260 <lgl>, impurity_3_cv <lgl>, impurity_3_m_m <lgl>
#> # Nucelotide: RNA
#> # Is tidy: TRUE
#> # Date: 2021-08-14
Of note, our raw data is stored in nano$raw_data.
The operations that make an object tidy vary per object class, and can be found in the object’s documentation (here, ?tidy_lab.nanodrop).
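One tidying operation visible in the output above is the cleanup of column names. The sketch below shows a generic way to turn dotted instrument headers into snake_case in base R. The helper to_snake is hypothetical and only approximates the idea; columns such as conc and a260_280 clearly involve specific renaming that this sketch does not reproduce.

```r
# Hypothetical helper: convert dotted instrument headers to snake_case.
# Illustrative assumption only; see ?tidy_lab.nanodrop for the real steps.
to_snake <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)  # collapse runs of non-alphanumerics
  gsub("^_+|_+$", "", x)           # drop leading and trailing underscores
}

raw_names <- c("Sample.Name", "Baseline.Correction..nm.", "Impurity.1..CV")
clean_names <- to_snake(raw_names)
clean_names
```

For these three headers the result is sample_name, baseline_correction_nm, and impurity_1_cv, matching the corresponding names in the tidied output.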
Objects are useful as they form a semi-stable language for other functions from this and other packages to operate on. However, it’s often much simpler to interact with data in a flat tibble. This can be done using scrub:
nano_scrub <- scrub(nano_tidy)
nano_scrub[c(1:3, (ncol(nano_scrub)-2):ncol(nano_scrub))]
#> # A tibble: 18 × 6
#> date sample_name conc exp_date nucleotide is_tidy
#> <chr> <chr> <dbl> <date> <chr> <lgl>
#> 1 2021-08-14 20:26:49 Sample 1 420. 2021-08-14 RNA TRUE
#> 2 2021-08-14 20:27:25 Sample 2 450. 2021-08-14 RNA TRUE
#> 3 2021-08-14 20:28:07 Sample 3 498. 2021-08-14 RNA TRUE
#> 4 2021-08-14 20:28:40 Sample 4 449. 2021-08-14 RNA TRUE
#> 5 2021-08-14 20:29:17 Sample 5 474. 2021-08-14 RNA TRUE
#> 6 2021-08-14 20:29:54 Sample 6 543. 2021-08-14 RNA TRUE
#> 7 2021-08-14 20:30:30 Sample 7 483. 2021-08-14 RNA TRUE
#> 8 2021-08-14 20:31:16 Sample 8 588. 2021-08-14 RNA TRUE
#> 9 2021-08-14 20:31:50 Sample 9 490. 2021-08-14 RNA TRUE
#> 10 2021-08-14 20:32:24 Sample 10 256. 2021-08-14 RNA TRUE
#> 11 2021-08-14 20:32:59 Sample 11 225. 2021-08-14 RNA TRUE
#> 12 2021-08-14 20:33:50 Sample 12 429. 2021-08-14 RNA TRUE
#> 13 2021-08-14 20:35:16 Sample 13 216. 2021-08-14 RNA TRUE
#> 14 2021-08-14 20:35:59 Sample 14 218. 2021-08-14 RNA TRUE
#> 15 2021-08-14 20:36:34 Sample 15 206. 2021-08-14 RNA TRUE
#> 16 2021-08-14 20:37:12 Sample 16 426. 2021-08-14 RNA TRUE
#> 17 2021-08-14 20:37:50 Sample 17 389. 2021-08-14 RNA TRUE
#> 18 2021-08-14 20:38:21 Sample 18 560. 2021-08-14 RNA TRUE
Note how meta-data fields have now become individual columns.
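The flattening step can be pictured with a small base R sketch: each metadata field is repeated once per row and attached as a column. The list layout of lab_sketch and the helper flatten_sketch are hypothetical stand-ins, not the structure or implementation that scrub uses.

```r
# Hypothetical stand-in for a tidied lab object: data plus metadata fields.
lab_sketch <- list(
  data = data.frame(sample_name = c("Sample 1", "Sample 2"), conc = c(420, 450)),
  exp_date = as.Date("2021-08-14"),
  nucleotide = "RNA",
  is_tidy = TRUE
)

# Hypothetical flattener: repeat each metadata field down its own column.
flatten_sketch <- function(x) {
  out <- x$data
  for (field in setdiff(names(x), "data")) {
    out[[field]] <- rep(x[[field]], nrow(out))
  }
  out
}

flat <- flatten_sketch(lab_sketch)
names(flat)
```

The resulting data.frame has the original data columns followed by exp_date, nucleotide, and is_tidy, mirroring the layout of the scrubbed output above.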